29 EDA with dlookR
https://rpubs.com/linggaajiandika/EDA
29.1 00. Introduction
Exploratory Data Analysis (EDA) is the first step in the data analysis process. The approach was developed by John Tukey in the 1970s. In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.
Note: some data processing steps, such as cbind or selecting variables, are skipped because they are not the focus of this tutorial.
29.2 01. Load Package
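The code for this step did not survive in the text. A minimal sketch, assuming the only package needed for the steps shown here is dlookR and that it can be installed from CRAN; `has_dlookr` is a name introduced here for illustration only:

```r
# install.packages("dlookR")  # uncomment on first use

# has_dlookr is a hypothetical helper flag, not part of the original tutorial
has_dlookr <- requireNamespace("dlookR", quietly = TRUE)

if (has_dlookr) {
  library(dlookR)  # attaches diagnose(), diagnose_numeric(), and related functions
} else {
  message("dlookR is not installed; run install.packages('dlookR') first")
}
```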
29.3 02. Load Dataset
The dataset used is airquality, which is already available in R. It contains daily air quality measurements in New York from May to September 1973.
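As a quick sketch of this step (the original code is not preserved, so the exact calls are an assumption), the dataset can be loaded and inspected with base R only:

```r
# airquality ships with the base R "datasets" package
data("airquality")

dim(airquality)   # number of rows (daily observations) and columns (variables)
str(airquality)   # variable names and storage types
head(airquality)  # first few rows, including some NA values
```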
29.4 03. Data Diagnosis
The first step is a basic diagnosis of the data.
29.4.1 3.1 General Data Diagnosis
variables | types | missing_count | missing_percent | unique_count | unique_rate |
---|---|---|---|---|---|
<chr> | <chr> | <int> | <dbl> | <int> | <dbl> |
Ozone | integer | 37 | 24.183007 | 68 | 0.44444444 |
Solar.R | integer | 7 | 4.575163 | 118 | 0.77124183 |
Wind | numeric | 0 | 0.000000 | 31 | 0.20261438 |
Temp | integer | 0 | 0.000000 | 40 | 0.26143791 |
Month | integer | 0 | 0.000000 | 5 | 0.03267974 |
Day | integer | 0 | 0.000000 | 31 | 0.20261438 |
- variables: variable names
- types: the data type of the variables
- missing_count: number of missing values
- missing_percent: percentage of missing values
- unique_count: number of unique values
- unique_rate: rate of unique values, i.e. unique_count / number of observations
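The code that produced the table in this subsection is not preserved in the text. A minimal sketch, assuming dlookR is installed: `diagnose()` is dlookR's general diagnosis function. The base R part only illustrates how the columns are defined; `n_obs` and `diag_tbl` are names chosen here for illustration.

```r
# If dlookR is available, diagnose() returns the general diagnosis table
if (requireNamespace("dlookR", quietly = TRUE)) {
  print(dlookR::diagnose(airquality))
}

# Base R illustration of the same columns (diag_tbl is a hypothetical name)
n_obs <- nrow(airquality)
diag_tbl <- data.frame(
  variables     = names(airquality),
  types         = vapply(airquality, function(x) class(x)[1], character(1)),
  missing_count = vapply(airquality, function(x) sum(is.na(x)), integer(1)),
  # length(unique(x)) counts NA as one distinct value
  unique_count  = vapply(airquality, function(x) length(unique(x)), integer(1)),
  row.names     = NULL
)
diag_tbl$missing_percent <- 100 * diag_tbl$missing_count / n_obs
diag_tbl$unique_rate     <- diag_tbl$unique_count / n_obs

# Reorder columns to match the table layout
diag_tbl <- diag_tbl[, c("variables", "types", "missing_count",
                         "missing_percent", "unique_count", "unique_rate")]
print(diag_tbl)
```

For example, Ozone has 37 missing values out of 153 observations, so its missing_percent is 100 * 37 / 153, roughly 24.18.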
29.4.2 3.2 Diagnose Numeric Variable
This diagnosis covers numeric variables only.
variables | min | Q1 | mean | median | Q3 | max | zero | minus | outlier |
---|---|---|---|---|---|---|---|---|---|
<chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <int> | <int> |
Ozone | 1.0 | 18.00 | 42.129310 | 31.5 | 63.25 | 168.0 | 0 | 0 | 2 |
Solar.R | 7.0 | 115.75 | 185.931507 | 205.0 | 258.75 | 334.0 | 0 | 0 | 0 |
Wind | 1.7 | 7.40 | 9.957516 | 9.7 | 11.50 | 20.7 | 0 | 0 | 3 |
Temp | 56.0 | 72.00 | 77.882353 | 79.0 | 85.00 | 97.0 | 0 | 0 | 0 |
Month | 5.0 | 6.00 | 6.993464 | 7.0 | 8.00 | 9.0 | 0 | 0 | 0 |
Day | 1.0 | 8.00 | 15.803922 | 16.0 | 23.00 | 31.0 | 0 | 0 | 0 |
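The call behind this table is not preserved in the text. A minimal sketch, assuming dlookR is installed: `diagnose_numeric()` is dlookR's numeric diagnosis function. The base R part illustrates one way to compute the same columns; `num_summary` and `num_tbl` are hypothetical names, and counting outliers with `boxplot.stats()` (the 1.5 x IQR boxplot rule) is an assumption about how the outlier column is defined.

```r
# If dlookR is available, diagnose_numeric() returns the numeric diagnosis table
if (requireNamespace("dlookR", quietly = TRUE)) {
  print(dlookR::diagnose_numeric(airquality))
}

# Base R illustration (num_summary is a hypothetical helper)
num_summary <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(min     = min(x, na.rm = TRUE),
    Q1      = q[1],
    mean    = mean(x, na.rm = TRUE),
    median  = q[2],
    Q3      = q[3],
    max     = max(x, na.rm = TRUE),
    zero    = sum(x == 0, na.rm = TRUE),       # count of values equal to zero
    minus   = sum(x < 0, na.rm = TRUE),        # count of negative values
    outlier = length(boxplot.stats(x)$out))    # assumed rule: boxplot outliers
}

# One row per variable, one column per statistic
num_tbl <- t(vapply(airquality, num_summary, numeric(9)))
print(num_tbl)
```

Missing values are dropped before each statistic is computed, which is why the Ozone mean is based on the 116 non-missing observations rather than all 153.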