4 MS Excel (xlsx)
You need this packages for code execution:
4.1 XLSX-format
There are many packages to read/write MS Excel files. xlsx
one of the most useful.
?read.xlsx
- review package functions and params
Let’s read the data telecom_users.xlsx
:
data <- read.xlsx("../../data/telecom_users.xlsx", sheetIndex = 1)
# sheetIndex = 1 - select sheet to read, or use sheetName = "sheet1" to read by Name
head(data)
customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<chr> | <chr> | <dbl> | <chr> | <chr> | <dbl> | <chr> | <chr> | <chr> | <chr> | ... | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <dbl> | <dbl> | <chr> | |
1 | 7010-BRBUU | Male | 0 | Yes | Yes | 72 | Yes | Yes | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | Two year | No | Credit card (automatic) | 24.10 | 1734.65 | No |
2 | 9688-YGXVR | Female | 0 | No | No | 44 | Yes | No | Fiber optic | No | ... | Yes | No | Yes | No | Month-to-month | Yes | Credit card (automatic) | 88.15 | 3973.20 | No |
3 | 9286-DOJGF | Female | 1 | Yes | No | 38 | Yes | Yes | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Bank transfer (automatic) | 74.95 | 2869.85 | Yes |
4 | 6994-KERXL | Male | 0 | No | No | 4 | Yes | No | DSL | No | ... | No | No | No | Yes | Month-to-month | Yes | Electronic check | 55.90 | 238.50 | No |
5 | 2181-UAESM | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | Yes | No | No | No | Month-to-month | No | Electronic check | 53.45 | 119.50 | No |
6 | 4312-GVYNH | Female | 0 | Yes | No | 70 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | Yes | Two year | Yes | Bank transfer (automatic) | 49.85 | 3370.20 | No |
# You can also use startRow, endRow and other params to define how much data read
data <- read.xlsx("../../data/telecom_users.xlsx", sheetIndex = 1, endRow = 100)
head(data)
customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
<chr> | <chr> | <dbl> | <chr> | <chr> | <dbl> | <chr> | <chr> | <chr> | <chr> | ... | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <dbl> | <dbl> | <chr> | |
1 | 7010-BRBUU | Male | 0 | Yes | Yes | 72 | Yes | Yes | No | No internet service | ... | No internet service | No internet service | No internet service | No internet service | Two year | No | Credit card (automatic) | 24.10 | 1734.65 | No |
2 | 9688-YGXVR | Female | 0 | No | No | 44 | Yes | No | Fiber optic | No | ... | Yes | No | Yes | No | Month-to-month | Yes | Credit card (automatic) | 88.15 | 3973.20 | No |
3 | 9286-DOJGF | Female | 1 | Yes | No | 38 | Yes | Yes | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Bank transfer (automatic) | 74.95 | 2869.85 | Yes |
4 | 6994-KERXL | Male | 0 | No | No | 4 | Yes | No | DSL | No | ... | No | No | No | Yes | Month-to-month | Yes | Electronic check | 55.90 | 238.50 | No |
5 | 2181-UAESM | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | Yes | No | No | No | Month-to-month | No | Electronic check | 53.45 | 119.50 | No |
6 | 4312-GVYNH | Female | 0 | Yes | No | 70 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | Yes | Two year | Yes | Bank transfer (automatic) | 49.85 | 3370.20 | No |
Let’s replace Churn
values Yes
/No
by 1
/0
:
Write final data to excel:
4.2 Task 1
Download from kaggle.com and read dataset Default_Fin.csv
: https://www.kaggle.com/kmldas/loan-default-prediction
Description:
This is a synthetic dataset created using actual data from a financial institution. The data has been modified to remove identifiable features and the numbers transformed to ensure they do not link to original source (financial institution).
This is intended to be used for academic purposes for beginners who want to practice financial analytics from a simple financial dataset
- Check what columns has missing values
- Count default and non-default clients / and parts of total clients in %
- Count Employed clients
- Count Employed Default clients
- Average salary by Employed clients
- Rename columns to “id”, “empl”, “balance”, “salary”, “default”
Solution for Task 1
Index | Employed | Bank.Balance | Annual.Salary | Defaulted. | |
---|---|---|---|---|---|
<int> | <int> | <dbl> | <dbl> | <int> | |
1 | 1 | 1 | 8754.36 | 532339.56 | 0 |
2 | 2 | 0 | 9806.16 | 145273.56 | 0 |
3 | 3 | 1 | 12882.60 | 381205.68 | 0 |
4 | 4 | 1 | 6351.00 | 428453.88 | 0 |
5 | 5 | 1 | 9427.92 | 461562.00 | 0 |
6 | 6 | 0 | 11035.08 | 89898.72 | 0 |
- Check what columns has missing values
- Count default and non-default clients / and parts of total clients in %
def_count <- nrow(data[data$Defaulted. == 1, ])
no_def_count <- nrow(data[data$Defaulted. == 0, ])
def_count
no_def_count
- Count Employed clients
- Count Employed Default clients
- Average salary by Employed clients
- Rename columns to “id”, “empl”, “balance”, “salary”, “default”:
id | empl | balance | salary | default | |
---|---|---|---|---|---|
<int> | <int> | <dbl> | <dbl> | <int> | |
1 | 1 | 1 | 8754.36 | 532339.56 | 0 |
2 | 2 | 0 | 9806.16 | 145273.56 | 0 |
3 | 3 | 1 | 12882.60 | 381205.68 | 0 |
4 | 4 | 1 | 6351.00 | 428453.88 | 0 |
5 | 5 | 1 | 9427.92 | 461562.00 | 0 |
6 | 6 | 0 | 11035.08 | 89898.72 | 0 |
4.3 Набори даних
- https://github.com/kleban/r-book-published/tree/main/datasets/telecom_users.csv
- https://github.com/kleban/r-book-published/tree/main/datasets/telecom_sers.xlsx
- https://github.com/kleban/r-book-published/tree/main/datasets/Default_Fin.csv
- https://github.com/kleban/r-book-published/tree/main/datasets/employes.xml