6 JSON and API
6.1 What is JSON?
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard.
API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other.
One of the most popular packages for json is jsonlite.
Let’s use readinginformation about BTC and USDT crypro currencies from Binance
market = 'BTCUSDT'
interval = '1h'
limit = 100
url <- paste0(url = "https://api.binance.com/api/v3/klines?symbol=", market ,"&interval=", interval,"&limit=", limit)
print(url) # complete request URL[1] "https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1h&limit=100"
On the next stage you need use fromJSON() function to get data.
More details about requests to Binanace at https://github.com/binance/binance-spot-api-docs/blob/master/rest-api.md#klinecandlestick-data
If you enter ‘url’ value at browser response is going to be like this:
[
[
1499040000000, // Open time
"0.01634790", // Open
"0.80000000", // High
"0.01575800", // Low
"0.01577100", // Close
"148976.11427815", // Volume
1499644799999, // Close time
"2434.19055334", // Quote asset volume
308, // Number of trades
"1756.87402397", // Taker buy base asset volume
"28.46694368", // Taker buy quote asset volume
"17928899.62484339" // Ignore.
]
]data <- fromJSON(url) # get json and transform it to list()
data <- data[, 1:7] # let's left only 1:7 columns (from Open time to Close time)
head(data)| 1650513600000 | 41693.58000000 | 41750.00000000 | 41525.00000000 | 41610.01000000 | 1138.64337000 | 1650517199999 |
| 1650517200000 | 41610.01000000 | 41699.00000000 | 41434.44000000 | 41462.76000000 | 1229.25936000 | 1650520799999 |
| 1650520800000 | 41462.75000000 | 41600.00000000 | 41419.20000000 | 41522.38000000 | 1049.71244000 | 1650524399999 |
| 1650524400000 | 41522.38000000 | 41940.00000000 | 41451.00000000 | 41855.69000000 | 1928.48091000 | 1650527999999 |
| 1.650528e+12 | 41855.69000000 | 42050.30000000 | 41741.10000000 | 41922.97000000 | 2518.04090000 | 1650531599999 |
| 1650531600000 | 41922.96000000 | 41971.90000000 | 41743.96000000 | 41803.70000000 | 1655.76993000 | 1650535199999 |
| V1 | V2 | V3 | V4 | V5 | V6 | V7 | |
|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | |
| 1 | 1650513600000 | 41693.58000000 | 41750.00000000 | 41525.00000000 | 41610.01000000 | 1138.64337000 | 1650517199999 |
| 2 | 1650517200000 | 41610.01000000 | 41699.00000000 | 41434.44000000 | 41462.76000000 | 1229.25936000 | 1650520799999 |
| 3 | 1650520800000 | 41462.75000000 | 41600.00000000 | 41419.20000000 | 41522.38000000 | 1049.71244000 | 1650524399999 |
| 4 | 1650524400000 | 41522.38000000 | 41940.00000000 | 41451.00000000 | 41855.69000000 | 1928.48091000 | 1650527999999 |
| 5 | 1.650528e+12 | 41855.69000000 | 42050.30000000 | 41741.10000000 | 41922.97000000 | 2518.04090000 | 1650531599999 |
| 6 | 1650531600000 | 41922.96000000 | 41971.90000000 | 41743.96000000 | 41803.70000000 | 1655.76993000 | 1650535199999 |
# fix columns names
colnames(data) <- c("Open_time", "Open", "High", "Low", "Close", "Volume", "Close_time")
head(data) # looks better, but columns are characters still| Open_time | Open | High | Low | Close | Volume | Close_time | |
|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | |
| 1 | 1650513600000 | 41693.58000000 | 41750.00000000 | 41525.00000000 | 41610.01000000 | 1138.64337000 | 1650517199999 |
| 2 | 1650517200000 | 41610.01000000 | 41699.00000000 | 41434.44000000 | 41462.76000000 | 1229.25936000 | 1650520799999 |
| 3 | 1650520800000 | 41462.75000000 | 41600.00000000 | 41419.20000000 | 41522.38000000 | 1049.71244000 | 1650524399999 |
| 4 | 1650524400000 | 41522.38000000 | 41940.00000000 | 41451.00000000 | 41855.69000000 | 1928.48091000 | 1650527999999 |
| 5 | 1.650528e+12 | 41855.69000000 | 42050.30000000 | 41741.10000000 | 41922.97000000 | 2518.04090000 | 1650531599999 |
| 6 | 1650531600000 | 41922.96000000 | 41971.90000000 | 41743.96000000 | 41803.70000000 | 1655.76993000 | 1650535199999 |
is.numeric(data[,1]) # check 1st column type is numeric
is.numeric(data[,2]) # check 2nd column type is numericdata <- as.data.frame(sapply(data, as.numeric)) # convert all columns to numeric
head(data) # good, its double now| Open_time | Open | High | Low | Close | Volume | Close_time | |
|---|---|---|---|---|---|---|---|
| <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | |
| 1 | 1.650514e+12 | 41693.58 | 41750.0 | 41525.00 | 41610.01 | 1138.643 | 1.650517e+12 |
| 2 | 1.650517e+12 | 41610.01 | 41699.0 | 41434.44 | 41462.76 | 1229.259 | 1.650521e+12 |
| 3 | 1.650521e+12 | 41462.75 | 41600.0 | 41419.20 | 41522.38 | 1049.712 | 1.650524e+12 |
| 4 | 1.650524e+12 | 41522.38 | 41940.0 | 41451.00 | 41855.69 | 1928.481 | 1.650528e+12 |
| 5 | 1.650528e+12 | 41855.69 | 42050.3 | 41741.10 | 41922.97 | 2518.041 | 1.650532e+12 |
| 6 | 1.650532e+12 | 41922.96 | 41971.9 | 41743.96 | 41803.70 | 1655.770 | 1.650535e+12 |
Final stage is to convert Open_time and Close_time to dates.
data$Open_time <- as.POSIXct(data$Open_time/1e3, origin = '1970-01-01')
data$Close_time <- as.POSIXct(data$Close_time/1e3, origin = '1970-01-01')
head(data) | Open_time | Open | High | Low | Close | Volume | Close_time | |
|---|---|---|---|---|---|---|---|
| <dttm> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dttm> | |
| 1 | 2022-04-21 07:00:00 | 41693.58 | 41750.0 | 41525.00 | 41610.01 | 1138.643 | 2022-04-21 07:59:59 |
| 2 | 2022-04-21 08:00:00 | 41610.01 | 41699.0 | 41434.44 | 41462.76 | 1229.259 | 2022-04-21 08:59:59 |
| 3 | 2022-04-21 09:00:00 | 41462.75 | 41600.0 | 41419.20 | 41522.38 | 1049.712 | 2022-04-21 09:59:59 |
| 4 | 2022-04-21 10:00:00 | 41522.38 | 41940.0 | 41451.00 | 41855.69 | 1928.481 | 2022-04-21 10:59:59 |
| 5 | 2022-04-21 11:00:00 | 41855.69 | 42050.3 | 41741.10 | 41922.97 | 2518.041 | 2022-04-21 11:59:59 |
| 6 | 2022-04-21 12:00:00 | 41922.96 | 41971.9 | 41743.96 | 41803.70 | 1655.770 | 2022-04-21 12:59:59 |
| Open_time | Open | High | Low | Close | Volume | Close_time | |
|---|---|---|---|---|---|---|---|
| <dttm> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dttm> | |
| 95 | 2022-04-25 05:00:00 | 39095.81 | 39153.94 | 38961.64 | 39091.17 | 1205.5158 | 2022-04-25 05:59:59 |
| 96 | 2022-04-25 06:00:00 | 39091.17 | 39294.76 | 39086.37 | 39253.71 | 1443.3318 | 2022-04-25 06:59:59 |
| 97 | 2022-04-25 07:00:00 | 39253.70 | 39256.28 | 39055.71 | 39139.74 | 896.8554 | 2022-04-25 07:59:59 |
| 98 | 2022-04-25 08:00:00 | 39139.74 | 39230.50 | 38947.42 | 38975.22 | 1057.4900 | 2022-04-25 08:59:59 |
| 99 | 2022-04-25 09:00:00 | 38975.21 | 39057.97 | 38590.00 | 38636.35 | 2814.9716 | 2022-04-25 09:59:59 |
| 100 | 2022-04-25 10:00:00 | 38636.35 | 38675.68 | 38200.00 | 38534.99 | 3528.2355 | 2022-04-25 10:59:59 |
6.2 Набори даних
- https://github.com/kleban/r-book-published/tree/main/datasets/telecom_users.csv
- https://github.com/kleban/r-book-published/tree/main/datasets/telecom_sers.xlsx
- https://github.com/kleban/r-book-published/tree/main/datasets/Default_Fin.csv
- https://github.com/kleban/r-book-published/tree/main/datasets/employes.xml