6 JSON and API
6.1 What is JSON?
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard.
API is the acronym for Application Programming Interface, a software intermediary that allows two applications to talk to each other.
One of the most popular R packages for working with JSON is jsonlite.
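A minimal sketch of a round trip between JSON text and R objects with jsonlite (the field names are only illustrative): fromJSON() turns a JSON object into a named list, and toJSON() turns a list back into JSON text. With auto_unbox = TRUE, length-one vectors are written as scalars instead of one-element arrays.

```r
library(jsonlite)

# JSON text -> R: a JSON object becomes a named list
params <- fromJSON('{"symbol": "BTCUSDT", "interval": "1h", "limit": 100}')
str(params)

# R -> JSON text: auto_unbox = TRUE writes "BTCUSDT" rather than ["BTCUSDT"]
json <- toJSON(list(symbol = "BTCUSDT", limit = 100), auto_unbox = TRUE)
print(json)
```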
Let's read hourly price data (candlesticks, or "klines") for the BTCUSDT trading pair, i.e. BTC priced in USDT, from the Binance API.
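library(jsonlite) # provides fromJSON()
market <- "BTCUSDT" # trading pair: BTC priced in USDT
interval <- "1h" # candle length: one hour
limit <- 100 # number of candles to return
url <- paste0("https://api.binance.com/api/v3/klines?symbol=", market,
              "&interval=", interval, "&limit=", limit)
print(url) # complete request URL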
[1] "https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1h&limit=100"
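For repeated requests it can help to wrap the URL construction in a function. The sketch below uses a hypothetical helper, klines_url() (not part of jsonlite or any Binance client), that builds the query string from a named list and percent-encodes each value with base R's utils::URLencode(), so parameter names and values cannot drift apart.

```r
# klines_url() is a hypothetical helper, not part of jsonlite or any Binance client
klines_url <- function(symbol, interval = "1h", limit = 100,
                       base = "https://api.binance.com/api/v3/klines") {
  params <- list(symbol = symbol, interval = interval, limit = limit)
  # percent-encode each value so characters such as '&' or '=' cannot break the query
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  # join as key=value pairs separated by '&' and append to the endpoint
  paste0(base, "?", paste0(names(params), "=", values, collapse = "&"))
}

url <- klines_url("BTCUSDT", interval = "1h", limit = 100)
print(url)
```

Changing the pair or the candle length is then a single call, e.g. klines_url("ETHUSDT", "15m", 5).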
Next, use the fromJSON() function to download the data and parse it.
More details about kline requests are in the Binance API documentation: https://github.com/binance/binance-spot-api-docs/blob/master/rest-api.md#klinecandlestick-data
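A request to a remote API can fail (no network, a rate limit, an invalid symbol), and fromJSON() then raises an R error that stops a script. A minimal sketch, assuming you prefer a warning and a NULL result instead, wraps the call in tryCatch(); fetch_json() is a hypothetical name.

```r
library(jsonlite)

# fetch_json() is a hypothetical helper: parse JSON from a URL (or a JSON string)
# and return NULL with a warning instead of stopping on failure
fetch_json <- function(url) {
  tryCatch(
    fromJSON(url),
    error = function(e) {
      warning("request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage (needs network access):
# raw <- fetch_json(url)
# if (is.null(raw)) stop("no data received")
```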
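If you open the url in a browser, the response is an array of candles, each of which is itself an array of fields. It looks like this (the // comments naming each field are annotations; real JSON has no comments):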
[
[
1499040000000, // Open time
"0.01634790", // Open
"0.80000000", // High
"0.01575800", // Low
"0.01577100", // Close
"148976.11427815", // Volume
1499644799999, // Close time
"2434.19055334", // Quote asset volume
308, // Number of trades
"1756.87402397", // Taker buy base asset volume
"28.46694368", // Taker buy quote asset volume
"17928899.62484339" // Ignore.
]
]
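The mix of numbers and quoted strings in each candle explains the column types you will see below. A minimal offline sketch, parsing the sample response above with the comments removed, shows that jsonlite simplifies the array of equal-length arrays into a matrix and, because each row mixes numbers and strings, stores every value as text.

```r
library(jsonlite)

# The sample response above, without the // comments
sample_json <- '[[1499040000000, "0.01634790", "0.80000000", "0.01575800",
                  "0.01577100", "148976.11427815", 1499644799999,
                  "2434.19055334", 308, "1756.87402397", "28.46694368",
                  "17928899.62484339"]]'

m <- fromJSON(sample_json)
dim(m)    # one row per candle, one column per field
typeof(m) # mixed numbers and strings are stored as character
m[1, 2]   # the Open price is still a string and needs as.numeric()
```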
data <- fromJSON(url) # download the JSON and parse it; the candles become a character matrix
data <- as.data.frame(data[, 1:7]) # keep columns 1-7 (Open time to Close time) as a data frame
head(data)
 | V1 | V2 | V3 | V4 | V5 | V6 | V7 |
---|---|---|---|---|---|---|---|
 | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> |
1 | 1650513600000 | 41693.58000000 | 41750.00000000 | 41525.00000000 | 41610.01000000 | 1138.64337000 | 1650517199999 |
2 | 1650517200000 | 41610.01000000 | 41699.00000000 | 41434.44000000 | 41462.76000000 | 1229.25936000 | 1650520799999 |
3 | 1650520800000 | 41462.75000000 | 41600.00000000 | 41419.20000000 | 41522.38000000 | 1049.71244000 | 1650524399999 |
4 | 1650524400000 | 41522.38000000 | 41940.00000000 | 41451.00000000 | 41855.69000000 | 1928.48091000 | 1650527999999 |
5 | 1.650528e+12 | 41855.69000000 | 42050.30000000 | 41741.10000000 | 41922.97000000 | 2518.04090000 | 1650531599999 |
6 | 1650531600000 | 41922.96000000 | 41971.90000000 | 41743.96000000 | 41803.70000000 | 1655.76993000 | 1650535199999 |
# set descriptive column names
colnames(data) <- c("Open_time", "Open", "High", "Low", "Close", "Volume", "Close_time")
head(data) # looks better, but the columns are still character
 | Open_time | Open | High | Low | Close | Volume | Close_time |
---|---|---|---|---|---|---|---|
 | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> |
1 | 1650513600000 | 41693.58000000 | 41750.00000000 | 41525.00000000 | 41610.01000000 | 1138.64337000 | 1650517199999 |
2 | 1650517200000 | 41610.01000000 | 41699.00000000 | 41434.44000000 | 41462.76000000 | 1229.25936000 | 1650520799999 |
3 | 1650520800000 | 41462.75000000 | 41600.00000000 | 41419.20000000 | 41522.38000000 | 1049.71244000 | 1650524399999 |
4 | 1650524400000 | 41522.38000000 | 41940.00000000 | 41451.00000000 | 41855.69000000 | 1928.48091000 | 1650527999999 |
5 | 1.650528e+12 | 41855.69000000 | 42050.30000000 | 41741.10000000 | 41922.97000000 | 2518.04090000 | 1650531599999 |
6 | 1650531600000 | 41922.96000000 | 41971.90000000 | 41743.96000000 | 41803.70000000 | 1655.76993000 | 1650535199999 |
is.numeric(data[, 1]) # FALSE: the 1st column is character
is.numeric(data[, 2]) # FALSE: so is the 2nd
data <- as.data.frame(sapply(data, as.numeric)) # convert every column to numeric
head(data) # good, the columns are double now
 | Open_time | Open | High | Low | Close | Volume | Close_time |
---|---|---|---|---|---|---|---|
 | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
1 | 1.650514e+12 | 41693.58 | 41750.0 | 41525.00 | 41610.01 | 1138.643 | 1.650517e+12 |
2 | 1.650517e+12 | 41610.01 | 41699.0 | 41434.44 | 41462.76 | 1229.259 | 1.650521e+12 |
3 | 1.650521e+12 | 41462.75 | 41600.0 | 41419.20 | 41522.38 | 1049.712 | 1.650524e+12 |
4 | 1.650524e+12 | 41522.38 | 41940.0 | 41451.00 | 41855.69 | 1928.481 | 1.650528e+12 |
5 | 1.650528e+12 | 41855.69 | 42050.3 | 41741.10 | 41922.97 | 2518.041 | 1.650532e+12 |
6 | 1.650532e+12 | 41922.96 | 41971.9 | 41743.96 | 41803.70 | 1655.770 | 1.650535e+12 |
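sapply() iterates over the columns of a data frame but over the individual cells of a matrix, so data must be a data frame at this point. A common alternative idiom, sketched here on a hypothetical one-row data frame with values copied from the first candle, is df[] <- lapply(df, as.numeric): it converts each column in place and keeps the data frame and its column names without a separate as.data.frame() step.

```r
# Toy data frame mimicking the character columns above (values from the first row)
df <- data.frame(Open_time = "1650513600000", Open = "41693.58000000",
                 Volume = "1138.64337000", stringsAsFactors = FALSE)

# Assigning into df[] replaces the column contents but keeps the data frame
# structure and names; lapply() returns a list, so nothing is simplified to a matrix
df[] <- lapply(df, as.numeric)

str(df)
```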
The final step is to convert Open_time and Close_time, which are milliseconds since 1970-01-01 00:00:00 UTC, to date-times.
# divide by 1000 to get seconds; without tz, times are shown in the session's local time zone
data$Open_time <- as.POSIXct(data$Open_time / 1e3, origin = "1970-01-01")
data$Close_time <- as.POSIXct(data$Close_time / 1e3, origin = "1970-01-01")
head(data)
 | Open_time | Open | High | Low | Close | Volume | Close_time |
---|---|---|---|---|---|---|---|
 | <dttm> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dttm> |
1 | 2022-04-21 07:00:00 | 41693.58 | 41750.0 | 41525.00 | 41610.01 | 1138.643 | 2022-04-21 07:59:59 |
2 | 2022-04-21 08:00:00 | 41610.01 | 41699.0 | 41434.44 | 41462.76 | 1229.259 | 2022-04-21 08:59:59 |
3 | 2022-04-21 09:00:00 | 41462.75 | 41600.0 | 41419.20 | 41522.38 | 1049.712 | 2022-04-21 09:59:59 |
4 | 2022-04-21 10:00:00 | 41522.38 | 41940.0 | 41451.00 | 41855.69 | 1928.481 | 2022-04-21 10:59:59 |
5 | 2022-04-21 11:00:00 | 41855.69 | 42050.3 | 41741.10 | 41922.97 | 2518.041 | 2022-04-21 11:59:59 |
6 | 2022-04-21 12:00:00 | 41922.96 | 41971.9 | 41743.96 | 41803.70 | 1655.770 | 2022-04-21 12:59:59 |
tail(data) # last six rows
 | Open_time | Open | High | Low | Close | Volume | Close_time |
---|---|---|---|---|---|---|---|
 | <dttm> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dttm> |
95 | 2022-04-25 05:00:00 | 39095.81 | 39153.94 | 38961.64 | 39091.17 | 1205.5158 | 2022-04-25 05:59:59 |
96 | 2022-04-25 06:00:00 | 39091.17 | 39294.76 | 39086.37 | 39253.71 | 1443.3318 | 2022-04-25 06:59:59 |
97 | 2022-04-25 07:00:00 | 39253.70 | 39256.28 | 39055.71 | 39139.74 | 896.8554 | 2022-04-25 07:59:59 |
98 | 2022-04-25 08:00:00 | 39139.74 | 39230.50 | 38947.42 | 38975.22 | 1057.4900 | 2022-04-25 08:59:59 |
99 | 2022-04-25 09:00:00 | 38975.21 | 39057.97 | 38590.00 | 38636.35 | 2814.9716 | 2022-04-25 09:59:59 |
100 | 2022-04-25 10:00:00 | 38636.35 | 38675.68 | 38200.00 | 38534.99 | 3528.2355 | 2022-04-25 10:59:59 |
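The displayed times depend on the time zone of the R session. The first Open_time, 1650513600000, is 04:00 UTC on 2022-04-21, so the 07:00 shown above appears to come from a session running at UTC+3. A minimal sketch, assuming you want reproducible output regardless of the machine, fixes the zone explicitly; ms_to_time() is a hypothetical helper name.

```r
# ms_to_time() is a hypothetical helper: Binance times are milliseconds since
# 1970-01-01 00:00:00 UTC, so divide by 1000 and set the time zone explicitly
ms_to_time <- function(ms, tz = "UTC") {
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = tz)
}

open_utc <- ms_to_time(1650513600000) # Open_time of the first candle above
format(open_utc, "%Y-%m-%d %H:%M:%S %Z")
```

Applied to the table, this would be data$Open_time <- ms_to_time(raw_ms) on the numeric column before conversion. Close times end in 999 ms, which is why they print as hh:59:59.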
6.2 Datasets
- https://github.com/kleban/r-book-published/tree/main/datasets/telecom_users.csv
- https://github.com/kleban/r-book-published/tree/main/datasets/telecom_sers.xlsx
- https://github.com/kleban/r-book-published/tree/main/datasets/Default_Fin.csv
- https://github.com/kleban/r-book-published/tree/main/datasets/employes.xml