6  JSON and API

6.1 What is JSON?

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard.

API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other.

One of the most popular packages for json is jsonlite.

#install.packages("jsonlite")
library(jsonlite)

Let’s use readinginformation about BTC and USDT crypro currencies from Binance

market = 'BTCUSDT'
interval = '1h'
limit = 100

url <- paste0(url = "https://api.binance.com/api/v3/klines?symbol=", market ,"&interval=", interval,"&limit=", limit)
print(url) # complete request URL
[1] "https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1h&limit=100"

On the next stage you need use fromJSON() function to get data.

More details about requests to Binanace at https://github.com/binance/binance-spot-api-docs/blob/master/rest-api.md#klinecandlestick-data

If you enter ‘url’ value at browser response is going to be like this:

[
  [
    1499040000000,      // Open time
    "0.01634790",       // Open
    "0.80000000",       // High
    "0.01575800",       // Low
    "0.01577100",       // Close
    "148976.11427815",  // Volume
    1499644799999,      // Close time
    "2434.19055334",    // Quote asset volume
    308,                // Number of trades
    "1756.87402397",    // Taker buy base asset volume
    "28.46694368",      // Taker buy quote asset volume
    "17928899.62484339" // Ignore.
  ]
]
data <- fromJSON(url) # get json and transform it to list()
data <- data[, 1:7] # let's left only 1:7 columns (from Open time to Close time)
head(data)
A matrix: 6 × 7 of type chr
1650513600000 41693.58000000 41750.00000000 41525.00000000 41610.01000000 1138.64337000 1650517199999
1650517200000 41610.01000000 41699.00000000 41434.44000000 41462.76000000 1229.25936000 1650520799999
1650520800000 41462.75000000 41600.00000000 41419.20000000 41522.38000000 1049.71244000 1650524399999
1650524400000 41522.38000000 41940.00000000 41451.00000000 41855.69000000 1928.48091000 1650527999999
1.650528e+12 41855.69000000 42050.30000000 41741.10000000 41922.97000000 2518.04090000 1650531599999
1650531600000 41922.96000000 41971.90000000 41743.96000000 41803.70000000 1655.76993000 1650535199999
typeof(data) # check data type
data <- as.data.frame(data) # convert to dataframe
head(data)
'character'
A data.frame: 6 × 7
V1 V2 V3 V4 V5 V6 V7
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1650513600000 41693.58000000 41750.00000000 41525.00000000 41610.01000000 1138.64337000 1650517199999
2 1650517200000 41610.01000000 41699.00000000 41434.44000000 41462.76000000 1229.25936000 1650520799999
3 1650520800000 41462.75000000 41600.00000000 41419.20000000 41522.38000000 1049.71244000 1650524399999
4 1650524400000 41522.38000000 41940.00000000 41451.00000000 41855.69000000 1928.48091000 1650527999999
5 1.650528e+12 41855.69000000 42050.30000000 41741.10000000 41922.97000000 2518.04090000 1650531599999
6 1650531600000 41922.96000000 41971.90000000 41743.96000000 41803.70000000 1655.76993000 1650535199999
# fix columns names
colnames(data) <- c("Open_time", "Open", "High", "Low", "Close", "Volume", "Close_time")
head(data) # looks better, but columns are characters still
A data.frame: 6 × 7
Open_time Open High Low Close Volume Close_time
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1650513600000 41693.58000000 41750.00000000 41525.00000000 41610.01000000 1138.64337000 1650517199999
2 1650517200000 41610.01000000 41699.00000000 41434.44000000 41462.76000000 1229.25936000 1650520799999
3 1650520800000 41462.75000000 41600.00000000 41419.20000000 41522.38000000 1049.71244000 1650524399999
4 1650524400000 41522.38000000 41940.00000000 41451.00000000 41855.69000000 1928.48091000 1650527999999
5 1.650528e+12 41855.69000000 42050.30000000 41741.10000000 41922.97000000 2518.04090000 1650531599999
6 1650531600000 41922.96000000 41971.90000000 41743.96000000 41803.70000000 1655.76993000 1650535199999
is.numeric(data[,1]) # check 1st column type is numeric
is.numeric(data[,2]) # check 2nd column type is numeric
FALSE
FALSE
data <- as.data.frame(sapply(data, as.numeric)) # convert all columns to numeric
head(data) # good, its double now
A data.frame: 6 × 7
Open_time Open High Low Close Volume Close_time
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.650514e+12 41693.58 41750.0 41525.00 41610.01 1138.643 1.650517e+12
2 1.650517e+12 41610.01 41699.0 41434.44 41462.76 1229.259 1.650521e+12
3 1.650521e+12 41462.75 41600.0 41419.20 41522.38 1049.712 1.650524e+12
4 1.650524e+12 41522.38 41940.0 41451.00 41855.69 1928.481 1.650528e+12
5 1.650528e+12 41855.69 42050.3 41741.10 41922.97 2518.041 1.650532e+12
6 1.650532e+12 41922.96 41971.9 41743.96 41803.70 1655.770 1.650535e+12

Final stage is to convert Open_time and Close_time to dates.

data$Open_time <- as.POSIXct(data$Open_time/1e3, origin = '1970-01-01')
data$Close_time <- as.POSIXct(data$Close_time/1e3, origin = '1970-01-01')

head(data) 
A data.frame: 6 × 7
Open_time Open High Low Close Volume Close_time
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
1 2022-04-21 07:00:00 41693.58 41750.0 41525.00 41610.01 1138.643 2022-04-21 07:59:59
2 2022-04-21 08:00:00 41610.01 41699.0 41434.44 41462.76 1229.259 2022-04-21 08:59:59
3 2022-04-21 09:00:00 41462.75 41600.0 41419.20 41522.38 1049.712 2022-04-21 09:59:59
4 2022-04-21 10:00:00 41522.38 41940.0 41451.00 41855.69 1928.481 2022-04-21 10:59:59
5 2022-04-21 11:00:00 41855.69 42050.3 41741.10 41922.97 2518.041 2022-04-21 11:59:59
6 2022-04-21 12:00:00 41922.96 41971.9 41743.96 41803.70 1655.770 2022-04-21 12:59:59
tail(data) # check last records
A data.frame: 6 × 7
Open_time Open High Low Close Volume Close_time
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
95 2022-04-25 05:00:00 39095.81 39153.94 38961.64 39091.17 1205.5158 2022-04-25 05:59:59
96 2022-04-25 06:00:00 39091.17 39294.76 39086.37 39253.71 1443.3318 2022-04-25 06:59:59
97 2022-04-25 07:00:00 39253.70 39256.28 39055.71 39139.74 896.8554 2022-04-25 07:59:59
98 2022-04-25 08:00:00 39139.74 39230.50 38947.42 38975.22 1057.4900 2022-04-25 08:59:59
99 2022-04-25 09:00:00 38975.21 39057.97 38590.00 38636.35 2814.9716 2022-04-25 09:59:59
100 2022-04-25 10:00:00 38636.35 38675.68 38200.00 38534.99 3528.2355 2022-04-25 10:59:59

6.2 Набори даних

  1. https://github.com/kleban/r-book-published/tree/main/datasets/telecom_users.csv
  2. https://github.com/kleban/r-book-published/tree/main/datasets/telecom_sers.xlsx
  3. https://github.com/kleban/r-book-published/tree/main/datasets/Default_Fin.csv
  4. https://github.com/kleban/r-book-published/tree/main/datasets/employes.xml

6.3 References

  1. SQLite in R. Datacamp
  2. Tidyverse googlesheets4 0.2.0
  3. Binanace spot Api Docs
  4. Web Scraping in R: rvest Tutorial by Arvid Kingl