5  XML

Author

Юрій Клебан


You need the following packages to run the code in this chapter:

# install.packages("XML")
invisible(Sys.setlocale("LC_ALL", "Ukrainian")) # switch to the Ukrainian locale
invisible(options(warn=-1)) # suppress warnings in the output

5.1 XML (eXtensible Markup Language)

Our example uses data from data/employes.xml. The file contains employee records, each with an ID, name, salary, start date, and department:

<RECORDS>
   <EMPLOYEE>
      <ID>1</ID>
      <NAME>Rick</NAME>
      <SALARY>623.3</SALARY>
      <STARTDATE>1/1/2012</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   ...
</RECORDS>
# install.packages("XML")
library("XML")
library("methods") # ships with base R, no installation needed
# parse the file into an XML document object
result <- xmlParse(file = "../../data/employes.xml")
print(result)
<?xml version="1.0"?>
<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Rick</NAME>
    <SALARY>623.3</SALARY>
    <STARTDATE>1/1/2012</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>Dan</NAME>
    <SALARY>515.2</SALARY>
    <STARTDATE>9/23/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>3</ID>
    <NAME>Michelle</NAME>
    <SALARY>611</SALARY>
    <STARTDATE>11/15/2014</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>4</ID>
    <NAME>Ryan</NAME>
    <SALARY>729</SALARY>
    <STARTDATE>5/11/2014</STARTDATE>
    <DEPT>HR</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>5</ID>
    <NAME>Gary</NAME>
    <SALARY>843.25</SALARY>
    <STARTDATE>3/27/2015</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>6</ID>
    <NAME>Nina</NAME>
    <SALARY>578</SALARY>
    <STARTDATE>5/21/2013</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>7</ID>
    <NAME>Simon</NAME>
    <SALARY>632.8</SALARY>
    <STARTDATE>7/30/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>8</ID>
    <NAME>Guru</NAME>
    <SALARY>722.5</SALARY>
    <STARTDATE>6/17/2014</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
</RECORDS>
 
rootnode <- xmlRoot(result) # get the root node of the XML document
rootnode[[1]] # first record (first child of the root node)
<EMPLOYEE>
  <ID>1</ID>
  <NAME>Rick</NAME>
  <SALARY>623.3</SALARY>
  <STARTDATE>1/1/2012</STARTDATE>
  <DEPT>IT</DEPT>
</EMPLOYEE> 
rootnode[[1]][[2]] # second tag of the first record, i.e. <NAME>
<NAME>Rick</NAME> 
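
The indexing above returns XML nodes, not plain values. As a minimal sketch, the snippet below shows how the text and tag name could be pulled out of a node, and how records could be selected with an XPath query. It parses a small inline string that mirrors the structure of employes.xml (only two records, chosen for illustration) so that it runs without the data file; the variable names are illustrative.

```r
library(XML)

# Inline sample with the same structure as employes.xml (illustrative, two records only)
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY>",
  "<STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY>",
  "<STARTDATE>9/23/2013</STARTDATE><DEPT>Operations</DEPT></EMPLOYEE>",
  "</RECORDS>"
)

doc <- xmlParse(xml_text, asText = TRUE) # parse from a string instead of a file
root <- xmlRoot(doc)

n_records <- xmlSize(root)              # number of <EMPLOYEE> children of the root
tag_name <- xmlName(root[[1]][[2]])     # tag name of the second child of the first record
first_name <- xmlValue(root[[1]][[2]])  # text content of that node

# XPath: names of all employees whose <DEPT> is "IT"
it_names <- xpathSApply(doc, "//EMPLOYEE[DEPT='IT']/NAME", xmlValue)

print(n_records)
print(tag_name)
print(first_name)
print(it_names)
```

The same calls should work on the `rootnode` and `result` objects created earlier, since they are of the same kind as `root` and `doc` here.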

For analysis, the most convenient option is to read the XML file directly into a data frame:

xmldataframe <- xmlToDataFrame("../../data/employes.xml")
xmldataframe
A data.frame: 8 × 5
ID     NAME      SALARY  STARTDATE   DEPT
<chr>  <chr>     <chr>   <chr>       <chr>
1      Rick      623.3   1/1/2012    IT
2      Dan       515.2   9/23/2013   Operations
3      Michelle  611     11/15/2014  IT
4      Ryan      729     5/11/2014   HR
5      Gary      843.25  3/27/2015   Finance
6      Nina      578     5/21/2013   IT
7      Simon     632.8   7/30/2013   Operations
8      Guru      722.5   6/17/2014   Finance
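
Note that every column in the result is of type character. Before computing anything, the columns usually need to be converted to suitable types. The sketch below shows one possible way to do this with base R. It again uses a small inline sample instead of the data file, and it assumes that the dates follow the month/day/year pattern seen in the data (for example 9/23/2013).

```r
library(XML)

# Inline sample with the same structure as employes.xml (illustrative, two records only)
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY>",
  "<STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY>",
  "<STARTDATE>9/23/2013</STARTDATE><DEPT>Operations</DEPT></EMPLOYEE>",
  "</RECORDS>"
)

# xmlToDataFrame() accepts a parsed document as well as a file name
df <- xmlToDataFrame(xmlParse(xml_text, asText = TRUE), stringsAsFactors = FALSE)

# Convert character columns to proper types
df$ID <- as.integer(df$ID)
df$SALARY <- as.numeric(df$SALARY)
df$STARTDATE <- as.Date(df$STARTDATE, format = "%m/%d/%Y") # assumed month/day/year

str(df)
avg_salary <- mean(df$SALARY) # numeric operations now work
print(avg_salary)
```

After the conversion, salaries can be aggregated and start dates can be compared or sorted like any other Date values.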

5.2 Datasets

  1. https://github.com/kleban/r-book-published/tree/main/datasets/telecom_users.csv
  2. https://github.com/kleban/r-book-published/tree/main/datasets/telecom_sers.xlsx
  3. https://github.com/kleban/r-book-published/tree/main/datasets/Default_Fin.csv
  4. https://github.com/kleban/r-book-published/tree/main/datasets/employes.xml
