Factors in R let you represent a vector as categorical data rather than as plain text or numbers. The advantage of a categorical type is that each element can take only one of a limited set of values, rather than any value the underlying data type allows.
For example, a numeric vector may hold practically any values, such as c(1, 0.021, 192.1444, ...), and a character vector may hold arbitrary strings, such as c("sdf & Tg6", "sdf * Y & 65"). The number of possible combinations of such vectors is enormous.
Categories, by contrast, are a fixed set of values. A good example is a web form with drop-down lists, where the user cannot type a value but can only select one from an existing list. A gender field, for instance, usually offers a limited set of options: Male, Female, Other. The user can select only one of these values and cannot enter anything else (this is just an example; each site designs its forms differently).
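In R, factors enforce this restriction themselves. The sketch below uses the factor() function described next and a hypothetical vector, answers, standing in for a drop-down field. Assigning a value that is not one of the levels does not add a new option. Instead, R stores NA and issues the warning "invalid factor level, NA generated".

```r
# Hypothetical drop-down field: only these three options exist
answers <- factor(c("Male", "Female", "Other"),
                  levels = c("Female", "Male", "Other"))

# Selecting an existing option works as expected
answers[1] <- "Other"

# A value outside the list is rejected: R stores NA
# and warns "invalid factor level, NA generated"
answers[2] <- "Unknown"

print(answers)
levels(answers)  # the set of allowed values is unchanged
```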
Factors are created in R with the factor() function:
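Here is a minimal example. The contents of the gender vector are an assumption: they are reconstructed from the numeric codes printed further down (2 1 3 2 1 2 1 1), so the vector may differ from the one the original example used.

```r
# Raw answers collected from a form: a plain character vector
# (values reconstructed from the codes shown later in the text)
gender <- c("Male", "Female", "Other", "Male",
            "Female", "Male", "Female", "Female")

# factor() turns the character vector into a categorical one
gender_factor <- factor(gender)

print(gender_factor)
levels(gender_factor)  # "Female" "Male" "Other", sorted alphabetically
```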
When a factor is created, each unique value becomes a level of the factor, and every element is stored internally as an integer code that points to its level. In the previous example, gender_factor received the levels Female, Male, Other, in alphabetical order. If we convert the factor to numbers, we get these codes:
as.numeric(gender_factor)
[1] 2 1 3 2 1 2 1 1
This shows that Female = 1, Male = 2, Other = 3. Now suppose the data we need to match uses a different level order, so that Male = 1, Female = 2, Other = 3:
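This sketch reuses the reconstructed gender vector. It passes the desired order through the levels argument of factor(), which changes the integer code assigned to each value.

```r
gender <- c("Male", "Female", "Other", "Male",
            "Female", "Male", "Female", "Female")

# The levels argument fixes the order of the levels explicitly
gender_factor <- factor(gender, levels = c("Male", "Female", "Other"))

levels(gender_factor)      # "Male" "Female" "Other"
as.numeric(gender_factor)  # now Male = 1, Female = 2, Other = 3
```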
Now the order of the levels matches ours, so this collection can be combined correctly with others that use the same set of values in the same order.
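The sketch below shows one way to combine two factors that share a level set; batch1, batch2 and shared_levels are hypothetical names. It converts both factors to character, concatenates them, and re-applies the shared levels. This keeps every value mapped to the same integer code in the combined result.

```r
shared_levels <- c("Male", "Female", "Other")

# Two hypothetical batches of form answers with the same level order
batch1 <- factor(c("Male", "Female"), levels = shared_levels)
batch2 <- factor(c("Other", "Female", "Male"), levels = shared_levels)

# Both batches use identical levels, so their codes mean the same thing
identical(levels(batch1), levels(batch2))  # TRUE

# Concatenate as character, then restore the shared level order
combined <- factor(c(as.character(batch1), as.character(batch2)),
                   levels = shared_levels)

print(combined)
as.numeric(combined)  # Male = 1, Female = 2, Other = 3 throughout
```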
Sometimes we need to change not only the order of the levels in a factor but also their names. Suppose we need to rename Male, Female, Other to M, F, O:
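The first approach below uses the labels argument of factor(), which renames the levels by position as the factor is created. The second assigns new names to an existing factor with levels<-, which also matches the names to the current levels by position. The gender vector is the same reconstructed example as above.

```r
gender <- c("Male", "Female", "Other", "Male",
            "Female", "Male", "Female", "Female")

# levels fixes the order; labels gives each level its new name (by position)
gender_short <- factor(gender,
                       levels = c("Male", "Female", "Other"),
                       labels = c("M", "F", "O"))

levels(gender_short)      # "M" "F" "O"
as.numeric(gender_short)  # codes are unchanged: M = 1, F = 2, O = 3

# Alternative: rename the levels of an existing factor in place
gender_factor <- factor(gender, levels = c("Male", "Female", "Other"))
levels(gender_factor) <- c("M", "F", "O")
```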
Because as.numeric() behaves differently for factors and for character vectors, check the type with is.factor() before converting to numbers:
cities <- c("Rivne", "Ostroh", "Zdolbuniv", "Dubno", "Sarny")
cities_as_factors <- factor(cities)
as.numeric(cities_as_factors)
[1] 3 2 5 1 4
as.numeric(cities) # non-numeric strings cannot be converted: every element becomes NA, with a warning
[1] NA NA NA NA NA
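The sketch below shows is.factor() guarding a conversion. The helper level_codes() is hypothetical; it returns level codes only when its input really is a factor. The second part shows a related pitfall with factors whose labels look like numbers. In that case as.numeric() returns the level codes, not the original values, so you convert through as.character() first.

```r
cities <- c("Rivne", "Ostroh", "Zdolbuniv", "Dubno", "Sarny")
cities_as_factors <- factor(cities)

is.factor(cities)             # FALSE: a plain character vector
is.factor(cities_as_factors)  # TRUE

# Hypothetical helper: refuse to produce codes for non-factors
level_codes <- function(x) {
  if (!is.factor(x)) {
    stop("expected a factor, got ", class(x)[1])
  }
  as.integer(x)
}

level_codes(cities_as_factors)  # 3 2 5 1 4

# Pitfall: a factor whose labels look like numbers
years <- factor(c("2019", "2021", "2019"))
as.numeric(years)                # 1 2 1: the level codes
as.numeric(as.character(years))  # 2019 2021 2019: the actual values
```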