14 Create new variables with mutate()
author: Юрій Клебан
Before you start, load the required packages.
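A minimal sketch of the loading step, assuming the examples in this chapter need only `dplyr` (for `mutate()` and `transmute()`) and the `gapminder` package (for the `gapminder` data set); the chapter may also rely on other packages that are not shown here:

```r
# Assumption: dplyr and gapminder are already installed,
# e.g. via install.packages(c("dplyr", "gapminder")).
library(dplyr)      # mutate(), transmute(), filter(), ...
library(gapminder)  # the gapminder data set used in the examples

# Quick look at the columns and their types
glimpse(gapminder)
```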
mutate(.data, …)
computes new column(s) and keeps the existing ones. Let's compute a new column for gapminder:
\(gdpTotal = gdpPercap \times pop\).
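The call behind the table that follows can be sketched like this. It assumes the new column is the plain product `gdpPercap * pop`, which reproduces the values shown in the table; `gdp_total` is a hypothetical name used only for illustration.

```r
library(dplyr)
library(gapminder)

# Add total GDP as a new column; all existing columns are kept
gdp_total <- gapminder |>
  mutate(gdpTotal = gdpPercap * pop) |>  # GDP per capita times population
  head(10)                               # show only the first 10 rows

gdp_total
```

Note that `mutate()` does not modify `gapminder` itself; the result has to be assigned to a name (or printed) to be used.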
country | continent | year | lifeExp | pop | gdpPercap | gdpTotal |
---|---|---|---|---|---|---|
<fct> | <fct> | <int> | <dbl> | <int> | <dbl> | <dbl> |
Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 | 6567086330 |
Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 | 7585448670 |
Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 | 8758855797 |
Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 | 9648014150 |
Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 | 9678553274 |
Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 | 11697659231 |
Afghanistan | Asia | 1982 | 39.854 | 12881816 | 978.0114 | 12598563401 |
Afghanistan | Asia | 1987 | 40.822 | 13867957 | 852.3959 | 11820990309 |
Afghanistan | Asia | 1992 | 41.674 | 16317921 | 649.3414 | 10595901589 |
Afghanistan | Asia | 1997 | 41.763 | 22227415 | 635.3414 | 14121995875 |
transmute(.data, …)
computes new column(s) and drops all others.
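A sketch of a call that would produce a one-column result like the table that follows, assuming the same `gdpTotal` definition as in the `mutate()` example. The second call shows `mutate(.keep = "none")`, which, to my knowledge, is the replacement suggested by recent dplyr versions (1.1.0 and later), where `transmute()` is marked as superseded. The names `only_total` and `only_total_keep` are hypothetical.

```r
library(dplyr)
library(gapminder)

# Keep only the newly computed column
only_total <- gapminder |>
  transmute(gdpTotal = gdpPercap * pop) |>
  head(10)

# Same idea with mutate(): .keep = "none" drops the columns that were not created
# (grouping variables, if any, would still be retained)
only_total_keep <- gapminder |>
  mutate(gdpTotal = gdpPercap * pop, .keep = "none") |>
  head(10)

only_total
```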
gdpTotal |
---|
<dbl> |
6567086330 |
7585448670 |
8758855797 |
9648014150 |
9678553274 |
11697659231 |
12598563401 |
11820990309 |
10595901589 |
14121995875 |
You can mutate many columns at once:
gapminder |>
  mutate(gdpTotal = gdpPercap * pop,              # total GDP
         countryUpper = toupper(country),         # uppercase country
         lifeExpRounded = round(lifeExp)) |>      # life expectancy rounded to whole years
  head(10)
country | continent | year | lifeExp | pop | gdpPercap | gdpTotal | countryUpper | lifeExpRounded |
---|---|---|---|---|---|---|---|---|
<fct> | <fct> | <int> | <dbl> | <int> | <dbl> | <dbl> | <chr> | <dbl> |
Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 | 6567086330 | AFGHANISTAN | 29 |
Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 | 7585448670 | AFGHANISTAN | 30 |
Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 | 8758855797 | AFGHANISTAN | 32 |
Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 | 9648014150 | AFGHANISTAN | 34 |
Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 | 9678553274 | AFGHANISTAN | 36 |
Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 | 11697659231 | AFGHANISTAN | 38 |
Afghanistan | Asia | 1982 | 39.854 | 12881816 | 978.0114 | 12598563401 | AFGHANISTAN | 40 |
Afghanistan | Asia | 1987 | 40.822 | 13867957 | 852.3959 | 11820990309 | AFGHANISTAN | 41 |
Afghanistan | Asia | 1992 | 41.674 | 16317921 | 649.3414 | 10595901589 | AFGHANISTAN | 42 |
Afghanistan | Asia | 1997 | 41.763 | 22227415 | 635.3414 | 14121995875 | AFGHANISTAN | 42 |
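When several columns are created in one `mutate()` call, a later expression can refer to a column created earlier in the same call. A short sketch of that behaviour, where `gdpTotalBln` (total GDP in billions) and `chained` are hypothetical names introduced only for illustration:

```r
library(dplyr)
library(gapminder)

chained <- gapminder |>
  mutate(gdpTotal = gdpPercap * pop,      # created first
         gdpTotalBln = gdpTotal / 1e9) |> # uses the column defined one line above
  head(10)

chained
```

This avoids a second `mutate()` step when one new column is just a rescaled or transformed version of another.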
You can also edit an existing column. For example, let's change the continent value Europe to EU in the data frame:
data2002 <- gapminder |> filter(year == 2002)
head(data2002)
data2002 |>
  mutate(continent = as.character(continent), # convert factor -> character
         continent = ifelse(continent == "Europe", "EU", continent))
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
<fct> | <fct> | <int> | <dbl> | <int> | <dbl> |
Afghanistan | Asia | 2002 | 42.129 | 25268405 | 726.7341 |
Albania | Europe | 2002 | 75.651 | 3508512 | 4604.2117 |
Algeria | Africa | 2002 | 70.994 | 31287142 | 5288.0404 |
Angola | Africa | 2002 | 41.003 | 10866106 | 2773.2873 |
Argentina | Americas | 2002 | 74.340 | 38331121 | 8797.6407 |
Australia | Oceania | 2002 | 80.370 | 19546792 | 30687.7547 |
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
<fct> | <chr> | <int> | <dbl> | <int> | <dbl> |
Afghanistan | Asia | 2002 | 42.129 | 25268405 | 726.7341 |
Albania | EU | 2002 | 75.651 | 3508512 | 4604.2117 |
Algeria | Africa | 2002 | 70.994 | 31287142 | 5288.0404 |
Angola | Africa | 2002 | 41.003 | 10866106 | 2773.2873 |
Argentina | Americas | 2002 | 74.340 | 38331121 | 8797.6407 |
Australia | Oceania | 2002 | 80.370 | 19546792 | 30687.7547 |
Austria | EU | 2002 | 78.980 | 8148312 | 32417.6077 |
Bahrain | Asia | 2002 | 74.795 | 656397 | 23403.5593 |
Bangladesh | Asia | 2002 | 62.013 | 135656790 | 1136.3904 |
Belgium | EU | 2002 | 78.320 | 10311970 | 30485.8838 |
Benin | Africa | 2002 | 54.406 | 7026113 | 1372.8779 |
Bolivia | Americas | 2002 | 63.883 | 8445134 | 3413.2627 |
Bosnia and Herzegovina | EU | 2002 | 74.090 | 4165416 | 6018.9752 |
Botswana | Africa | 2002 | 46.634 | 1630347 | 11003.6051 |
Brazil | Americas | 2002 | 71.006 | 179914212 | 8131.2128 |
Bulgaria | EU | 2002 | 72.140 | 7661799 | 7696.7777 |
Burkina Faso | Africa | 2002 | 50.650 | 12251209 | 1037.6452 |
Burundi | Africa | 2002 | 47.360 | 7021078 | 446.4035 |
Cambodia | Asia | 2002 | 56.752 | 12926707 | 896.2260 |
Cameroon | Africa | 2002 | 49.856 | 15929988 | 1934.0114 |
Canada | Americas | 2002 | 79.770 | 31902268 | 33328.9651 |
Central African Republic | Africa | 2002 | 43.308 | 4048013 | 738.6906 |
Chad | Africa | 2002 | 50.525 | 8835739 | 1156.1819 |
Chile | Americas | 2002 | 77.860 | 15497046 | 10778.7838 |
China | Asia | 2002 | 72.028 | 1280400000 | 3119.2809 |
Colombia | Americas | 2002 | 71.682 | 41008227 | 5755.2600 |
Comoros | Africa | 2002 | 62.974 | 614382 | 1075.8116 |
Congo, Dem. Rep. | Africa | 2002 | 44.966 | 55379852 | 241.1659 |
Congo, Rep. | Africa | 2002 | 52.970 | 3328795 | 3484.0620 |
Costa Rica | Americas | 2002 | 78.123 | 3834934 | 7723.4472 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
Sierra Leone | Africa | 2002 | 41.012 | 5359092 | 699.4897 |
Singapore | Asia | 2002 | 78.770 | 4197776 | 36023.1054 |
Slovak Republic | EU | 2002 | 73.800 | 5410052 | 13638.7784 |
Slovenia | EU | 2002 | 76.660 | 2011497 | 20660.0194 |
Somalia | Africa | 2002 | 45.936 | 7753310 | 882.0818 |
South Africa | Africa | 2002 | 53.365 | 44433622 | 7710.9464 |
Spain | EU | 2002 | 79.780 | 40152517 | 24835.4717 |
Sri Lanka | Asia | 2002 | 70.815 | 19576783 | 3015.3788 |
Sudan | Africa | 2002 | 56.369 | 37090298 | 1993.3983 |
Swaziland | Africa | 2002 | 43.869 | 1130269 | 4128.1169 |
Sweden | EU | 2002 | 80.040 | 8954175 | 29341.6309 |
Switzerland | EU | 2002 | 80.620 | 7361757 | 34480.9577 |
Syria | Asia | 2002 | 73.053 | 17155814 | 4090.9253 |
Taiwan | Asia | 2002 | 76.990 | 22454239 | 23235.4233 |
Tanzania | Africa | 2002 | 49.651 | 34593779 | 899.0742 |
Thailand | Asia | 2002 | 68.564 | 62806748 | 5913.1875 |
Togo | Africa | 2002 | 57.561 | 4977378 | 886.2206 |
Trinidad and Tobago | Americas | 2002 | 68.976 | 1101832 | 11460.6002 |
Tunisia | Africa | 2002 | 73.042 | 9770575 | 5722.8957 |
Turkey | EU | 2002 | 70.845 | 67308928 | 6508.0857 |
Uganda | Africa | 2002 | 47.813 | 24739869 | 927.7210 |
United Kingdom | EU | 2002 | 78.471 | 59912431 | 29478.9992 |
United States | Americas | 2002 | 77.310 | 287675526 | 39097.0995 |
Uruguay | Americas | 2002 | 75.307 | 3363085 | 7727.0020 |
Venezuela | Americas | 2002 | 72.766 | 24287670 | 8605.0478 |
Vietnam | Asia | 2002 | 73.017 | 80908147 | 1764.4567 |
West Bank and Gaza | Asia | 2002 | 72.370 | 3389578 | 4515.4876 |
Yemen, Rep. | Asia | 2002 | 60.308 | 18701257 | 2234.8208 |
Zambia | Africa | 2002 | 39.193 | 10595811 | 1071.6139 |
Zimbabwe | Africa | 2002 | 39.989 | 11926563 | 672.0386 |
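In the result above, `continent` becomes a character column (`<chr>`). If the column should remain a factor, one possible variant is to convert it back after the replacement. This is only a sketch: it uses `dplyr::if_else()`, which is stricter about types than base `ifelse()`, and `data2002_eu` is a hypothetical name.

```r
library(dplyr)
library(gapminder)

data2002 <- gapminder |> filter(year == 2002)

data2002_eu <- data2002 |>
  mutate(continent = if_else(continent == "Europe", "EU",
                             as.character(continent)), # both branches are character
         continent = factor(continent))                # convert back to factor

levels(data2002_eu$continent)
```

Rebuilding the factor also drops the old `Europe` level, so it no longer appears in `levels()`.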
14.1 References
- dplyr: A Grammar of Data Manipulation on https://cran.r-project.org/.
- Data Transformation with dplyr :: cheat sheet.
- DPLYR TUTORIAL : DATA MANIPULATION (50 EXAMPLES) by Deepanshu Bhalla.
- Dplyr Intro by Stat 545. 6.R Dplyr Tutorial: Data Manipulation(Join) & Cleaning(Spread). Introduction to Data Analysis
- Loan Default Prediction. Beginners data set for financial analytics Kaggle