14  Create new variables with mutate()


author: Юрій Клебан


Before starting, load the required packages:

library(dplyr) # for demos
#install.packages("gapminder")
library(gapminder)  # load package and dataset

mutate(.data, …) computes new column(s) and appends them to the existing ones. Let's compute a new column for gapminder, the total GDP of each country:

\(gdpTotal = gdpPercap \times pop\).

gapminder |> 
    mutate(gdpTotal = gdpPercap * pop) |>
    head(10)
A tibble: 10 × 7
country      continent   year  lifeExp       pop  gdpPercap     gdpTotal
<fct>        <fct>      <int>    <dbl>     <int>      <dbl>        <dbl>
Afghanistan  Asia        1952   28.801   8425333   779.4453   6567086330
Afghanistan  Asia        1957   30.332   9240934   820.8530   7585448670
Afghanistan  Asia        1962   31.997  10267083   853.1007   8758855797
Afghanistan  Asia        1967   34.020  11537966   836.1971   9648014150
Afghanistan  Asia        1972   36.088  13079460   739.9811   9678553274
Afghanistan  Asia        1977   38.438  14880372   786.1134  11697659231
Afghanistan  Asia        1982   39.854  12881816   978.0114  12598563401
Afghanistan  Asia        1987   40.822  13867957   852.3959  11820990309
Afghanistan  Asia        1992   41.674  16317921   649.3414  10595901589
Afghanistan  Asia        1997   41.763  22227415   635.3414  14121995875
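
Expressions inside a single mutate() call are evaluated in order, so a later expression can use a column created earlier in the same call. As a minimal sketch (the column name gdpTotalMln is an illustrative assumption, not part of the original example), total GDP can be rescaled to millions in the same step:

```r
library(dplyr)
library(gapminder)

gdp_scaled <- gapminder |>
    mutate(gdpTotal = gdpPercap * pop,      # total GDP, as in the example above
           gdpTotalMln = gdpTotal / 1e6)    # hypothetical column: total GDP in millions

head(gdp_scaled, 3)
```

Splitting the calculation this way keeps the raw total available while adding a value that is easier to read in tables.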

transmute(.data, …) computes new column(s) and drops all others.

gapminder |>
    transmute(gdpTotal = gdpPercap * pop) |>
    head(10)
A tibble: 10 × 1
gdpTotal
<dbl>
6567086330
7585448670
8758855797
9648014150
9678553274
11697659231
12598563401
11820990309
10595901589
14121995875
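
Recent dplyr releases mark transmute() as superseded and point to mutate() with the .keep argument instead. The sketch below assumes a dplyr version that supports .keep (1.0.0 or later); the object names are illustrative only:

```r
library(dplyr)
library(gapminder)

# .keep = "none" drops every column that is not created in this call
gdp_only <- gapminder |>
    mutate(gdpTotal = gdpPercap * pop, .keep = "none")

# .keep = "used" additionally retains the inputs of the expression (pop, gdpPercap)
gdp_with_inputs <- gapminder |>
    mutate(gdpTotal = gdpPercap * pop, .keep = "used")

names(gdp_only)
names(gdp_with_inputs)
```

One difference to keep in mind: on grouped data, mutate(.keep = "none") still retains the grouping columns.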

You can create several columns in a single mutate() call:

gapminder |>
    mutate(gdpTotal = gdpPercap * pop,
           countryUpper = toupper(country), # uppercase country
           lifeExpRounded = round(lifeExp)) |>
    head(10)
A tibble: 10 × 9
country      continent   year  lifeExp       pop  gdpPercap     gdpTotal  countryUpper  lifeExpRounded
<fct>        <fct>      <int>    <dbl>     <int>      <dbl>        <dbl>  <chr>                  <dbl>
Afghanistan  Asia        1952   28.801   8425333   779.4453   6567086330  AFGHANISTAN               29
Afghanistan  Asia        1957   30.332   9240934   820.8530   7585448670  AFGHANISTAN               30
Afghanistan  Asia        1962   31.997  10267083   853.1007   8758855797  AFGHANISTAN               32
Afghanistan  Asia        1967   34.020  11537966   836.1971   9648014150  AFGHANISTAN               34
Afghanistan  Asia        1972   36.088  13079460   739.9811   9678553274  AFGHANISTAN               36
Afghanistan  Asia        1977   38.438  14880372   786.1134  11697659231  AFGHANISTAN               38
Afghanistan  Asia        1982   39.854  12881816   978.0114  12598563401  AFGHANISTAN               40
Afghanistan  Asia        1987   40.822  13867957   852.3959  11820990309  AFGHANISTAN               41
Afghanistan  Asia        1992   41.674  16317921   649.3414  10595901589  AFGHANISTAN               42
Afghanistan  Asia        1997   41.763  22227415   635.3414  14121995875  AFGHANISTAN               42
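
When the same transformation applies to several columns, listing each one separately gets repetitive. One option, sketched here under the assumption that dplyr 1.0.0 or later is installed, is across() inside mutate(). The choice of columns and of one decimal place is only for illustration:

```r
library(dplyr)
library(gapminder)

rounded <- gapminder |>
    # round both numeric measures to one decimal place, overwriting them in place
    mutate(across(c(lifeExp, gdpPercap), \(x) round(x, 1)))

head(rounded, 3)
```

Because across() overwrites the selected columns by default, the result has the same column names as the input.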

You can also overwrite an existing column. For example, let's change the continent value Europe to EU in the data frame:

data2002 <- gapminder |> filter(year == 2002)
head(data2002)
A tibble: 6 × 6
country      continent   year  lifeExp       pop   gdpPercap
<fct>        <fct>      <int>    <dbl>     <int>       <dbl>
Afghanistan  Asia        2002   42.129  25268405    726.7341
Albania      Europe      2002   75.651   3508512   4604.2117
Algeria      Africa      2002   70.994  31287142   5288.0404
Angola       Africa      2002   41.003  10866106   2773.2873
Argentina    Americas    2002   74.340  38331121   8797.6407
Australia    Oceania     2002   80.370  19546792  30687.7547

data2002 |>
    mutate(continent = as.character(continent), # convert factor -> character
           continent = ifelse(continent == "Europe", "EU", continent))
A tibble: 142 × 6
country                   continent   year  lifeExp         pop   gdpPercap
<fct>                     <chr>      <int>    <dbl>       <int>       <dbl>
Afghanistan               Asia        2002   42.129    25268405    726.7341
Albania                   EU          2002   75.651     3508512   4604.2117
Algeria                   Africa      2002   70.994    31287142   5288.0404
Angola                    Africa      2002   41.003    10866106   2773.2873
Argentina                 Americas    2002   74.340    38331121   8797.6407
Australia                 Oceania     2002   80.370    19546792  30687.7547
Austria                   EU          2002   78.980     8148312  32417.6077
Bahrain                   Asia        2002   74.795      656397  23403.5593
Bangladesh                Asia        2002   62.013   135656790   1136.3904
Belgium                   EU          2002   78.320    10311970  30485.8838
Benin                     Africa      2002   54.406     7026113   1372.8779
Bolivia                   Americas    2002   63.883     8445134   3413.2627
Bosnia and Herzegovina    EU          2002   74.090     4165416   6018.9752
Botswana                  Africa      2002   46.634     1630347  11003.6051
Brazil                    Americas    2002   71.006   179914212   8131.2128
Bulgaria                  EU          2002   72.140     7661799   7696.7777
Burkina Faso              Africa      2002   50.650    12251209   1037.6452
Burundi                   Africa      2002   47.360     7021078    446.4035
Cambodia                  Asia        2002   56.752    12926707    896.2260
Cameroon                  Africa      2002   49.856    15929988   1934.0114
Canada                    Americas    2002   79.770    31902268  33328.9651
Central African Republic  Africa      2002   43.308     4048013    738.6906
Chad                      Africa      2002   50.525     8835739   1156.1819
Chile                     Americas    2002   77.860    15497046  10778.7838
China                     Asia        2002   72.028  1280400000   3119.2809
Colombia                  Americas    2002   71.682    41008227   5755.2600
Comoros                   Africa      2002   62.974      614382   1075.8116
Congo, Dem. Rep.          Africa      2002   44.966    55379852    241.1659
Congo, Rep.               Africa      2002   52.970     3328795   3484.0620
Costa Rica                Americas    2002   78.123     3834934   7723.4472
⋮
Sierra Leone              Africa      2002   41.012     5359092    699.4897
Singapore                 Asia        2002   78.770     4197776  36023.1054
Slovak Republic           EU          2002   73.800     5410052  13638.7784
Slovenia                  EU          2002   76.660     2011497  20660.0194
Somalia                   Africa      2002   45.936     7753310    882.0818
South Africa              Africa      2002   53.365    44433622   7710.9464
Spain                     EU          2002   79.780    40152517  24835.4717
Sri Lanka                 Asia        2002   70.815    19576783   3015.3788
Sudan                     Africa      2002   56.369    37090298   1993.3983
Swaziland                 Africa      2002   43.869     1130269   4128.1169
Sweden                    EU          2002   80.040     8954175  29341.6309
Switzerland               EU          2002   80.620     7361757  34480.9577
Syria                     Asia        2002   73.053    17155814   4090.9253
Taiwan                    Asia        2002   76.990    22454239  23235.4233
Tanzania                  Africa      2002   49.651    34593779    899.0742
Thailand                  Asia        2002   68.564    62806748   5913.1875
Togo                      Africa      2002   57.561     4977378    886.2206
Trinidad and Tobago       Americas    2002   68.976     1101832  11460.6002
Tunisia                   Africa      2002   73.042     9770575   5722.8957
Turkey                    EU          2002   70.845    67308928   6508.0857
Uganda                    Africa      2002   47.813    24739869    927.7210
United Kingdom            EU          2002   78.471    59912431  29478.9992
United States             Americas    2002   77.310   287675526  39097.0995
Uruguay                   Americas    2002   75.307     3363085   7727.0020
Venezuela                 Americas    2002   72.766    24287670   8605.0478
Vietnam                   Asia        2002   73.017    80908147   1764.4567
West Bank and Gaza        Asia        2002   72.370     3389578   4515.4876
Yemen, Rep.               Asia        2002   60.308    18701257   2234.8208
Zambia                    Africa      2002   39.193    10595811   1071.6139
Zimbabwe                  Africa      2002   39.989    11926563    672.0386
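
The result above leaves continent as a character column. If later steps expect a factor, the recoding can be followed by a conversion back. The sketch below is one possible variant, not the original author's method: it uses dplyr's if_else(), which requires both branches to have the same type, and the object name recoded is illustrative:

```r
library(dplyr)
library(gapminder)

data2002 <- gapminder |> filter(year == 2002)

recoded <- data2002 |>
    mutate(
        # compare against the factor, but return character values from both branches
        continent = if_else(continent == "Europe", "EU", as.character(continent)),
        # convert back to a factor so the column keeps a fixed set of levels
        continent = factor(continent)
    )

levels(recoded$continent)
```

Converting to character before recoding matters because a factor stores integer codes internally, and base ifelse() drops the factor attributes of its inputs.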
