17  Binding rows and columns


author: Юрій Клебан


Before start load packages

library(dplyr) # for demos
#install.packages("gapminder")
library(gapminder)  # load package and dataset

17.1 bind_rows

bind_rows(.data, …) helps to unite two dataframes with the same columns order and names.

So, if we need add one data frame to an other vertically (bind rows) we shoul use bind_rows:

d2002 <- gapminder %>%
            filter(year == 2002) %>% # year
            group_by(continent, year) %>% # grouping condition
            summarise(
                lifeExpAvg = mean(lifeExp),
                countriesCount = n(), # n() count of rows in group 
                .groups = 'drop'
            )
head(d2002)
A tibble: 5 × 4
continent year lifeExpAvg countriesCount
<fct> <int> <dbl> <int>
Africa 2002 53.32523 52
Americas 2002 72.42204 25
Asia 2002 69.23388 33
Europe 2002 76.70060 30
Oceania 2002 79.74000 2
d2007 <- gapminder %>%
            filter(year == 2007) %>% # year
            group_by(continent, year) %>% # grouping condition
            summarise(
                lifeExpAvg = mean(lifeExp),
                countriesCount = n() # n() count of rows in group                
            )
head(d2007)
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
A grouped_df: 5 × 4
continent year lifeExpAvg countriesCount
<fct> <int> <dbl> <int>
Africa 2007 54.80604 52
Americas 2007 73.60812 25
Asia 2007 70.72848 33
Europe 2007 77.64860 30
Oceania 2007 80.71950 2

Unite them:

d2002 %>% bind_rows(d2007) ## bind rows
A tibble: 10 × 4
continent year lifeExpAvg countriesCount
<fct> <int> <dbl> <int>
Africa 2002 53.32523 52
Americas 2002 72.42204 25
Asia 2002 69.23388 33
Europe 2002 76.70060 30
Oceania 2002 79.74000 2
Africa 2007 54.80604 52
Americas 2007 73.60812 25
Asia 2007 70.72848 33
Europe 2007 77.64860 30
Oceania 2007 80.71950 2

17.2 bind_cols

bind_cols(.data, …) helps to unite two dataframes with the same rows count.

grouped_data2002pop <- gapminder %>%
    filter(year == 2002) %>% # year
    group_by(continent) %>% # grouping condition
    summarise(totalPop = sum(pop)) %>%
    mutate(year = 2002)
grouped_data2002pop
A tibble: 5 × 3
continent totalPop year
<fct> <dbl> <dbl>
Africa 833723916 2002
Americas 849772762 2002
Asia 3601802203 2002
Europe 578223869 2002
Oceania 23454829 2002

Let’s combine d2002 and grouped_data2002pop:

grouped_data <- d2002 %>% 
    bind_cols(grouped_data2002pop)
grouped_data

# columns with the same name were renamed!
New names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`
A tibble: 5 × 7
continent...1 year...2 lifeExpAvg countriesCount continent...5 totalPop year...7
<fct> <int> <dbl> <int> <fct> <dbl> <dbl>
Africa 2002 53.32523 52 Africa 833723916 2002
Americas 2002 72.42204 25 Americas 849772762 2002
Asia 2002 69.23388 33 Asia 3601802203 2002
Europe 2002 76.70060 30 Europe 578223869 2002
Oceania 2002 79.74000 2 Oceania 23454829 2002

You can remove same named variables before binding:

grouped_data <- d2002 %>% 
    bind_cols(grouped_data2002pop %>%
              select(-continent, -year))
grouped_data

# better, but continents order is not the same in both frames 
# your data is going to be damaged
A tibble: 5 × 5
continent year lifeExpAvg countriesCount totalPop
<fct> <int> <dbl> <int> <dbl>
Africa 2002 53.32523 52 833723916
Americas 2002 72.42204 25 849772762
Asia 2002 69.23388 33 3601802203
Europe 2002 76.70060 30 578223869
Oceania 2002 79.74000 2 23454829
grouped_data2002pop <- grouped_data2002pop %>% 
    arrange(totalPop)

grouped_data <- d2002 %>% 
    bind_cols(grouped_data2002pop)
grouped_data

# you can see that continent fields different in the same row
New names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`
A tibble: 5 × 7
continent...1 year...2 lifeExpAvg countriesCount continent...5 totalPop year...7
<fct> <int> <dbl> <int> <fct> <dbl> <dbl>
Africa 2002 53.32523 52 Oceania 23454829 2002
Americas 2002 72.42204 25 Europe 578223869 2002
Asia 2002 69.23388 33 Africa 833723916 2002
Europe 2002 76.70060 30 Americas 849772762 2002
Oceania 2002 79.74000 2 Asia 3601802203 2002

How to solve this? Join functions issolution.


17.3 Refences

  1. dplyr: A Grammar of Data Manipulation on https://cran.r-project.org/.
  2. Data Transformation with splyr::cheat sheet.
  3. DPLYR TUTORIAL : DATA MANIPULATION (50 EXAMPLES) by Deepanshu Bhalla.
  4. Dplyr Intro by Stat 545. 6.R Dplyr Tutorial: Data Manipulation(Join) & Cleaning(Spread). Introduction to Data Analysis
  5. Loan Default Prediction. Beginners data set for financial analytics Kaggle