17 Binding rows and columns

author: Юрій Клебан

Before start load packages

library(dplyr) # for demos
#install.packages("gapminder")
library(gapminder)  # load package and dataset

17.1 `bind_rows`

bind_rows(.data, …) helps to unite two dataframes with the same columns order and names.

So, if we need add one data frame to an other vertically (bind rows) we shoul use bind_rows:

d2002 <- gapminder %>%
            filter(year == 2002) %>% # year
            group_by(continent, year) %>% # grouping condition
            summarise(
                lifeExpAvg = mean(lifeExp),
                countriesCount = n(), # n() count of rows in group 
                .groups = 'drop'
            )
head(d2002)

A tibble: 5 × 4
continent	year	lifeExpAvg	countriesCount
<fct>	<int>	<dbl>	<int>
Africa	2002	53.32523	52
Americas	2002	72.42204	25
Asia	2002	69.23388	33
Europe	2002	76.70060	30
Oceania	2002	79.74000	2

d2007 <- gapminder %>%
            filter(year == 2007) %>% # year
            group_by(continent, year) %>% # grouping condition
            summarise(
                lifeExpAvg = mean(lifeExp),
                countriesCount = n() # n() count of rows in group                
            )
head(d2007)

`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.

A grouped_df: 5 × 4
continent	year	lifeExpAvg	countriesCount
<fct>	<int>	<dbl>	<int>
Africa	2007	54.80604	52
Americas	2007	73.60812	25
Asia	2007	70.72848	33
Europe	2007	77.64860	30
Oceania	2007	80.71950	2

Unite them:

d2002 %>% bind_rows(d2007) ## bind rows

A tibble: 10 × 4
continent	year	lifeExpAvg	countriesCount
<fct>	<int>	<dbl>	<int>
Africa	2002	53.32523	52
Americas	2002	72.42204	25
Asia	2002	69.23388	33
Europe	2002	76.70060	30
Oceania	2002	79.74000	2
Africa	2007	54.80604	52
Americas	2007	73.60812	25
Asia	2007	70.72848	33
Europe	2007	77.64860	30
Oceania	2007	80.71950	2

17.2 `bind_cols`

bind_cols(.data, …) helps to unite two dataframes with the same rows count.

grouped_data2002pop <- gapminder %>%
    filter(year == 2002) %>% # year
    group_by(continent) %>% # grouping condition
    summarise(totalPop = sum(pop)) %>%
    mutate(year = 2002)
grouped_data2002pop

A tibble: 5 × 3
continent	totalPop	year
<fct>	<dbl>	<dbl>
Africa	833723916	2002
Americas	849772762	2002
Asia	3601802203	2002
Europe	578223869	2002
Oceania	23454829	2002

Let’s combine d2002 and grouped_data2002pop:

grouped_data <- d2002 %>% 
    bind_cols(grouped_data2002pop)
grouped_data

# columns with the same name were renamed!

New names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`

A tibble: 5 × 7
continent...1	year...2	lifeExpAvg	countriesCount	continent...5	totalPop	year...7
<fct>	<int>	<dbl>	<int>	<fct>	<dbl>	<dbl>
Africa	2002	53.32523	52	Africa	833723916	2002
Americas	2002	72.42204	25	Americas	849772762	2002
Asia	2002	69.23388	33	Asia	3601802203	2002
Europe	2002	76.70060	30	Europe	578223869	2002
Oceania	2002	79.74000	2	Oceania	23454829	2002

You can remove same named variables before binding:

grouped_data <- d2002 %>% 
    bind_cols(grouped_data2002pop %>%
              select(-continent, -year))
grouped_data

# better, but continents order is not the same in both frames 
# your data is going to be damaged

A tibble: 5 × 5
continent	year	lifeExpAvg	countriesCount	totalPop
<fct>	<int>	<dbl>	<int>	<dbl>
Africa	2002	53.32523	52	833723916
Americas	2002	72.42204	25	849772762
Asia	2002	69.23388	33	3601802203
Europe	2002	76.70060	30	578223869
Oceania	2002	79.74000	2	23454829

grouped_data2002pop <- grouped_data2002pop %>% 
    arrange(totalPop)

grouped_data <- d2002 %>% 
    bind_cols(grouped_data2002pop)
grouped_data

# you can see that continent fields different in the same row

New names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`

A tibble: 5 × 7
continent...1	year...2	lifeExpAvg	countriesCount	continent...5	totalPop	year...7
<fct>	<int>	<dbl>	<int>	<fct>	<dbl>	<dbl>
Africa	2002	53.32523	52	Oceania	23454829	2002
Americas	2002	72.42204	25	Europe	578223869	2002
Asia	2002	69.23388	33	Africa	833723916	2002
Europe	2002	76.70060	30	Americas	849772762	2002
Oceania	2002	79.74000	2	Asia	3601802203	2002

How to solve this? Join functions issolution.

17.3 Refences

dplyr: A Grammar of Data Manipulation on https://cran.r-project.org/.
Data Transformation with splyr::cheat sheet.
DPLYR TUTORIAL : DATA MANIPULATION (50 EXAMPLES) by Deepanshu Bhalla.
Dplyr Intro by Stat 545. 6.R Dplyr Tutorial: Data Manipulation(Join) & Cleaning(Spread). Introduction to Data Analysis
Loan Default Prediction. Beginners data set for financial analytics Kaggle

17.1 bind_rows

17.2 bind_cols

17.3 Refences

17.1 `bind_rows`

17.2 `bind_cols`