17 Binding rows and columns
author: Юрій Клебан
Before start load packages
17.1 bind_rows
bind_rows(.data, …)
helps to unite two dataframes with the same columns order and names.
So, if we need add one data frame to an other vertically (bind rows) we shoul use bind_rows
:
d2002 <- gapminder %>%
filter(year == 2002) %>% # year
group_by(continent, year) %>% # grouping condition
summarise(
lifeExpAvg = mean(lifeExp),
countriesCount = n(), # n() count of rows in group
.groups = 'drop'
)
head(d2002)
continent | year | lifeExpAvg | countriesCount |
---|---|---|---|
<fct> | <int> | <dbl> | <int> |
Africa | 2002 | 53.32523 | 52 |
Americas | 2002 | 72.42204 | 25 |
Asia | 2002 | 69.23388 | 33 |
Europe | 2002 | 76.70060 | 30 |
Oceania | 2002 | 79.74000 | 2 |
d2007 <- gapminder %>%
filter(year == 2007) %>% # year
group_by(continent, year) %>% # grouping condition
summarise(
lifeExpAvg = mean(lifeExp),
countriesCount = n() # n() count of rows in group
)
head(d2007)
`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
continent | year | lifeExpAvg | countriesCount |
---|---|---|---|
<fct> | <int> | <dbl> | <int> |
Africa | 2007 | 54.80604 | 52 |
Americas | 2007 | 73.60812 | 25 |
Asia | 2007 | 70.72848 | 33 |
Europe | 2007 | 77.64860 | 30 |
Oceania | 2007 | 80.71950 | 2 |
Unite them:
continent | year | lifeExpAvg | countriesCount |
---|---|---|---|
<fct> | <int> | <dbl> | <int> |
Africa | 2002 | 53.32523 | 52 |
Americas | 2002 | 72.42204 | 25 |
Asia | 2002 | 69.23388 | 33 |
Europe | 2002 | 76.70060 | 30 |
Oceania | 2002 | 79.74000 | 2 |
Africa | 2007 | 54.80604 | 52 |
Americas | 2007 | 73.60812 | 25 |
Asia | 2007 | 70.72848 | 33 |
Europe | 2007 | 77.64860 | 30 |
Oceania | 2007 | 80.71950 | 2 |
17.2 bind_cols
bind_cols(.data, …)
helps to unite two dataframes with the same rows count.
grouped_data2002pop <- gapminder %>%
filter(year == 2002) %>% # year
group_by(continent) %>% # grouping condition
summarise(totalPop = sum(pop)) %>%
mutate(year = 2002)
grouped_data2002pop
continent | totalPop | year |
---|---|---|
<fct> | <dbl> | <dbl> |
Africa | 833723916 | 2002 |
Americas | 849772762 | 2002 |
Asia | 3601802203 | 2002 |
Europe | 578223869 | 2002 |
Oceania | 23454829 | 2002 |
Let’s combine d2002
and grouped_data2002pop
:
grouped_data <- d2002 %>%
bind_cols(grouped_data2002pop)
grouped_data
# columns with the same name were renamed!
New names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`
continent...1 | year...2 | lifeExpAvg | countriesCount | continent...5 | totalPop | year...7 |
---|---|---|---|---|---|---|
<fct> | <int> | <dbl> | <int> | <fct> | <dbl> | <dbl> |
Africa | 2002 | 53.32523 | 52 | Africa | 833723916 | 2002 |
Americas | 2002 | 72.42204 | 25 | Americas | 849772762 | 2002 |
Asia | 2002 | 69.23388 | 33 | Asia | 3601802203 | 2002 |
Europe | 2002 | 76.70060 | 30 | Europe | 578223869 | 2002 |
Oceania | 2002 | 79.74000 | 2 | Oceania | 23454829 | 2002 |
You can remove same named variables before binding:
grouped_data <- d2002 %>%
bind_cols(grouped_data2002pop %>%
select(-continent, -year))
grouped_data
# better, but continents order is not the same in both frames
# your data is going to be damaged
continent | year | lifeExpAvg | countriesCount | totalPop |
---|---|---|---|---|
<fct> | <int> | <dbl> | <int> | <dbl> |
Africa | 2002 | 53.32523 | 52 | 833723916 |
Americas | 2002 | 72.42204 | 25 | 849772762 |
Asia | 2002 | 69.23388 | 33 | 3601802203 |
Europe | 2002 | 76.70060 | 30 | 578223869 |
Oceania | 2002 | 79.74000 | 2 | 23454829 |
grouped_data2002pop <- grouped_data2002pop %>%
arrange(totalPop)
grouped_data <- d2002 %>%
bind_cols(grouped_data2002pop)
grouped_data
# you can see that continent fields different in the same row
New names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`
continent...1 | year...2 | lifeExpAvg | countriesCount | continent...5 | totalPop | year...7 |
---|---|---|---|---|---|---|
<fct> | <int> | <dbl> | <int> | <fct> | <dbl> | <dbl> |
Africa | 2002 | 53.32523 | 52 | Oceania | 23454829 | 2002 |
Americas | 2002 | 72.42204 | 25 | Europe | 578223869 | 2002 |
Asia | 2002 | 69.23388 | 33 | Africa | 833723916 | 2002 |
Europe | 2002 | 76.70060 | 30 | Americas | 849772762 | 2002 |
Oceania | 2002 | 79.74000 | 2 | Asia | 3601802203 | 2002 |
How to solve this? Join
functions issolution.
17.3 Refences
- dplyr: A Grammar of Data Manipulation on https://cran.r-project.org/.
- Data Transformation with splyr::cheat sheet.
- DPLYR TUTORIAL : DATA MANIPULATION (50 EXAMPLES) by Deepanshu Bhalla.
- Dplyr Intro by Stat 545. 6.R Dplyr Tutorial: Data Manipulation(Join) & Cleaning(Spread). Introduction to Data Analysis
- Loan Default Prediction. Beginners data set for financial analytics Kaggle