17 Binding rows and columns
author: Юрій Клебан
Before start load packages
17.1 bind_rows
bind_rows(.data, …) helps to unite two dataframes with the same columns order and names.
So, if we need add one data frame to an other vertically (bind rows) we shoul use bind_rows:
d2002 <- gapminder %>%
filter(year == 2002) %>% # year
group_by(continent, year) %>% # grouping condition
summarise(
lifeExpAvg = mean(lifeExp),
countriesCount = n(), # n() count of rows in group
.groups = 'drop'
)
head(d2002)| continent | year | lifeExpAvg | countriesCount |
|---|---|---|---|
| <fct> | <int> | <dbl> | <int> |
| Africa | 2002 | 53.32523 | 52 |
| Americas | 2002 | 72.42204 | 25 |
| Asia | 2002 | 69.23388 | 33 |
| Europe | 2002 | 76.70060 | 30 |
| Oceania | 2002 | 79.74000 | 2 |
d2007 <- gapminder %>%
filter(year == 2007) %>% # year
group_by(continent, year) %>% # grouping condition
summarise(
lifeExpAvg = mean(lifeExp),
countriesCount = n() # n() count of rows in group
)
head(d2007)`summarise()` has grouped output by 'continent'. You can override using the `.groups` argument.
| continent | year | lifeExpAvg | countriesCount |
|---|---|---|---|
| <fct> | <int> | <dbl> | <int> |
| Africa | 2007 | 54.80604 | 52 |
| Americas | 2007 | 73.60812 | 25 |
| Asia | 2007 | 70.72848 | 33 |
| Europe | 2007 | 77.64860 | 30 |
| Oceania | 2007 | 80.71950 | 2 |
Unite them:
| continent | year | lifeExpAvg | countriesCount |
|---|---|---|---|
| <fct> | <int> | <dbl> | <int> |
| Africa | 2002 | 53.32523 | 52 |
| Americas | 2002 | 72.42204 | 25 |
| Asia | 2002 | 69.23388 | 33 |
| Europe | 2002 | 76.70060 | 30 |
| Oceania | 2002 | 79.74000 | 2 |
| Africa | 2007 | 54.80604 | 52 |
| Americas | 2007 | 73.60812 | 25 |
| Asia | 2007 | 70.72848 | 33 |
| Europe | 2007 | 77.64860 | 30 |
| Oceania | 2007 | 80.71950 | 2 |
17.2 bind_cols
bind_cols(.data, …) helps to unite two dataframes with the same rows count.
grouped_data2002pop <- gapminder %>%
filter(year == 2002) %>% # year
group_by(continent) %>% # grouping condition
summarise(totalPop = sum(pop)) %>%
mutate(year = 2002)
grouped_data2002pop| continent | totalPop | year |
|---|---|---|
| <fct> | <dbl> | <dbl> |
| Africa | 833723916 | 2002 |
| Americas | 849772762 | 2002 |
| Asia | 3601802203 | 2002 |
| Europe | 578223869 | 2002 |
| Oceania | 23454829 | 2002 |
Let’s combine d2002 and grouped_data2002pop:
grouped_data <- d2002 %>%
bind_cols(grouped_data2002pop)
grouped_data
# columns with the same name were renamed!New names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`
| continent...1 | year...2 | lifeExpAvg | countriesCount | continent...5 | totalPop | year...7 |
|---|---|---|---|---|---|---|
| <fct> | <int> | <dbl> | <int> | <fct> | <dbl> | <dbl> |
| Africa | 2002 | 53.32523 | 52 | Africa | 833723916 | 2002 |
| Americas | 2002 | 72.42204 | 25 | Americas | 849772762 | 2002 |
| Asia | 2002 | 69.23388 | 33 | Asia | 3601802203 | 2002 |
| Europe | 2002 | 76.70060 | 30 | Europe | 578223869 | 2002 |
| Oceania | 2002 | 79.74000 | 2 | Oceania | 23454829 | 2002 |
You can remove same named variables before binding:
grouped_data <- d2002 %>%
bind_cols(grouped_data2002pop %>%
select(-continent, -year))
grouped_data
# better, but continents order is not the same in both frames
# your data is going to be damaged| continent | year | lifeExpAvg | countriesCount | totalPop |
|---|---|---|---|---|
| <fct> | <int> | <dbl> | <int> | <dbl> |
| Africa | 2002 | 53.32523 | 52 | 833723916 |
| Americas | 2002 | 72.42204 | 25 | 849772762 |
| Asia | 2002 | 69.23388 | 33 | 3601802203 |
| Europe | 2002 | 76.70060 | 30 | 578223869 |
| Oceania | 2002 | 79.74000 | 2 | 23454829 |
grouped_data2002pop <- grouped_data2002pop %>%
arrange(totalPop)
grouped_data <- d2002 %>%
bind_cols(grouped_data2002pop)
grouped_data
# you can see that continent fields different in the same rowNew names:
* `continent` -> `continent...1`
* `year` -> `year...2`
* `continent` -> `continent...5`
* `year` -> `year...7`
| continent...1 | year...2 | lifeExpAvg | countriesCount | continent...5 | totalPop | year...7 |
|---|---|---|---|---|---|---|
| <fct> | <int> | <dbl> | <int> | <fct> | <dbl> | <dbl> |
| Africa | 2002 | 53.32523 | 52 | Oceania | 23454829 | 2002 |
| Americas | 2002 | 72.42204 | 25 | Europe | 578223869 | 2002 |
| Asia | 2002 | 69.23388 | 33 | Africa | 833723916 | 2002 |
| Europe | 2002 | 76.70060 | 30 | Americas | 849772762 | 2002 |
| Oceania | 2002 | 79.74000 | 2 | Asia | 3601802203 | 2002 |
How to solve this? Join functions issolution.
17.3 Refences
- dplyr: A Grammar of Data Manipulation on https://cran.r-project.org/.
- Data Transformation with splyr::cheat sheet.
- DPLYR TUTORIAL : DATA MANIPULATION (50 EXAMPLES) by Deepanshu Bhalla.
- Dplyr Intro by Stat 545. 6.R Dplyr Tutorial: Data Manipulation(Join) & Cleaning(Spread). Introduction to Data Analysis
- Loan Default Prediction. Beginners data set for financial analytics Kaggle