16 Grouping columns with dplyr
author: Юрій Клебан
Before starting, load the required packages.
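The exact package list is not shown here. Judging from the code in this chapter, it most likely includes dplyr (filter(), group_by(), summarise()) and gapminder (the gapminder data set). The minimal sketch below assumes those two packages. The native pipe |> used in the examples needs R 4.1 or later.

```r
# Assumption: dplyr and gapminder are the packages this chapter relies on.
# install.packages(c("dplyr", "gapminder"))  # run once if they are missing
library(dplyr)      # filter(), group_by(), summarise(), n()
library(gapminder)  # provides the gapminder data frame
```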
16.1 group_by() + summarise()
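group_by(.data, ..., .add = FALSE)
returns a copy of the table grouped by the given columns (in dplyr versions before 1.0.0 the argument was called add). summarise() then collapses each group into a single row.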
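A minimal sketch of what "a grouped copy" means in practice, assuming dplyr 1.0 or later and the gapminder data. group_by() keeps every row and only attaches grouping metadata, which the dplyr helpers group_vars() and n_groups() can inspect. It also shows how .add = TRUE extends an existing grouping instead of replacing it.

```r
library(dplyr)
library(gapminder)

# group_by() does not drop or reorder rows; it only records the grouping
by_continent <- gapminder |> group_by(continent)
nrow(by_continent) == nrow(gapminder)  # TRUE: every row is kept
group_vars(by_continent)               # "continent"
n_groups(by_continent)                 # 5 continents

# A second group_by() replaces the grouping unless .add = TRUE
group_vars(by_continent |> group_by(year))               # "year"
group_vars(by_continent |> group_by(year, .add = TRUE))  # "continent" "year"

# ungroup() removes the grouping again
group_vars(ungroup(by_continent))                        # character(0)
```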
Let’s find the average lifeExp for each continent in 2002 (output: continent, lifeExpAvg2002, countriesCount; only rows with year == 2002 are used):
gapminder |>
  filter(year == 2002) |>   # keep only the year 2002
  group_by(continent) |>    # grouping condition
  summarise(
    lifeExpAvg2002 = mean(lifeExp),
    countriesCount = n()    # n() counts the rows in each group
  )
continent | lifeExpAvg2002 | countriesCount |
---|---|---|
<fct> | <dbl> | <int> |
Africa | 53.32523 | 52 |
Americas | 72.42204 | 25 |
Asia | 69.23388 | 33 |
Europe | 76.70060 | 30 |
Oceania | 79.74000 | 2 |
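The countriesCount column can be cross-checked with dplyr's count(), which is shorthand for group_by() followed by summarise(n = n()). This is a small sketch, assuming dplyr and gapminder are loaded; the variable name counts_2002 is only illustrative.

```r
library(dplyr)
library(gapminder)

# count(continent) == group_by(continent) |> summarise(n = n())
counts_2002 <- gapminder |>
  filter(year == 2002) |>
  count(continent)

counts_2002
```

The n column should agree with countriesCount in the table above.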
Let’s find the total population for each continent in 2002 (output: continent, year, totalPop):
gapminder |>
  filter(year == 2002) |>                           # keep only the year 2002
  group_by(continent, year) |>                      # group by two columns
  summarise(totalPop = sum(pop), .groups = "keep")  # keep both grouping levels
continent | year | totalPop |
---|---|---|
<fct> | <int> | <dbl> |
Africa | 2002 | 833723916 |
Americas | 2002 | 849772762 |
Asia | 2002 | 3601802203 |
Europe | 2002 | 578223869 |
Oceania | 2002 | 23454829 |
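The .groups argument controls how the result of summarise() is grouped. The sketch below, assuming dplyr 1.0 or later, compares the three common settings on the same data; the helper total_pop() is hypothetical and only avoids repeating the pipeline. When .groups is omitted and there are several grouping columns, dplyr behaves like "drop_last" and prints a message about it.

```r
library(dplyr)
library(gapminder)

pop_2002 <- gapminder |> filter(year == 2002)

# Hypothetical helper: the same summary with a chosen .groups setting
total_pop <- function(groups) {
  pop_2002 |>
    group_by(continent, year) |>
    summarise(totalPop = sum(pop), .groups = groups)
}

group_vars(total_pop("keep"))       # "continent" "year": grouping unchanged
group_vars(total_pop("drop_last"))  # "continent": last level peeled off
group_vars(total_pop("drop"))       # character(0): plain ungrouped tibble
```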
There are additional variations of summarise().
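One such variation is the scoped family summarise_all(), summarise_at() and summarise_if(), which apply a function to several columns at once. In dplyr 1.0 and later these are superseded by across() inside summarise(). A minimal sketch, assuming that dplyr version and the gapminder data, computing both styles for lifeExp and gdpPercap:

```r
library(dplyr)
library(gapminder)

gap_2002 <- gapminder |>
  filter(year == 2002) |>
  group_by(continent)

# Scoped variant (superseded): mean() of the listed columns
at_version <- gap_2002 |> summarise_at(vars(lifeExp, gdpPercap), mean)

# Current idiom: across() inside summarise()
across_version <- gap_2002 |> summarise(across(c(lifeExp, gdpPercap), mean))

across_version
```

Both calls keep the original column names, and the lifeExp column matches lifeExpAvg2002 from the first example.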
16.1.1 Task on Credit data (rewrite it with dplyr)
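library(ISLR)
# mean Income for every Age/Gender combination
group_inc <- aggregate(Income ~ Age + Gender, data = Credit, mean)
# the male level in Credit$Gender is stored with a leading space: " Male"
m_data <- group_inc[group_inc$Gender == " Male", ]
nrow(m_data)
f_data <- group_inc[group_inc$Gender == "Female", ]
nrow(f_data)
# red line: men, blue line: women; ylim covers both groups
with(m_data, plot(Age, Income, type = "l", col = "red", ylim = range(group_inc$Income)))
with(f_data, lines(Age, Income, type = "l", col = "blue"))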
63
62
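The filters above only work because the male level really is spelled " Male" with a leading space. A small sketch for checking the levels, plus an optional cleanup that is not part of the original task: the copy credit_clean is hypothetical and trims the stray space with base R's trimws().

```r
library(ISLR)

levels(Credit$Gender)  # the male level carries a leading space

# Hypothetical cleanup: trim the space so later filters can use "Male"
credit_clean <- Credit
credit_clean$Gender <- factor(trimws(as.character(credit_clean$Gender)))
table(credit_clean$Gender)
```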
cd <- Credit %>%
  select(Income, Age, Gender) %>%
  group_by(Age, Gender) %>%
  summarize(Income = mean(Income))          # mean Income per Age/Gender pair
m_data <- cd %>% filter(Gender == " Male")  # the male level has a leading space
nrow(m_data)
f_data <- cd %>% filter(Gender == "Female")
nrow(f_data)
# red line: men, blue line: women; ylim covers both groups
with(m_data, plot(Age, Income, type = "l", col = "red", ylim = range(cd$Income)))
with(f_data, lines(Age, Income, type = "l", col = "blue"))
`summarise()` has grouped output by 'Age'. You can override using the `.groups`
argument.
63
62
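The message printed above can be silenced by passing .groups explicitly. The sketch below, assuming dplyr 1.0 or later and R 4.1 or later for the native pipe, redoes the summary with .groups = "drop" and checks that it agrees with the base R aggregate() version. Row order differs between the two (aggregate() varies Age fastest within Gender), so the check compares row counts and an order-independent total.

```r
library(ISLR)
library(dplyr)

# Base R version from the task
group_inc <- aggregate(Income ~ Age + Gender, data = Credit, mean)

# dplyr version; .groups = "drop" returns an ungrouped tibble, no message
cd <- Credit |>
  group_by(Age, Gender) |>
  summarise(Income = mean(Income), .groups = "drop")

# One row per (Age, Gender) pair in both results
nrow(group_inc) == nrow(cd)
# Same group means, regardless of row order
all.equal(sum(group_inc$Income), sum(cd$Income))
```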