12  Subset rows with slice()

author: Юрій Клебан

Before you start, load the required packages:

library(dplyr) # for demos
library(gapminder)  # load package and dataset


If the input data frame (the .data argument) is a grouped_df, the operation is performed on each group separately, so that, for example, slice_head(df, n = 5) selects the first five rows of each group.
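As a minimal sketch of the grouped behaviour described above, the following groups gapminder by continent and keeps the first two rows of each group. The grouping variable (continent) and n = 2 are chosen only for illustration.

```r
library(dplyr)
library(gapminder)

# Group by continent, then take the first two rows within each group
first_two <- gapminder |>
  group_by(continent) |>
  slice_head(n = 2)

# Count rows per continent to confirm that slicing happened per group
first_two |>
  ungroup() |>
  count(continent)
```

The result of slice_head() stays grouped, so call ungroup() when later steps should treat the data as a whole.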


gapminder |> slice(1) # top 1 row
A tibble: 1 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Afghanistan Asia 1952 28.801 8425333 779.4453
gapminder |> slice(1:6) # top n = 6
A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Afghanistan Asia 1952 28.801 8425333 779.4453
Afghanistan Asia 1957 30.332 9240934 820.8530
Afghanistan Asia 1962 31.997 10267083 853.1007
Afghanistan Asia 1967 34.020 11537966 836.1971
Afghanistan Asia 1972 36.088 13079460 739.9811
Afghanistan Asia 1977 38.438 14880372 786.1134
gapminder |> slice_head(n = 6) # works like head()
A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Afghanistan Asia 1952 28.801 8425333 779.4453
Afghanistan Asia 1957 30.332 9240934 820.8530
Afghanistan Asia 1962 31.997 10267083 853.1007
Afghanistan Asia 1967 34.020 11537966 836.1971
Afghanistan Asia 1972 36.088 13079460 739.9811
Afghanistan Asia 1977 38.438 14880372 786.1134
gapminder |> slice_tail(n = 5) # works like tail()
A tibble: 5 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Zimbabwe Africa 1987 62.351 9216418 706.1573
Zimbabwe Africa 1992 60.377 10704340 693.4208
Zimbabwe Africa 1997 46.809 11404948 792.4500
Zimbabwe Africa 2002 39.989 11926563 672.0386
Zimbabwe Africa 2007 43.487 12311143 469.7093

You can drop rows by passing negative indexes:

gapminder |> slice(-c(1:3, 5)) |> head() # drop the Afghanistan rows for 1952, 1957, 1962 and 1972, then show the first six rows
A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Afghanistan Asia 1967 34.020 11537966 836.1971
Afghanistan Asia 1977 38.438 14880372 786.1134
Afghanistan Asia 1982 39.854 12881816 978.0114
Afghanistan Asia 1987 40.822 13867957 852.3959
Afghanistan Asia 1992 41.674 16317921 649.3414
Afghanistan Asia 1997 41.763 22227415 635.3414
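Negative indexes also combine with the dplyr helper n(), which returns the number of rows when evaluated inside slice(). The sketch below is only an illustration; the object names are hypothetical.

```r
library(dplyr)
library(gapminder)

# n() is evaluated inside slice() and returns the number of rows
last_row     <- gapminder |> slice(n())    # keep only the last row
without_last <- gapminder |> slice(-n())   # drop only the last row

# Positive and negative indexes cannot be mixed in one call,
# so dropping the first and the last row uses two negative indexes
trimmed <- gapminder |> slice(-c(1, n()))

nrow(gapminder) - nrow(trimmed)            # number of rows removed
```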
# Random rows selection with slice_sample()
gapminder |> slice_sample(n = 5) # call set.seed() first to make the sample reproducible
A tibble: 5 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Slovak Republic Europe 1987 71.080 5199318 12037.268
Chile Americas 1992 74.126 13572994 7596.126
Spain Europe 1952 64.940 28549870 3834.035
Sri Lanka Asia 1972 65.042 13016733 1213.396
Czech Republic Europe 2002 75.510 10256295 17596.210
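Because slice_sample() draws rows at random, the output above will differ between runs. A minimal sketch of making it reproducible with set.seed() follows; the seed value 2024 is arbitrary. It also shows the prop argument, which samples a fraction of the rows instead of a fixed count.

```r
library(dplyr)
library(gapminder)

set.seed(2024)                       # arbitrary seed, chosen only for this sketch
sample_a <- gapminder |> slice_sample(n = 5)

set.seed(2024)                       # reset to the same seed
sample_b <- gapminder |> slice_sample(n = 5)

identical(sample_a, sample_b)        # compare the two samples

# prop samples a fraction of the rows instead of a fixed count
one_percent <- gapminder |> slice_sample(prop = 0.01)
nrow(one_percent)
```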
# Rows with minimum and maximum values of a variable
# Let's find the 5 records with the lowest and the 5 with the highest lifeExp in the whole dataset
gapminder |> slice_min(lifeExp, n = 5)
A tibble: 5 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Rwanda Africa 1992 23.599 7290203 737.0686
Afghanistan Asia 1952 28.801 8425333 779.4453
Gambia Africa 1952 30.000 284320 485.2307
Angola Africa 1952 30.015 4232095 3520.6103
Sierra Leone Africa 1952 30.331 2143249 879.7877
gapminder |> slice_max(lifeExp, n = 5)
A tibble: 5 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
Japan Asia 2007 82.603 127467972 31656.07
Hong Kong, China Asia 2007 82.208 6980412 39724.98
Japan Asia 2002 82.000 127065841 28604.59
Iceland Europe 2007 81.757 301931 36180.79
Switzerland Europe 2007 81.701 7554661 37506.42
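slice_min() and slice_max() also respect grouping, and by default (with_ties = TRUE) they may return more than n rows when values are tied. The sketch below, which assumes continent as the grouping variable, picks the single record with the highest life expectancy on each continent.

```r
library(dplyr)
library(gapminder)

# Highest life expectancy recorded on each continent;
# with_ties = FALSE returns exactly n rows per group even if values are tied
best_by_continent <- gapminder |>
  group_by(continent) |>
  slice_max(lifeExp, n = 1, with_ties = FALSE) |>
  ungroup()

best_by_continent |> select(continent, country, year, lifeExp)
```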
