To filter or subset rows in R, we will use the dplyr package. Its filter() function returns the rows of a data frame that satisfy one or more conditions.
We will use the built-in mtcars dataset to illustrate filtering and subsetting.
- Filter or subset the rows in R using dplyr
- Subset or filter rows in R with multiple conditions
- Filter rows based on AND, OR and NOT conditions in R
- Subset rows using the slice family of functions for a data frame in R
- slice_sample() function in R returns a random sample of n rows of the data frame
- slice_head() and slice_tail() functions in R return the first n and last n rows
- Subset rows by group in R
- Subset using top_n() function in R
- Select or subset a sample using sample_n() and sample_frac() functions in R
Filter or subset the rows in R using dplyr:
Subset using the filter() function.
library(dplyr)
mydata <- mtcars
# keep only the rows where cyl equals 6
Mydata1 <- filter(mydata, cyl == 6)
Mydata1
Only the rows with cyl equal to 6 are returned.
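One detail worth knowing: filter() keeps only the rows where the condition evaluates to TRUE, so rows where it evaluates to NA are dropped. The sketch below shows this on a small made-up data frame. The name df and its values are purely illustrative and are not part of mtcars.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(id = 1:4, score = c(10, NA, 30, 5))

# The row with the missing score is dropped, because NA > 8 is NA, not TRUE
above <- filter(df, score > 8)

# To keep rows with missing values as well, test for them explicitly
above_or_missing <- filter(df, score > 8 | is.na(score))
```

If missing values should count as matches, the explicit is.na() test is one simple way to say so.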
Filter or subset the rows in R with multiple values using dplyr:
library(dplyr)
mydata <- mtcars
# keep the rows whose gear matches any of several values
Mydata1 <- filter(mydata, gear %in% c(4, 5))
Mydata1
The rows with gear equal to 4 or 5 are returned.
Filter or subset the rows in R with multiple conditions (AND) using dplyr:
library(dplyr)
mydata <- mtcars
# keep the rows that satisfy both conditions
Mydata1 <- filter(mydata, gear %in% c(4, 5) & carb == 2)
Mydata1
The rows with gear equal to 4 or 5 and carb equal to 2 are returned.
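As a small aside, filter() also combines conditions passed as separate arguments with AND, so the comma form and the & form should select the same rows. The sketch below compares the two. The names with_ampersand and with_comma are only illustrative.

```r
library(dplyr)

# Conditions joined with & inside one argument
with_ampersand <- filter(mtcars, gear %in% c(4, 5) & carb == 2)

# The same conditions passed as separate arguments
with_comma <- filter(mtcars, gear %in% c(4, 5), carb == 2)

# Compare the two results
all.equal(with_ampersand, with_comma)
```

The comma form is often easier to read when there are many conditions.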
Filter or subset the rows in R with multiple conditions (OR) using dplyr:
library(dplyr)
mydata <- mtcars
# keep the rows that satisfy at least one of the conditions
Mydata1 <- filter(mydata, gear %in% c(4, 5) | mpg == 21.0)
Mydata1
The rows with gear equal to 4 or 5, or with mpg equal to 21, are returned.
Filter or subset the rows in R with a negated condition (NOT) using dplyr:
library(dplyr)
mydata <- mtcars
# keep the rows whose gear is not one of the listed values
Mydata1 <- filter(mydata, !(gear %in% c(4, 5)))
Mydata1
The rows where gear is neither 4 nor 5 are returned, that is, gear != 4 and gear != 5.
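It is easy to mix up AND and OR when negating. The sketch below compares the negated %in% form with the two possible spellings using !=. The variable names are illustrative only.

```r
library(dplyr)

# Negated membership test
not_in <- filter(mtcars, !(gear %in% c(4, 5)))

# Equivalent form: both inequalities must hold
both_unequal <- filter(mtcars, gear != 4 & gear != 5)

# Not equivalent: every gear value differs from 4 or from 5,
# so this condition is TRUE for every row
either_unequal <- filter(mtcars, gear != 4 | gear != 5)

unique(not_in$gear)
nrow(either_unequal) == nrow(mtcars)
```

Joining the two inequalities with | excludes nothing, which is why the negation has to be read as an AND.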
Filter or subset the rows in R with a contains condition using dplyr:
library(dplyr)
mydata <- mtcars
# keep the rows whose hp value contains the digit 0
Mydata1 <- filter(mydata, grepl("0", hp))
Mydata1
The rows whose hp value contains the digit 0 are returned. grepl() converts the numeric hp column to text before matching.
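Because grepl() matches a regular expression, the same approach can be narrowed with an anchor. The sketch below is a minimal illustration that contrasts "contains a 0" with "ends in 0". The explicit as.character() call and the variable names are choices made for this example.

```r
library(dplyr)

# hp is numeric, so convert it to text explicitly before matching
contains_zero <- filter(mtcars, grepl("0", as.character(hp), fixed = TRUE))

# "$" anchors the pattern to the end of the string
ends_in_zero <- filter(mtcars, grepl("0$", as.character(hp)))

nrow(contains_zero)
nrow(ends_in_zero)
```

For purely numeric tests such as "ends in 0", an arithmetic condition like hp %% 10 == 0 is usually the clearer choice. Pattern matching fits character columns best.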
Subset using the slice family of functions in dplyr:
slice_head() function in R:
The slice_head() function returns the first n rows of the data frame, as shown below.
# slice_head() function in R
library(dplyr)
mtcars %>% slice_head(n = 5)
So the first 5 rows are returned.
slice_tail() function in R:
The slice_tail() function returns the last n rows of the data frame, as shown below.
# slice_tail() function in R
library(dplyr)
mtcars %>% slice_tail(n = 5)
So the last 5 rows are returned.
slice_max() function in R:
The slice_max() function returns the n rows with the largest values of a column, as shown below.
# slice_max() function in R
library(dplyr)
mtcars %>% slice_max(mpg, n = 5)
So the 5 rows with the highest mpg are returned.
slice_min() function in R:
The slice_min() function returns the n rows with the smallest values of a column, as shown below.
# slice_min() function in R
library(dplyr)
mtcars %>% slice_min(mpg, n = 5)
So the 5 rows with the lowest mpg are returned.
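By default, slice_max() and slice_min() keep tied values, so they can return more than n rows. The sketch below uses the cyl column, which has many repeated values, to show the effect of the with_ties argument. The variable names are illustrative.

```r
library(dplyr)

# cyl has many tied values, so all rows tied for the top value are kept
with_ties <- mtcars %>% slice_max(cyl, n = 5)

# with_ties = FALSE returns exactly n rows
without_ties <- mtcars %>% slice_max(cyl, n = 5, with_ties = FALSE)

nrow(with_ties)
nrow(without_ties)
```

Setting with_ties = FALSE is useful when later code relies on an exact row count.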
slice_sample() function in R:
The slice_sample() function returns a random sample of n rows of the data frame, as shown below.
# slice_sample() function in R
library(dplyr)
mtcars %>% slice_sample(n = 5)
So 5 randomly chosen rows are returned.
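Since the rows are drawn at random, each run gives a different result. If a reproducible sample is needed, one common approach is to fix the random seed first, as sketched below. The seed value 123 is arbitrary.

```r
library(dplyr)

set.seed(123)  # arbitrary seed, chosen only for this example
first_draw <- mtcars %>% slice_sample(n = 5)

set.seed(123)  # resetting the seed repeats the same draw
second_draw <- mtcars %>% slice_sample(n = 5)

all.equal(first_draw, second_draw)
```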
Slice by group in R:
slice_head() by group in R:
Returns the first n rows of each group using the slice_head() and group_by() functions.
# slice_head() by group in R
mtcars %>% group_by(vs) %>% slice_head(n = 2)
slice_tail() by group in R:
Returns the last n rows of each group using the slice_tail() and group_by() functions.
# slice_tail() by group in R
mtcars %>% group_by(vs) %>% slice_tail(n = 2)
slice_sample() by group in R:
Returns a random sample of n rows from each group using the slice_sample() and group_by() functions.
# slice_sample() by group in R
mtcars %>% group_by(vs) %>% slice_sample(n = 2)
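The same grouping pattern also works with slice_max() and slice_min(). As a minimal sketch, the code below picks the row with the highest mpg within each cyl group. Grouping by cyl and the name best_per_cyl are choices made for this illustration.

```r
library(dplyr)

best_per_cyl <- mtcars %>%
  group_by(cyl) %>%                            # one group per cylinder count
  slice_max(mpg, n = 1, with_ties = FALSE) %>% # best mpg within each group
  ungroup()                                    # drop the grouping afterwards

best_per_cyl
```

Calling ungroup() at the end prevents later steps from being applied per group by accident.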
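Using top_n() function in R:
The top_n() function returns the top n rows of the data frame ranked by a column. When no ranking column is given, as in the call below, top_n() uses the last column of the data frame (carb for mtcars) and prints a message saying so.
# top_n() function in R
mtcars %>% top_n(10)
The result contains the rows with the largest carb values. Tied values are all kept, so more than 10 rows can be returned.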
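In current dplyr releases top_n() is marked as superseded by slice_max() and slice_min(). The sketch below names the ranking column explicitly and compares the two approaches. Using mpg as the ranking column is a choice made for this example.

```r
library(dplyr)

# Older style: rank by mpg, rows stay in their original order
old_style <- mtcars %>% top_n(10, mpg)

# Newer style: the same rows, selected with slice_max()
new_style <- mtcars %>% slice_max(mpg, n = 10)

# Both should select the same set of mpg values
identical(sort(old_style$mpg), sort(new_style$mpg))
```

Naming the column avoids surprises from the "last column" default.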
Subset and select a sample in R:
sample_n() function in dplyr:
The sample_n() function selects random rows from a data frame (or table). The first argument is the data frame, and the second argument is the number of rows to select.
library(dplyr)
mydata <- mtcars
# select 4 random rows of the data frame
sample_n(mydata, 4)
In the above code, sample_n() selects 4 random rows of the mtcars dataset.
sample_frac() function in dplyr:
The sample_frac() function selects a random fraction of the rows of a data frame (or table). The first argument is the data frame, and the second argument is the fraction of rows to select, given as a number between 0 and 1.
library(dplyr)
mydata <- mtcars
# select a random 20 percent of the rows of the data frame
sample_frac(mydata, 0.2)
In the above code, sample_frac() selects a random 20 percent of the rows of mtcars, which is 6 of its 32 rows.
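sample_n() and sample_frac() still work, but the dplyr documentation marks them as superseded by slice_sample(). The sketch below shows the corresponding calls using the n and prop arguments. The seed value and variable names are arbitrary.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the draw can be repeated

# Counterpart of sample_n(mydata, 4)
four_rows <- mtcars %>% slice_sample(n = 4)

# Counterpart of sample_frac(mydata, 0.2)
fifth_of_rows <- mtcars %>% slice_sample(prop = 0.2)

nrow(four_rows)
nrow(fifth_of_rows)
```

One slice_sample() call with either n or prop covers both use cases, and it works with group_by() in the same way as the other slice functions.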