Select variables (columns) in R using dplyr – select() Function

The select() function from the dplyr package is used to select variables (columns) of a dataframe in R. Columns can be selected by name, by position, or by a condition on the column name such as starts with, ends with, contains, or matches a pattern. This tutorial also covers selecting columns with a regular expression and selecting columns without missing values, with an example for each.

Select columns by column name in R dplyr.
Select columns by column position in dplyr.
Select columns whose names contain a value or match a pattern.
Select columns whose names start with or end with certain characters.
Select columns by name with a regular expression using the grepl() function.
Select columns without missing values.

We will be using the mtcars data to demonstrate the select() function.

Select column by column name and Position

select() Function in dplyr: Select Column by Name

The select() function selects columns by name: pass the dataframe as the first argument, followed by the names of the columns to keep.

library(dplyr)
mydata <- mtcars

# Select columns of the dataframe
select(mydata,mpg,cyl,wt)

The above code selects the mpg, cyl and wt columns.
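
When the column names are stored in a character vector rather than typed out, the all_of() helper can be used instead. The sketch below is a minimal illustration, assuming dplyr 1.0.0 or later; the vector name wanted and the new column names are hypothetical and not part of the original example.

```r
library(dplyr)

# Hypothetical vector of column names, e.g. built elsewhere in a script
wanted <- c("mpg", "cyl", "wt")

# all_of() selects exactly the columns named in the vector
# and raises an error if any of them does not exist
by_vector <- select(mtcars, all_of(wanted))

# Columns can also be renamed while selecting, using new_name = old_name
renamed <- select(mtcars, miles_per_gallon = mpg, cylinders = cyl)

names(by_vector)
names(renamed)
```

Using all_of() makes it explicit that wanted is a variable holding names and not itself a column of the dataframe.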

Select Column by Position:

Select 3rd and 4th columns of the dataframe:

The select() function can also select columns by position: pass the dataframe followed by the positions of the columns to keep.

library(dplyr)
mydata <- mtcars

# Select 3rd and 4th columns of the dataframe
select(mydata,3:4)

The above code selects the 3rd (disp) and 4th (hp) columns.
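
Positions do not have to form a contiguous range, and a negative range drops columns instead of keeping them. The following is a small sketch of both variations on mtcars; the result names are illustrative only.

```r
library(dplyr)

# Keep the 1st, 3rd and 5th columns (mpg, disp, drat)
first_third_fifth <- select(mtcars, 1, 3, 5)

# Drop the 3rd and 4th columns (disp, hp) and keep the rest
without_3_4 <- select(mtcars, -(3:4))

names(first_third_fifth)
names(without_3_4)
```

Selecting by position is compact but fragile: if the column order of the dataframe changes, the same positions refer to different variables.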

Select Column with conditions and pattern matching in R dplyr

starts_with() function:

Select the columns whose names start with mpg

library(dplyr)
mydata <- mtcars

# Select the columns whose names start with "mpg"
select(mydata,starts_with("mpg"))

Select the columns whose names do not start with mpg

library(dplyr)
mydata <- mtcars

# Deselect the columns whose names start with "mpg"
select(mydata,-starts_with("mpg"))
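
Negating a selection helper can also be written with the ! operator. The sketch below assumes dplyr 1.0.0 or later, where ! is supported inside select(), and checks that both spellings return the same columns.

```r
library(dplyr)

# Drop the columns whose names start with "mpg" using the minus sign
with_minus <- select(mtcars, -starts_with("mpg"))

# The same selection written with the ! (complement) operator
with_bang <- select(mtcars, !starts_with("mpg"))

# Both results contain every column except mpg
identical(names(with_minus), names(with_bang))
```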

ends_with() function:

Select the columns whose names end with cyl

library(dplyr)
mydata <- mtcars

# Select the columns whose names end with "cyl"
select(mydata,ends_with("cyl"))

contains() function:

Select the columns whose names contain “s”

library(dplyr)
mydata <- mtcars

# Select the columns whose names contain "s"
select(mydata,contains("s"))
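
By default contains() ignores case and treats its argument as a literal string, not a regular expression. The sketch below illustrates the effect of the ignore.case argument on mtcars, whose column names are all lower case; the result names are illustrative only.

```r
library(dplyr)

# Default: case is ignored, so "S" matches disp, qsec and vs
any_case <- select(mtcars, contains("S"))

# Case-sensitive match: no mtcars column name contains an upper-case "S"
strict_case <- select(mtcars, contains("S", ignore.case = FALSE))

names(any_case)
ncol(strict_case)
```

The same ignore.case argument is available in starts_with(), ends_with() and matches().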

matches() function:

Select the columns whose names match the regular expression “di”

library(dplyr)
mydata <- mtcars

# Select the columns whose names match the regular expression "di"
select(mydata,matches("di"))
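
Unlike contains(), matches() interprets its argument as a regular expression, so anchors and character classes can be used. The following sketch shows two such patterns on mtcars; the patterns themselves are illustrative choices, not taken from the original example.

```r
library(dplyr)

# Names that start with "c" or "d": cyl, disp, drat, carb
starts_c_or_d <- select(mtcars, matches("^[cd]"))

# Names that are exactly two characters long: hp, wt, vs, am
two_chars <- select(mtcars, matches("^.{2}$"))

names(starts_c_or_d)
names(two_chars)
```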

everything() function:

Select everything, that is, all columns of the dataframe

library(dplyr)
mydata <- mtcars

# Select everything
select(mydata,everything())
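
On its own everything() simply returns all columns, so it is most useful in combination with other selections. A common idiom, sketched below, is to name a few columns first and let everything() append the remaining ones, which reorders the dataframe without dropping anything.

```r
library(dplyr)

# Move wt and hp to the front; everything() adds the remaining columns
# in their original order, without duplicating wt and hp
reordered <- select(mtcars, wt, hp, everything())

names(reordered)
```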

Select Column Names using Regular Expression:

Columns whose names match a regular expression can also be selected with the base R grepl() function. grepl() takes the regular expression and the column names as arguments and returns a logical vector marking the names that match; that vector is then used to subset the columns of the dataframe.

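mydata <- mtcars

# Select the columns whose names match the regular expression "^c"
# (drop = FALSE keeps a dataframe even if only one column matches)
mydata1 <- mydata[, grepl("^c", names(mydata)), drop = FALSE]
mydata1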

Here the columns whose names start with “c” (cyl and carb) are selected using the grepl() function along with a regular expression.
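
For comparison, the same selection can be written with dplyr. The sketch below places the base R grepl() approach next to matches() and checks that both return the same columns; the result names are illustrative only.

```r
library(dplyr)

# Base R: logical vector from grepl() used to subset the columns
base_result <- mtcars[, grepl("^c", names(mtcars)), drop = FALSE]

# dplyr: the same regular expression passed to matches()
dplyr_result <- select(mtcars, matches("^c"))

names(base_result)
identical(names(base_result), names(dplyr_result))
```

One difference worth noting is that matches() ignores case by default, while grepl() is case-sensitive unless ignore.case = TRUE is set.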

Select columns without missing values:

To demonstrate selecting columns without missing values, let's first create the dataframe shown below.

my_basket = data.frame(ITEM_GROUP = c("Fruit","Fruit","Fruit","Fruit","Fruit","Vegetable","Vegetable","Vegetable","Vegetable","Dairy","Dairy","Dairy","Dairy","Dairy"), 
                       ITEM_NAME = c("Apple","Banana","Orange","Mango","Papaya","Carrot","Potato","Brinjal","Raddish","Milk","Curd","Cheese","Milk","Paneer"),
                       Price = c(100,80,80,90,65,70,60,70,25,60,40,35,50,60),
                       Tax = c(2,4,5,NA,2,3,NA,1,NA,4,5,NA,4,NA))
my_basket

The resulting dataframe has 14 rows and four columns (ITEM_GROUP, ITEM_NAME, Price and Tax), and the Tax column contains 5 missing values.

The sapply() function is an alternative to a for loop: it applies a built-in or user-defined function to each column of a dataframe. sapply(df, function(x) mean(is.na(x))) returns the proportion of missing values in each column of a dataframe.
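
To see why mean(is.na(x)) gives the share of missing values, the sketch below applies it to the Tax values used in this example and then to a small hypothetical dataframe named toy. is.na() returns TRUE or FALSE for each element, and mean() treats TRUE as 1 and FALSE as 0.

```r
# Tax values from the example dataframe
Tax <- c(2, 4, 5, NA, 2, 3, NA, 1, NA, 4, 5, NA, 4, NA)

# Number and proportion of missing values in the vector
n_missing <- sum(is.na(Tax))       # 5 of the 14 values are NA
share_missing <- mean(is.na(Tax))  # 5 / 14, roughly 0.36

# Hypothetical two-column dataframe to show sapply() working per column
toy <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 2, NA, 4))
per_column <- sapply(toy, function(x) mean(is.na(x)))

n_missing
share_missing
per_column
```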

# Select the columns in which at most 30% of the values are missing

my_basket = my_basket[, sapply(my_basket, function(x) mean(is.na(x))) <= 0.3]
my_basket

The above program removes the “Tax” column because more than 30% of its values are missing (5 of 14) and our threshold is 30%, so the final dataframe does not contain the Tax column. Setting the threshold to 0 instead would keep only the columns with no missing values at all.
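
The same filtering can be expressed inside select() with the where() helper, which keeps the columns for which a function returns TRUE. This is a sketch under the assumption that dplyr 1.0.0 or later is installed; the dataframe basket below is a hypothetical, shortened stand-in for my_basket.

```r
library(dplyr)

# Hypothetical small dataframe; tax has 2 of 4 values missing
basket <- data.frame(
  item  = c("Apple", "Carrot", "Milk", "Curd"),
  price = c(100, 70, 60, 40),
  tax   = c(2, NA, NA, 5)
)

# Keep only the columns with no missing values at all
complete_cols <- select(basket, where(function(x) !anyNA(x)))

# Keep the columns in which at most 30% of the values are missing
mostly_complete <- select(basket, where(function(x) mean(is.na(x)) <= 0.3))

names(complete_cols)
names(mostly_complete)
```

In this toy data both calls drop tax, since half of its values are missing.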

For further details on selecting columns with the dplyr package, refer to the dplyr documentation.

Author

Sridhar Venkatachalam

With close to 10 years of experience in data science and machine learning, has worked extensively with programming languages such as R, Python (Pandas), SAS and PySpark.