Setdiff() Function in R using Dplyr (get difference of dataframes)

To get the difference of two data frames, i.e. the rows present in one table but not in the other, we use the setdiff() function from R's dplyr package. dplyr provides a setdiff() method for data frames that compares whole rows across all columns.

setdiff() function in R:

The setdiff() function in R returns the rows that appear in the first table but not in the second.


setdiff() function in R example: first, let's create two data frames.

#Create two data frames

df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Oven", 3), rep("Television", 3)))
df2 = data.frame(CustomerId = c(4:7), Product = c(rep("Television", 2), rep("Air conditioner", 2)))

df1 will be

  CustomerId    Product
1          1       Oven
2          2       Oven
3          3       Oven
4          4 Television
5          5 Television
6          6 Television

df2 will be

  CustomerId         Product
1          4      Television
2          5      Television
3          6 Air conditioner
4          7 Air conditioner

The setdiff() function takes the rows that appear in the first table but not in the second and returns them as a new data frame.


#  difference of two dataframes  
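The call itself does not appear on the page. A minimal sketch of what it presumably looks like, assuming dplyr is installed, is given here. The data frames are recreated so the snippet runs on its own. stringsAsFactors = FALSE and the name result are additions for illustration, not part of the original code.

```r
# Load dplyr, which provides setdiff() for data frames
library(dplyr)

# Recreate the two example data frames.
# stringsAsFactors = FALSE is an assumption added here so that Product is a
# character column on R versions older than 4.0 as well.
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Oven", 3), rep("Television", 3)),
                  stringsAsFactors = FALSE)
df2 <- data.frame(CustomerId = c(4:7),
                  Product = c(rep("Television", 2), rep("Air conditioner", 2)),
                  stringsAsFactors = FALSE)

# Rows of df1 that have no identical row (same CustomerId and Product) in df2
result <- setdiff(df1, df2)
print(result)
```

A row is dropped only when every column matches, so CustomerId 6 stays in the result: its Product is "Television" in df1 but "Air conditioner" in df2.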

The resultant data frame will be

  CustomerId    Product
1          1       Oven
2          2       Oven
3          3       Oven
4          6 Television
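Because setdiff() keeps only rows from its first argument, the order of the arguments matters. The sketch below is an illustration rather than part of the original example. It swaps the two data frames, and the name reversed and the stringsAsFactors = FALSE setting are assumptions introduced here.

```r
library(dplyr)

# Same example data as in the article (stringsAsFactors = FALSE is an
# assumption added so the columns are character on older R versions too)
df1 <- data.frame(CustomerId = c(1:6),
                  Product = c(rep("Oven", 3), rep("Television", 3)),
                  stringsAsFactors = FALSE)
df2 <- data.frame(CustomerId = c(4:7),
                  Product = c(rep("Television", 2), rep("Air conditioner", 2)),
                  stringsAsFactors = FALSE)

# Swapped arguments: rows of df2 that have no identical row in df1
reversed <- setdiff(df2, df1)
print(reversed)
```

Here only the two "Air conditioner" rows (CustomerId 6 and 7) remain, since the two "Television" rows of df2 also occur in df1.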

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.