Drop duplicate rows in pandas python drop_duplicates()

In this section we will learn how to delete or drop duplicate rows of a dataframe in python pandas with example using drop_duplicates() function. lets learn how to

  • Delete or Drop duplicate rows in pandas python using drop_duplicate() function
  • Drop the duplicate rows in pandas by retaining last occurrence
  • Delete or Drop duplicate in pandas by a specific column name
  • Delete All Duplicate Rows from DataFrame
  • Drop duplicate rows in pandas by  inplace = “True”

Drop duplicates in pandas python 1

Pandas drop_duplicates() Function Syntax

drop_duplicates(self, subset=None, keep="first", inplace=False)
  • subset: Subset takes a column or list of column label for identifying duplicate rows. By default, all the columns are used to find the duplicate rows.
  • keep: allowed values are {‘first’, ‘last’, False}, default ‘first’. If ‘first’, duplicate rows except the first one is deleted. If ‘last’, duplicate rows except the last one is deleted. If False, all the duplicate rows are deleted.
  • inplace: if True, the source DataFrame itself is changed. By default, source DataFrame remains unchanged and a new DataFrame instance is returned.

 

Create dataframe:

1
2
3
4
5
6
7
8
9
10
11
12
13
import pandas as pd
import numpy as np
 
#Create a DataFrame
 
import pandas as pd
import numpy as np
 
d = { 'Name':['Alisa','raghu','jodha','jodha','raghu','Cathrine', 'Alisa','Bobby','Bobby','Alisa','raghu','Cathrine'],
     'Age':[26,23,23,23,23,24,26,24,22,26,23,24],
     'Score':[85,31,55,55,31,77,85,63,42,85,31,np.nan]}
df = pd.DataFrame(d,columns=['Name','Age','Score'])
df

so the resultant dataframe will be

Drop duplicates in pandas python 2

 

 

1. Drop the duplicate rows in pandas: by default it keeps the first occurrence of duplicate

Now lets simply drop the duplicate rows in pandas using drop_duplicates() function as shown below

1
2
3
# drop duplicate rows
 
df.drop_duplicates()

In the above example first occurrence of the duplicate row is kept and subsequent occurrence will be deleted, so the output will be

Drop duplicates in pandas python 3

 

 

2. Drop duplicate rows by retaining last occurrence in pandas python:

1
2
3
# drop duplicate rows
 
df.drop_duplicates(keep='last')

In the above example keep=’last’ argument . Keeps the last duplicate row and delete the rest duplicated rows. So the output will be

Drop duplicates in pandas python 4

 

3. Delete All Duplicate Rows from DataFrame in pandas

1
2
3
4
#### Drop all duplicates
 
result_df = df.drop_duplicates(keep=False)
result_df

In the above example keep=False argument . Keeps only the non duplicated rows. So the output will be

Drop duplicates in pandas python 5

 

 

4.  Drop the duplicates  by a specific column in pandas: Method 1

Now let’s drop duplicate by column name. Rows are dropped in such a way that unique column value is retained for that column as shown below

1
2
3
# drop duplicate by a column name
 
df.drop_duplicates(['Name'], keep='last')

In the above example rows are deleted in such a way that, Name column contains only unique values

So the result will be

Drop duplicates in pandas python 6

 

 

 

4.  Drop the duplicates  by a specific columns: Method 2

Now let’s drop duplicates by multiple column names. Rows are dropped in such a way that unique column value is retained for those mentioned columns as shown below

1
2
3
# drop duplicate by multiple column name
 
df.drop_duplicates(subset=['Age', 'Score'])

In the above example duplicate rows are dropped based on Age and Score Column,  combination of these two column has unique vlaues

So the result will be

drop duplicate in pandas python remove duplicate rows in 6

 

 

5. Drop duplicate rows in pandas python by inplace = “True”

Now lets simply drop the duplicate rows in pandas source table itself as shown below

1
2
3
# drop duplicate rows
 
df.drop_duplicates(inplace=True)

In the above example first occurrence of the duplicate row is kept and subsequent occurrence will be deleted and inplace = True replaces the source table itself, so the output will be

Drop duplicates in pandas python 3


Other related topics :

for documentation on drop_duplicates() function kindly refer here

previous-small Delete or Drop the duplicate row of a dataframe in python pandas                                                                                                           next_small Delete or Drop the duplicate row of a dataframe in python pandas

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.

    View all posts
Loading...