The str.slice function extracts a substring from a column of a pandas dataframe in Python. Let's see an example of how to get a substring from a column of a pandas dataframe and store it in a new column. A substring can also be extracted with the str.extract function and a regular expression.
- Extract substring from the column in pandas python
- Fetch substring from start (left) of the column in pandas
- Get substring from end (right) of the column in pandas
- Get substring of the column using regular expression in pandas python
Substring of column in pandas python:
The substring of a column in a pandas dataframe is obtained with the str.slice function. Let's see this with an example. First, let's create a dataframe:
import pandas as pd
import numpy as np

# Create a DataFrame
df1 = {
    'State': ['Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'],
    'Score': [62, 47, 55, 74, 31]}

df1 = pd.DataFrame(df1, columns=['State', 'Score'])
print(df1)
df1 will be:
We will use the str.slice function on the column to get the substring. Here we take the first 7 characters of the State column as the substring and name the new column state_substring, as shown below:
# Get the substring in pandas
df1['state_substring'] = df1.State.str.slice(0, 7)
print(df1)
The resultant dataframe stores the first 7 characters of the State column (Arizona, Georgia, Newyork, Indiana, Florida) in a separate column, state_substring.
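As a minimal sketch of how str.slice behaves, the example below compares it with the equivalent bracket notation and includes a missing value. The `states` Series and the other variable names are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical sample data, including a missing value
states = pd.Series(['Arizona AZ', 'Georgia GG', None])

# str.slice(start, stop) and str[start:stop] select the same characters
by_slice = states.str.slice(0, 7)
by_index = states.str[0:7]

# start and stop can also be passed as keywords to slice from the middle
middle = states.str.slice(start=8, stop=10)

print(by_slice)
print(middle)
```

Missing values stay missing in the result instead of raising an error, so the slice can be applied to a column without cleaning it first.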
Extract substring of the column in pandas using regular Expression:
Here we extract the last word of the State column using a regular expression and store it in another column, State_code.
df1['State_code'] = df1.State.str.extract(r'\b(\w+)$', expand=True)
print(df1)
The resultant dataframe has a new State_code column holding the last word of each State value (AZ, GG, NY, IN, FL).
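The same pattern is worth trying on less tidy data. The sketch below uses a made-up `states` Series (not the dataframe above) with a trailing space and a single-word value, and passes expand=False so that the single capture group comes back as a Series.

```python
import pandas as pd

# Hypothetical sample data: one value has a trailing space, one has a single word
states = pd.Series(['Arizona AZ', 'Georgia GG ', 'Texas'])

# Rows where the pattern does not match become NaN
codes = states.str.extract(r'\b(\w+)$', expand=False)

# Stripping whitespace first lets the pattern match the trailing-space row
codes_stripped = states.str.strip().str.extract(r'\b(\w+)$', expand=False)

print(codes)
print(codes_stripped)
```

Note that a value with only one word, such as 'Texas', returns the whole word, because the pattern only asks for the last run of word characters.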
Extract substring from right (end) of the column in pandas:
str[-n:] is used to get the last n characters of a column in pandas
df1['Stateright'] = df1['State'].str[-2:]
print(df1)
str[-2:] gets the last two characters from the right of the column and stores them in another column named Stateright, so the resultant dataframe has the values AZ, GG, NY, IN and FL in that column.
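A short sketch of what str[-n:] does with values shorter than n or with missing values is given below. The `values` Series is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: a normal value, a one-character value and a missing value
values = pd.Series(['Arizona AZ', 'X', None])

# Slicing follows Python string rules: a short string is returned whole
last_two = values.str[-2:]

print(last_two)
```

So the slice never fails on short strings, but the result may hold fewer than n characters, which is worth checking before using it as a fixed-width code.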
Extract substring from start (left) of column in pandas:
str[:n] is used to get the first n characters of a column in pandas
df1['StateInitial'] = df1['State'].str[:2]
print(df1)
str[:2] gets the first two characters from the left of the column and stores them in another column named StateInitial, so the resultant dataframe has the values Ar, Ge, Ne, In and Fl in that column.
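Slicing from the left also works with a negative stop, which keeps everything except the last few characters. The sketch below assumes every value ends with a single space followed by a two-letter code, as in the State column. The `states` Series and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same "name + space + 2-letter code" layout
states = pd.Series(['Arizona AZ', 'Georgia GG', 'Newyork NY'])

# Drop the last 3 characters (the space and the code) to keep only the name
names = states.str[:-3]

# str.slice with a negative stop gives the same result
names_slice = states.str.slice(stop=-3)

print(names)
```

If the values do not all follow this layout, a fixed offset will cut in the wrong place, and the regular expression approach is the safer choice.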