To extract a substring from a column in R we can use substr() from base R, or str_sub() and str_extract() from the stringr package. Let’s see how to get the substring of a column in R, both by character position and with a regular expression. The examples below cover:
- Extract first n characters in R
- Extract last n characters in R
- Extract first word of the column in R
- Extract last word of the column in R
- Extract substring of the column using regular expression in R
Each case is shown with an example. Let’s first create the dataframe:
df1 = data.frame(State = c('Arizona AZ','Georgia GG', 'Newyork NY','Indiana IN','Florida FL'), Score=c(62,47,55,74,31))
df1
So the resultant dataframe will be
Extract first n characters of the column in R
Method 1:
In the example below we use the substr() function to get the first n characters of the column in R. substr() takes the column, a starting position and an ending position as arguments (the third argument is a position, not a length) and returns the substring between those two positions, as shown below.
## Method 1 - extract first n characters
df1$substring_State = substr(df1$State,1,4)
df1
So the dataframe will be
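As a quick check of Method 1, the sketch below applies substr() to a plain character vector holding the same values as the State column. The names states, n and first_n are illustrative and not part of the original example. It is only meant to show that the third argument is a stop position, so the first n characters are substr(x, 1, n).

```r
# Illustrative vector with the same values as df1$State (assumed name: states)
states <- c("Arizona AZ", "Georgia GG", "Newyork NY", "Indiana IN", "Florida FL")

# n is a hypothetical parameter: how many leading characters to keep
n <- 4

# substr(x, start, stop): both arguments are positions, not a length
first_n <- substr(states, 1, n)
first_n
# "Ariz" "Geor" "Newy" "Indi" "Flor"

# A middle slice (characters 3 to 5) makes the "stop is a position" point visible
substr(states, 3, 5)
# "izo" "org" "wyo" "dia" "ori"
```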
Method 2:
In the example below we use the str_sub() function from the stringr package to get the first n characters of the column in R. str_sub() takes the column, a starting position and an ending position as arguments and returns the substring between those two positions, as shown below.
## Method 2 - extract first n characters
library(stringr)
df1$substring_State = str_sub(df1$State,1,4)
df1
So the dataframe will be
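To compare the two methods, the sketch below assumes the stringr package is installed and again uses an illustrative vector named states. It checks that str_sub() and substr() agree for positive positions, and shows one thing str_sub() adds: a negative end position counts back from the end of the string.

```r
library(stringr)

# Illustrative vector with the same values as df1$State (assumed name: states)
states <- c("Arizona AZ", "Georgia GG", "Newyork NY", "Indiana IN", "Florida FL")

# For positive positions, str_sub() and base substr() give the same result
same_result <- identical(str_sub(states, 1, 4), substr(states, 1, 4))
same_result
# TRUE

# A negative end counts from the end of the string: drop the last 3
# characters (the space and the two-letter code) to keep the state name
state_name <- str_sub(states, 1, -4)
state_name
# "Arizona" "Georgia" "Newyork" "Indiana" "Florida"
```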
Extract last n characters of the column in R:
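In the example below we use the str_sub() function to get the last n characters of the column in R. str_sub() takes the column and a negative starting position, which counts backwards from the end of the string. Because the ending position defaults to the last character, a start of -2 returns the last two characters.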
# extract last 2 characters of the column
df1$last_2_string = str_sub(df1$State,-2)
df1
So the dataframe is
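If stringr is not available, the same result can be reached in base R by computing the start position from the string length. This is a sketch, not the article's method. The names states, n and last_n are illustrative.

```r
# Illustrative vector with the same values as df1$State (assumed name: states)
states <- c("Arizona AZ", "Georgia GG", "Newyork NY", "Indiana IN", "Florida FL")

# n is a hypothetical parameter: how many trailing characters to keep
n <- 2

# nchar() gives each string's length, so the last n characters
# run from position (length - n + 1) to position length
last_n <- substr(states, nchar(states) - n + 1, nchar(states))
last_n
# "AZ" "GG" "NY" "IN" "FL"
```

Both arguments are vectorised, so strings of different lengths in the same column are handled row by row.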
Extract First word of the column in R:
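The first word of the column can be extracted with the str_extract() function and a regular expression, as shown below. The pattern \\w+ matches the first run of word characters (letters, digits and underscore), so the match stops at the first space.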
# extract first word of the column in R
df1$substring_first <- str_extract(df1$State,"(\\w+)")
df1
So the resultant dataframe is
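A base R alternative, sketched here under the assumption that words are separated by whitespace, is to delete everything from the first whitespace character onward with sub(). The names states and first_word are illustrative.

```r
# Illustrative vector with the same values as df1$State (assumed name: states)
states <- c("Arizona AZ", "Georgia GG", "Newyork NY", "Indiana IN", "Florida FL")

# "\\s.*$" matches the first whitespace character and everything after it;
# replacing that match with "" leaves only the first word
first_word <- sub("\\s.*$", "", states)
first_word
# "Arizona" "Georgia" "Newyork" "Indiana" "Florida"
```

Unlike str_extract() with \\w+, this version keeps punctuation that is attached to the first word, so the two approaches can differ on values such as hyphenated names.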
Extract Last word of the column in R
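The last word of the column can be extracted with the str_extract() function and a regular expression, as shown below. The pattern \\w+$ matches a run of word characters anchored to the end of the string.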
# extract last word of the column in R
library(stringr)
df1$substring_last <- str_extract(df1$State,"\\w+$")
df1
So the resultant dataframe is
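One caveat with the end-anchored pattern: if a value has trailing whitespace, \\w+$ finds no match and str_extract() returns NA. The sketch below uses a hypothetical vector named padded, which is not part of the original data, to show the problem and one way around it with str_trim().

```r
library(stringr)

# Hypothetical input: the second value has a trailing space
padded <- c("Arizona AZ", "Georgia GG ")

# The trailing space sits between the last word and the end of the string,
# so the anchored pattern fails for the second value
raw_match <- str_extract(padded, "\\w+$")
raw_match
# "AZ" NA

# Trimming surrounding whitespace first restores the match
trimmed_match <- str_extract(str_trim(padded), "\\w+$")
trimmed_match
# "AZ" "GG"
```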