Concatenate two columns in pyspark

In order to concatenate two columns in pyspark we will be using concat() Function. We look at an example on how to join or concatenate two string columns in pyspark (two or more columns) and also string and numeric column with space or any separator.

  • Concatenate two columns in pyspark without space
  • Concatenate columns in pyspark with single space
  • Concatenate columns with hyphen in pyspark (“-”)
  • Concatenate by removing leading and trailing space
  • Concatenate numeric and character column in pyspark

we will be using “df_states” dataframe

 

Concatenate two columns in pyspark 1

 

 

Concatenate two columns in pyspark with single space :Method 1

Concatenating two columns in pyspark is accomplished using concat() Function.

###### concatenate using single space

from pyspark.sql.functions import concat, lit, col

df1=df_states.select("*", concat(col("state_name"),lit(" "),col("state_code")).alias("state_name_code"))
df1.show()

so the dataframe with concatenated column of single space will be

Concatenate two columns in pyspark 2

 

Concatenate columns in pyspark with single space :Method 2

Concatenating columns in pyspark is accomplished using concat() Function.

###### concatenate using single space

from pyspark.sql import functions as sf

df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.lit(' '),sf.col('state_code')))
df.show()

So the dataframe with concatenated column of single space will be

 

 

 

 

Concatenate two columns in pyspark without space :Method 1

Concatenating two columns in pyspark is accomplished using concat() Function.

###### concatenate two columns without space

from pyspark.sql.functions import concat, lit, col

df1=df_states.select("*", concat(col("state_name"),col("state_code")).alias("state_name_code"))
df1.show()

So the dataframe with concatenated column without space will be

Concatenate two columns in pyspark 4

Concatenate two columns without space :Method 2

Concatenating two columns is accomplished using concat() Function.

###### concatenate two columns without space

from pyspark.sql import functions as sf

df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.col('state_code')))
df.show()

So the dataframe with concatenated column without space will be

Concatenate two columns in pyspark 5

 

 

 

Concatenate two columns with hyphen :Method 1

Concatenating multiple columns is accomplished using concat() Function.

###### concatenate using hyphen

from pyspark.sql.functions import concat, lit, col

df1=df_states.select("*", concat(col("state_name"),lit("-"),col("state_code")).alias("state_name_code"))
df1.show()

so the dataframe with concatenated column with hyphen (“-”) will be

Concatenate two columns in pyspark 6

Concatenate two columns with hyphen :Method 2

Concatenating columns in pyspark is accomplished using concat() Function.

###### concatenate using hyphen

from pyspark.sql import functions as sf

df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.lit('-'),sf.col('state_code')))
df.show()

so the dataframe with concatenated column with hyphen (“-”) will be

Concatenate two columns in pyspark 7

 

 

Concatenate numeric and character column in pyspark:

Concatenating numeric and character column in pyspark is accomplished by converting the numeric column to character by using cast() function. After that both the columns are concatenated using concat() function.

###### concatenate using hyphen

from pyspark.sql import functions as sf

df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.lit('-'),sf.col('hindex_score')))
df.show()

So the dataframe with numeric and character column concatenated will be

Concatenate two columns in pyspark 7a

 

Concatenate two columns in pyspark by removing leading and trailing space :

First Concatenation of two columns in pyspark is accomplished using concat() Function. After that  trim() function is used to remove leading and trailing space


##### concatenate two columns by removing leading and trailing space

from pyspark.sql.functions import concat, lit, col
from pyspark.sql.functions import *


df1=df_states.select("*", concat(col("state_name"),lit("-"),col("state_code")).alias("state_name_code"))
df1 = df1.withColumn('state_name_code', trim(df1.state_name_code))

df1.show()

So the dataframe with concatenated column with leading and trailing space removed will be

Concatenate two columns in pyspark 9

 

 


Other related topics:

 

Concatenate two columns in pyspark                                                                                                       Concatenate two columns in pyspark

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.

    View all posts