In order to concatenate two columns in pyspark we will be using concat() Function. We look at an example on how to join or concatenate two string columns in pyspark (two or more columns) and also string and numeric column with space or any separator.
- Concatenate two columns in pyspark without space
- Concatenate columns in pyspark with single space
- Concatenate columns with hyphen in pyspark (“-”)
- Concatenate by removing leading and trailing space
- Concatenate numeric and character column in pyspark
we will be using “df_states” dataframe
Concatenate two columns in pyspark with single space :Method 1
Concatenating two columns in pyspark is accomplished using concat() Function.
###### concatenate using single space from pyspark.sql.functions import concat, lit, col df1=df_states.select("*", concat(col("state_name"),lit(" "),col("state_code")).alias("state_name_code")) df1.show()
so the dataframe with concatenated column of single space will be
Concatenate columns in pyspark with single space :Method 2
Concatenating columns in pyspark is accomplished using concat() Function.
###### concatenate using single space from pyspark.sql import functions as sf df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.lit(' '),sf.col('state_code'))) df.show()
So the dataframe with concatenated column of single space will be
Concatenate two columns in pyspark without space :Method 1
Concatenating two columns in pyspark is accomplished using concat() Function.
###### concatenate two columns without space from pyspark.sql.functions import concat, lit, col df1=df_states.select("*", concat(col("state_name"),col("state_code")).alias("state_name_code")) df1.show()
So the dataframe with concatenated column without space will be
Concatenate two columns without space :Method 2
Concatenating two columns is accomplished using concat() Function.
###### concatenate two columns without space from pyspark.sql import functions as sf df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.col('state_code'))) df.show()
So the dataframe with concatenated column without space will be
Concatenate two columns with hyphen :Method 1
Concatenating multiple columns is accomplished using concat() Function.
###### concatenate using hyphen from pyspark.sql.functions import concat, lit, col df1=df_states.select("*", concat(col("state_name"),lit("-"),col("state_code")).alias("state_name_code")) df1.show()
so the dataframe with concatenated column with hyphen (“-”) will be
Concatenate two columns with hyphen :Method 2
Concatenating columns in pyspark is accomplished using concat() Function.
###### concatenate using hyphen from pyspark.sql import functions as sf df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.lit('-'),sf.col('state_code'))) df.show()
so the dataframe with concatenated column with hyphen (“-”) will be
Concatenate numeric and character column in pyspark:
Concatenating numeric and character column in pyspark is accomplished by converting the numeric column to character by using cast() function. After that both the columns are concatenated using concat() function.
###### concatenate using hyphen from pyspark.sql import functions as sf df=df_states.withColumn('joined_column', sf.concat(sf.col('state_name'),sf.lit('-'),sf.col('hindex_score'))) df.show()
So the dataframe with numeric and character column concatenated will be
Concatenate two columns in pyspark by removing leading and trailing space :
First Concatenation of two columns in pyspark is accomplished using concat() Function. After that trim() function is used to remove leading and trailing space
##### concatenate two columns by removing leading and trailing space from pyspark.sql.functions import concat, lit, col from pyspark.sql.functions import * df1=df_states.select("*", concat(col("state_name"),lit("-"),col("state_code")).alias("state_name_code")) df1 = df1.withColumn('state_name_code', trim(df1.state_name_code)) df1.show()
So the dataframe with concatenated column with leading and trailing space removed will be
Other related topics:
- Remove leading zero of column in pyspark
- Left and Right pad of column in pyspark –lpad() & rpad()
- Add Leading and Trailing space of column in pyspark – add space
- Remove Leading, Trailing and all space of column in pyspark – strip & trim space
- String split of the columns in pyspark
- Repeat the column in Pyspark
- Get Substring of the column in Pyspark
- Get String length of column in Pyspark
- Typecast string to date and date to string in Pyspark
- Typecast Integer to string and String to integer in Pyspark
- Extract First N and Last N character in pyspark
- Convert to upper case, lower case and title case in pyspark
- Add leading zeros to the column in pyspark