Get Size and Shape of the dataframe: In order to get the number of rows and the number of columns of a dataframe in pyspark, we will be using the count() function and the len() function. The dimension (shape) of the dataframe in pyspark is calculated by combining the number of rows with the number of columns of the dataframe. We will also get the count of distinct rows in pyspark. Let’s see how to:
- Get size and shape of the dataframe in pyspark
- Count the number of rows in pyspark with an example using count()
- Count the number of distinct rows in pyspark with an example
- Count the number of columns in pyspark with an example
We will be using a dataframe named df_student.
Get Size and Shape of the dataframe in pyspark:
The size and shape of the dataframe is nothing but the number of rows and the number of columns of the dataframe in pyspark.
########## Get Size and shape of the dataframe in pyspark

print((df_student.count(), len(df_student.columns)))
Result:
Count the number of rows in pyspark – Get number of rows
Syntax:

df.count()

df – dataframe

dataframe.count() function counts the number of rows of the dataframe.
########## count number of rows

df_student.count()
Result:
Count the number of distinct rows in pyspark – Get number of distinct rows:
Syntax:

df.distinct().count()

df – dataframe

dataframe.distinct().count() function counts the number of distinct rows of the dataframe.
########## count number of distinct rows

df_student.distinct().count()
Result:
Count the number of columns in pyspark – Get number of columns:
Syntax:

len(df.columns)

df – dataframe

len(df.columns) counts the number of columns of the dataframe.
########## count number of columns

len(df_student.columns)
Result:
Other Related Topics:
- Count of Missing (NaN,Na) and null values in Pyspark
- Mean, Variance and standard deviation of column in Pyspark
- Maximum or Minimum value of column in Pyspark
- Raised to power of column in pyspark – square, cube , square root and cube root in pyspark
- Drop column in pyspark – drop single & multiple columns
- Subset or Filter data with multiple conditions in pyspark
- Frequency table or cross table in pyspark – 2 way cross table
- Groupby functions in pyspark (Aggregate functions) – Groupby count, Groupby sum, Groupby mean, Groupby min and Groupby max
- Descriptive statistics or Summary Statistics of dataframe in pyspark
- Rearrange or reorder column in pyspark
- cumulative sum of column and group in pyspark
- Calculate Percentage and cumulative percentage of column in pyspark
- Get data type of column in Pyspark (single & Multiple columns)
- Get List of columns and its data type in Pyspark