Scaling and normalizing a column in pandas (Python)

Scaling and normalizing a column in pandas is often required to put the data on a common scale before we build a model on it. We will use the preprocessing module from the scikit-learn package. Let's see an example that normalizes a column in pandas by min-max scaling.

Create a single column dataframe:

import pandas as pd
import numpy as np
from sklearn import preprocessing

# Create a DataFrame
d = {
       'Score':[62,-47,-55,74,31,77,85,63,42,67,89,81,56]}

df = pd.DataFrame(d,columns=['Score'])
print(df)

So the resultant dataframe will be:

[Figure: the single-column Score dataframe]

On plotting the score it will be:

[Figure: plot of the actual Score values]

Step 1: Convert the column of the dataframe to floats.

# 1. Convert the column values of the dataframe to floats

float_array = df['Score'].values.astype(float)

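Step 2: Create a min-max scaler object and pass the float values to its fit_transform() method, which rescales them to the default range of 0 to 1. Current versions of scikit-learn expect a 2D array of shape (n_samples, n_features), so the 1D column is first reshaped into a single feature column.

# 2. Create a min-max scaler object and scale the values

min_max_scaler = preprocessing.MinMaxScaler()
# reshape(-1, 1) turns the 1D array into one column, as fit_transform requires
scaled_array = min_max_scaler.fit_transform(float_array.reshape(-1, 1))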
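To make it clearer what the scaler computes, the sketch below applies the min-max formula (x - min) / (max - min) by hand and compares the result with MinMaxScaler. The names col, manual and scaled are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'Score': [62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56]})

# Min-max scaling by hand: (x - min) / (max - min)
col = df['Score'].astype(float)
manual = (col - col.min()) / (col.max() - col.min())

# The same scaling with scikit-learn; df[['Score']] is already 2D
scaled = preprocessing.MinMaxScaler().fit_transform(df[['Score']].astype(float))

# Compare the two results element by element (within floating-point tolerance)
print(np.allclose(manual.values, scaled.ravel()))
```

If both approaches agree, the comparison prints True. The smallest score (-55) maps to 0 and the largest (89) maps to 1.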

Step 3: Convert the scaled array to a dataframe.

# 3. Convert the scaled array to a dataframe

df_normalized = pd.DataFrame(scaled_array, columns=['Score'])
print(df_normalized)
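As a possible variation on Step 3, the sketch below keeps the original index, gives the scaled column its own name, and uses the scaler's inverse_transform() to map the values back to the original units. The column name Score_scaled and the variable restored are hypothetical choices made for this illustration.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'Score': [62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56]})

scaler = preprocessing.MinMaxScaler()
scaled_array = scaler.fit_transform(df[['Score']].astype(float))

# Keep the original index and use a descriptive (hypothetical) column name
df_normalized = pd.DataFrame(scaled_array, columns=['Score_scaled'], index=df.index)

# inverse_transform maps the scaled values back to the original units
restored = scaler.inverse_transform(df_normalized[['Score_scaled']].values)

print(df_normalized.head())
```

Keeping the fitted scaler object around is useful when the same scaling has to be applied to new data later, or undone after modelling.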

So the final normalized dataframe will be:

[Figure: the normalized Score dataframe]

On plotting the scaled score the graph will be:

[Figure: plot of the scaled Score values]
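The post does not show how its plots were produced. One way to draw comparable charts, assuming matplotlib is installed, is sketched below. The line-plot style, the figure size and the output file name scores.png are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'Score': [62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56]})
scaled = preprocessing.MinMaxScaler().fit_transform(df[['Score']].astype(float))

# Plot the raw and the scaled scores side by side
fig, (ax_raw, ax_scaled) = plt.subplots(1, 2, figsize=(10, 4))
ax_raw.plot(df.index, df['Score'])
ax_raw.set_title('Actual score')
ax_scaled.plot(df.index, scaled.ravel())
ax_scaled.set_title('Scaled score')

fig.savefig('scores.png')  # hypothetical output file name
```

The two curves have the same shape; only the y-axis changes, from the original range of -55 to 89 to the range 0 to 1.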

Author

Sridhar Venkatachalam

Has close to 10 years of experience in data science and machine learning and has worked extensively with programming languages like R, Python (Pandas), SAS, and PySpark.