Scaling and normalizing a column in pandas is often required to standardize the data before modeling it. We will use the preprocessing module from the scikit-learn package. Let's see an example that normalizes a pandas column by min-max scaling.
Create a single column dataframe:
import pandas as pd
import numpy as np
from sklearn import preprocessing

# Create a DataFrame
d = {'Score': [62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56]}
df = pd.DataFrame(d, columns=['Score'])
print(df)
The resulting dataframe has a single Score column with 13 rows.
Plotting the scores shows the raw values, which range from -55 to 89.
Step 1: Convert the column of the dataframe to float.
# 1. convert the column values of the dataframe to floats
float_array = df['Score'].values.astype(float)
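Step 2: Create a min-max processing object. Pass the float column to its fit_transform() method, which scales the values to the range 0 to 1. The scaler expects a 2-D array (rows by columns), so the 1-D column is reshaped into a single-column array first, as shown below.
# 2. create a min max processing object
min_max_scaler = preprocessing.MinMaxScaler()
# reshape(-1, 1) turns the 1-D array into one column with 13 rows
scaled_array = min_max_scaler.fit_transform(float_array.reshape(-1, 1))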
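To make clear what the scaler is doing, here is a minimal sketch of the same min-max calculation written by hand with NumPy: each value has the column minimum subtracted and is then divided by the range (maximum minus minimum). The variable names scores and manual_scaled are illustrative only and are not part of the original example.

```python
import numpy as np

# Same scores as in the example above
scores = np.array([62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56], dtype=float)

# Min-max scaling: subtract the minimum, then divide by the range
score_min = scores.min()   # -55.0
score_max = scores.max()   # 89.0
manual_scaled = (scores - score_min) / (score_max - score_min)

# The smallest score maps to 0 and the largest to 1
print(manual_scaled.min(), manual_scaled.max())

# First score: (62 - (-55)) / (89 - (-55)) = 117 / 144 = 0.8125
print(manual_scaled[0])
```

With its default settings, MinMaxScaler should produce the same values up to floating-point rounding.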
Step 3: Convert the scaled array to a dataframe.
# 3. convert the scaled array to a dataframe
df_normalized = pd.DataFrame(scaled_array)
print(df_normalized)
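The final normalized dataframe holds the scores rescaled to the range 0 to 1.
Plotting the scaled scores gives a graph with the same shape as the original one, with the values now running from 0 to 1.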
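The three steps above can also be condensed. The following is one possible sketch, not the only way to do it: selecting the column with double brackets yields a one-column DataFrame, which is already 2-D, and passing columns and index when rebuilding the DataFrame keeps the Score label instead of the default 0. The name df_scaled is an assumption introduced for this illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'Score': [62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56]})

# Double brackets select a one-column DataFrame, which is already 2-D
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[['Score']].astype(float))

# Rebuild the DataFrame, keeping the original column name and index
df_scaled = pd.DataFrame(scaled, columns=df.columns, index=df.index)
print(df_scaled.head())
```

Keeping the original index makes it easy to join the scaled column back onto the source dataframe later.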