Normalize columns of a dataframe
Full question
https://stackoverflow.com/questions/2641...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe #normalize
ANSWER 1
Score 825
One easy way using pandas (here, mean normalization, i.e. z-score standardization):
normalized_df = (df - df.mean()) / df.std()
To use min-max normalization instead:
normalized_df = (df - df.min()) / (df.max() - df.min())
Edit: To address some concerns, note that pandas applies these operations column-wise automatically in the code above.
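To make the column-wise behavior concrete, here is a minimal sketch on a small made-up frame (the data and the names zscore_df and minmax_df are illustrative assumptions, not part of the original answer). Note that df.std() uses the sample standard deviation (ddof=1) by default, so the z-scores can differ slightly from tools that use the population standard deviation.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})

# Z-score: subtract each column's mean, divide by its sample std (ddof=1).
zscore_df = (df - df.mean()) / df.std()

# Min-max: rescale each column independently to the range [0, 1].
minmax_df = (df - df.min()) / (df.max() - df.min())

print(zscore_df)
print(minmax_df)
```

Each column is scaled with its own statistics because df.mean(), df.std(), df.min() and df.max() return one value per column, and pandas aligns those values with the column labels during the arithmetic.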
ACCEPTED ANSWER
Score 421
You can use the package sklearn and its associated preprocessing utilities to normalize the data.
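import pandas as pd
from sklearn import preprocessing
x = df.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)  # scales each column to [0, 1]
# Keep the original column names and index, which a bare pd.DataFrame(x_scaled) would drop
df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)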
For more information, see the scikit-learn documentation on preprocessing data: scaling features to a range.
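As a rough check, the sketch below runs MinMaxScaler on a small made-up frame and compares the result with the plain pandas min-max formula. The sample data and the names scaled_df and pandas_df are assumptions for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [-5.0, 0.0, 15.0]})

scaler = MinMaxScaler()            # default feature_range is (0, 1)
scaled = scaler.fit_transform(df)  # returns a NumPy array

# Rebuild a DataFrame, reusing the original column names and index.
scaled_df = pd.DataFrame(scaled, columns=df.columns, index=df.index)

# The same scaling written directly in pandas, for comparison.
pandas_df = (df - df.min()) / (df.max() - df.min())

print(scaled_df)
print(pandas_df)
```

With the default feature_range, the two results should agree up to floating-point rounding. The scaler object mainly pays off when the fitted minimum and maximum need to be reused on other data via its transform method.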
ANSWER 3
Score 80
Based on this post: https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range
You can do the following:
def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        max_value = df[feature_name].max()
        min_value = df[feature_name].min()
        result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    return result
You don't need to worry about whether your values are negative or positive: each column is rescaled so that its values lie between 0 and 1.
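One edge case worth noting: if a column is constant, max_value - min_value is zero and the division yields NaN. The variant below is a sketch of one possible guard; the name normalize_safe and the choice to map constant columns to 0.0 are assumptions, not part of the original answer.

```python
import pandas as pd

def normalize_safe(df):
    """Min-max scale each column; constant columns become 0.0 (an assumed convention)."""
    result = df.copy()
    for feature_name in df.columns:
        min_value = df[feature_name].min()
        value_range = df[feature_name].max() - min_value
        if value_range == 0:
            # Constant column: avoid 0/0, which would produce NaN.
            result[feature_name] = 0.0
        else:
            result[feature_name] = (df[feature_name] - min_value) / value_range
    return result

# Hypothetical sample data with negative values and a constant column.
sample = pd.DataFrame({"x": [-2.0, 0.0, 2.0], "const": [7.0, 7.0, 7.0]})
scaled = normalize_safe(sample)
print(scaled)
```

Whether a constant column should map to 0.0, stay unchanged, or be dropped depends on the downstream use, so treat the convention here as a placeholder.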
ANSWER 4
Score 66
Your problem is actually a simple transform acting on the columns:
def f(s):
    return s/s.max()
frame.apply(f, axis=0)
Or, more tersely:
frame.apply(lambda x: x/x.max(), axis=0)
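A small sketch of what this transform does on made-up data (the frame contents and column names are assumptions for illustration). Dividing by the column maximum makes the largest value 1, but the result only falls within [0, 1] when the column has no negative values.

```python
import pandas as pd

# Hypothetical sample data: one non-negative column, one with negatives.
frame = pd.DataFrame({"pos": [1.0, 2.0, 4.0], "mixed": [-4.0, 0.0, 2.0]})

# Divide every column by its own maximum.
scaled = frame.apply(lambda s: s / s.max(), axis=0)
print(scaled)
```

Here the "pos" column ends up within [0, 1], while the "mixed" column keeps a value below 0, so min-max scaling is the safer choice when a strict [0, 1] range is required.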