Normalize columns of a dataframe

--

Full question
https://stackoverflow.com/questions/2641...

Accepted answer links:
[documentation]: http://scikit-learn.org/stable/modules/p...

Answer 3 links:
[Wikipedia: Unbiased Estimation of Standard Deviation]: https://en.wikipedia.org/wiki/Unbiased_e...
[sklearn.preprocessing.scale]: https://scikit-learn.org/stable/modules/...

Answer 4 links:
https://stats.stackexchange.com/question...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe #normalize

ANSWER 1

Score 825

One easy way is to use pandas directly. Here is mean normalization:

normalized_df = (df - df.mean()) / df.std()

To use min-max normalization instead:

normalized_df = (df - df.min()) / (df.max() - df.min())

Edit: To address some concerns, note that pandas automatically applies these functions column-wise in the code above.
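
As a quick check of the two formulas above, here is a minimal sketch on a small made-up frame (the column names `a` and `b` and their values are assumptions for illustration only). Note that `df.std()` uses the sample standard deviation (`ddof=1`) by default, so the standardized values can differ slightly from those of scikit-learn's scalers, which use the population standard deviation.

```python
import pandas as pd

# Hypothetical example data; any all-numeric DataFrame works the same way.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})

# Mean normalization (z-score): subtract each column's mean, divide by its std.
# df.mean() and df.std() return one value per column, and pandas aligns them
# with the columns of df, so the arithmetic is applied column by column.
standardized = (df - df.mean()) / df.std()

# Min-max normalization: rescale each column to the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

print(standardized)
print(min_max)
```

After this, every column of `standardized` has mean 0 and sample standard deviation 1, and every column of `min_max` runs from 0 to 1.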

ACCEPTED ANSWER

Score 421

You can use the package sklearn and its associated preprocessing utilities to normalize the data.

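import pandas as pd
from sklearn import preprocessing

x = df.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)  # scales each column to [0, 1]
# Pass columns and index so the original labels are not lost.
df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)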

For more information, see the scikit-learn documentation on preprocessing data: scaling features to a range.
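
If the frame also holds non-numeric columns, one option is to scale only the numeric columns and assign the result back. This is a sketch rather than part of the original answer; the frame and its column names (`name`, `height`, `weight`) are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data with one non-numeric column.
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "height": [150.0, 175.0, 200.0],
    "weight": [50.0, 60.0, 90.0],
})

# Select only the numeric columns; the scaler cannot handle strings.
numeric_cols = df.select_dtypes(include="number").columns

scaled = df.copy()
# fit_transform returns a NumPy array with the same shape as its input,
# so it can be assigned straight back to the selected columns.
scaled[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

print(scaled)
```

The `name` column is left untouched, and the column order and index of the original frame are preserved.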

ANSWER 3

Score 80

Based on this post: https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range

You can do the following:

def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        max_value = df[feature_name].max()
        min_value = df[feature_name].min()
        result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    return result

You don't need to worry about whether your values are negative or positive: each column is rescaled so that its values lie between 0 and 1.
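
One edge case worth noting: if a column is constant, `max_value - min_value` is zero and the division yields NaN for that column. Below is a minimal sketch of one way to guard against this. The name `normalize_safe` and the choice to map constant columns to 0.0 are assumptions, not part of the original answer.

```python
import pandas as pd

def normalize_safe(df):
    """Min-max scale each column; constant columns become 0.0 (an arbitrary choice)."""
    result = df.copy()
    for feature_name in df.columns:
        min_value = df[feature_name].min()
        value_range = df[feature_name].max() - min_value
        if value_range == 0:
            # Constant column: avoid 0/0 and set every entry to 0.0.
            result[feature_name] = 0.0
        else:
            result[feature_name] = (df[feature_name] - min_value) / value_range
    return result

# Hypothetical data: "a" has negative values, "b" is constant.
df = pd.DataFrame({"a": [-5.0, 0.0, 5.0], "b": [3.0, 3.0, 3.0]})
normalized = normalize_safe(df)
print(normalized)
```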

ANSWER 4

Score 66

Your problem is actually a simple transform acting on the columns:

def f(s):
    return s/s.max()

frame.apply(f, axis=0)

Or, more tersely:

frame.apply(lambda x: x/x.max(), axis=0)
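
Note that dividing by the column maximum only lands in the range 0 to 1 when all values are non-negative. As a small sketch (the frame below is made up for illustration), the same `apply` pattern can carry the min-max formula instead when columns contain negative values:

```python
import pandas as pd

# Hypothetical frame; column "b" contains a negative value.
frame = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [-2.0, 0.0, 2.0]})

# Divide each column by its own maximum (axis=0 passes one column at a time).
by_max = frame.apply(lambda s: s / s.max(), axis=0)

# Min-max variant of the same pattern, which maps every column onto [0, 1].
by_min_max = frame.apply(lambda s: (s - s.min()) / (s.max() - s.min()), axis=0)

print(by_max)      # column "b" ranges from -1 to 1 here
print(by_min_max)  # both columns range from 0 to 1
```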