Normalize columns of a dataframe
--
Full question
https://stackoverflow.com/questions/2641...
Accepted answer links:
[documentation]: http://scikit-learn.org/stable/modules/p...
Answer 3 links:
[Wikipedia: Unbiased Estimation of Standard Deviation]: https://en.wikipedia.org/wiki/Unbiased_e...
[sklearn.preprocessing.scale]: https://scikit-learn.org/stable/modules/...
Answer 4 links:
https://stats.stackexchange.com/question...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
ANSWER 1
Score 825
One easy way using pandas (here I use mean normalization: subtract each column's mean and divide by its standard deviation):
normalized_df = (df - df.mean()) / df.std()
To use min-max normalization instead:
normalized_df = (df - df.min()) / (df.max() - df.min())
Edit: To address some concerns: pandas applies these operations column-wise automatically in the code above.
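To make the column-wise behaviour concrete, here is a minimal sketch on a small made-up frame (the column names `a` and `b` and their values are assumptions for illustration). It checks that every column of the mean-normalized frame has mean 0 and standard deviation 1, and that every column of the min-max frame spans 0 to 1.

```python
import pandas as pd

# Hypothetical example data; the column names and values are made up.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 50.0]})

# df.mean(), df.std(), df.min() and df.max() each return one value per column,
# and the arithmetic aligns those values with the matching column labels.
mean_normalized = (df - df.mean()) / df.std()
min_max_normalized = (df - df.min()) / (df.max() - df.min())

print(mean_normalized.mean().round(10))  # approximately 0 for every column
print(mean_normalized.std())             # 1 for every column
print(min_max_normalized.min())          # 0 for every column
print(min_max_normalized.max())          # 1 for every column
```

Note that `df.std()` uses the sample standard deviation (`ddof=1`) by default, whereas scikit-learn's scalers use the population standard deviation (`ddof=0`), so the two approaches give slightly different standardized values.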
ACCEPTED ANSWER
Score 421
You can use the package sklearn and its associated preprocessing utilities to normalize the data.
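import pandas as pd
from sklearn import preprocessing
x = df.values  # returns a NumPy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
# Pass the original labels back; otherwise they are replaced by 0, 1, 2, ...
df = pd.DataFrame(x_scaled, columns=df.columns, index=df.index)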
For more information, see the scikit-learn documentation on preprocessing data: scaling features to a range.
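One detail worth checking is that `fit_transform` returns a plain NumPy array, so the row and column labels have to be supplied again when the DataFrame is rebuilt. The following is a minimal sketch on made-up data (the column names, index labels and values are assumptions) that keeps the labels and compares the result with the plain pandas min-max formula.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical example data; the column names, index labels and values are made up.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [-5.0, 0.0, 5.0]},
    index=["x", "y", "z"],
)

scaler = MinMaxScaler()            # default feature_range is (0, 1)
scaled = scaler.fit_transform(df)  # NumPy array, scaled column by column

# Supply the original labels so they are not replaced by 0, 1, 2, ...
scaled_df = pd.DataFrame(scaled, columns=df.columns, index=df.index)
print(scaled_df)

# The same scaling written directly in pandas, for comparison.
expected = (df - df.min()) / (df.max() - df.min())
print((scaled_df - expected).abs().max())
```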
ANSWER 3
Score 80
Based on this post: https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range
You can do the following:
def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        max_value = df[feature_name].max()
        min_value = df[feature_name].min()
        result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
    return result
You don't need to worry about whether your values are negative or positive: each column is rescaled so that its minimum maps to 0 and its maximum maps to 1.
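One edge case: a column whose values are all identical has `max_value == min_value`, so the division above produces NaN for that column. Below is a minimal sketch of one way to guard against this, assuming constant columns should simply map to 0 (the name `normalize_safe` and that choice are assumptions, not part of the original answer).

```python
import pandas as pd


def normalize_safe(df):
    """Min-max normalize each column; constant columns are set to 0.0."""
    result = df.copy()
    for feature_name in df.columns:
        min_value = df[feature_name].min()
        value_range = df[feature_name].max() - min_value
        if value_range == 0:
            # Assumption: a constant column has no spread, so map it to 0.
            result[feature_name] = 0.0
        else:
            result[feature_name] = (df[feature_name] - min_value) / value_range
    return result


# Hypothetical example data with negative values and a constant column.
example = pd.DataFrame({"a": [-2.0, 0.0, 2.0], "c": [7.0, 7.0, 7.0]})
print(normalize_safe(example))
```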
ANSWER 4
Score 66
Your problem is actually a simple transform acting on the columns:
def f(s):
    return s / s.max()
frame.apply(f, axis=0)
Or, more tersely:
frame.apply(lambda x: x/x.max(), axis=0)
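As a quick illustration of what this does, here is a sketch on a small made-up frame (the name `frame` follows the answer; the data is an assumption). Dividing by the column maximum puts the largest value of each column at exactly 1, but unlike min-max normalization it does not shift the minimum to 0, and it only lands in the 0 to 1 range when the column has no negative values.

```python
import pandas as pd

# Hypothetical example data; the column names and values are made up.
frame = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 30.0, 40.0]})

# axis=0 passes each column to the function as a Series.
scaled = frame.apply(lambda x: x / x.max(), axis=0)
print(scaled)

# For this particular operation, plain broadcasting gives the same frame.
same = frame / frame.max()
print(scaled.equals(same))
```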