Dropping infinite values from dataframes in pandas?
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Future Grid Looping
--
Chapters
00:00 Question
00:29 Accepted answer (Score 659)
01:23 Answer 2 (Score 79)
02:00 Answer 3 (Score 29)
02:40 Answer 4 (Score 18)
03:19 Thank you
--
Full question
https://stackoverflow.com/questions/1747...
Accepted answer links:
[replace()]: http://pandas.pydata.org/pandas-docs/sta...
[dropna()]: http://pandas.pydata.org/pandas-docs/sta...
Answer 3 links:
[DougR's answer]: https://stackoverflow.com/a/54669790/712...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #numpy
#avk47
ACCEPTED ANSWER
Score 703
First replace() infs with NaN:
df.replace([np.inf, -np.inf], np.nan, inplace=True)
and then drop NaNs via dropna():
df.dropna(subset=["col1", "col2"], how="all", inplace=True)
For example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": [1, np.inf, -np.inf], "col2": [2, 3, np.nan]})
>>> df
   col1  col2
0   1.0   2.0
1   inf   3.0
2  -inf   NaN
>>> df.replace([np.inf, -np.inf], np.nan, inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0
2   NaN   NaN
>>> df.dropna(subset=["col1", "col2"], how="all", inplace=True)
>>> df
   col1  col2
0   1.0   2.0
1   NaN   3.0
The same method also works for Series.
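As a minimal sketch of the Series case, the same two steps can be chained. The sample values and the names s and cleaned are made up for illustration and do not come from the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original answer.
s = pd.Series([1.0, np.inf, -np.inf, np.nan, 4.0])

# Step 1: turn +inf and -inf into NaN. Step 2: drop every NaN.
# Neither call uses inplace=True, so s itself is left unchanged.
cleaned = s.replace([np.inf, -np.inf], np.nan).dropna()

print(cleaned.tolist())
```

Chaining without inplace=True returns a new Series, which is usually easier to follow than mutating the original.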
ANSWER 2
Score 91
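DEPRECATED: recent pandas releases deprecate the use_inf_as_na option, so converting infs to NaN explicitly is the safer route.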
With an option context, this is possible without permanently setting use_inf_as_na. For example:
with pd.option_context('mode.use_inf_as_na', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')
Alternatively, pandas can be told to treat inf as NaN permanently with
pd.set_option('use_inf_as_na', True)
For older versions, replace use_inf_as_na with use_inf_as_null.
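Because the option is deprecated, one possible way to get the same row filtering without it is to build a boolean mask from a temporary copy of the target columns. This is only a sketch: the sample frame, the extra column named other, and the names cols, as_nan and result are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "other" is not a target column.
df = pd.DataFrame({
    "col1": [1.0, np.inf, -np.inf, np.nan],
    "col2": [2.0, 3.0, np.nan, np.inf],
    "other": [np.inf, 0.0, 0.0, 0.0],
})
cols = ["col1", "col2"]

# Convert infs to NaN on a temporary copy of the target columns only,
# so df itself keeps its original values.
as_nan = df[cols].replace([np.inf, -np.inf], np.nan)

# Keep rows where at least one target column holds a real number,
# which mirrors dropna(subset=cols, how="all") with inf counted as missing.
result = df[as_nan.notna().any(axis=1)]

print(result)
```

As with the option context, the surviving rows still contain their original inf values; only the decision about which rows to drop treats inf as missing.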
ANSWER 3
Score 18
Here is another method using .loc to replace inf with nan on a Series:
s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan
So, in response to the original question:
df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))
# put an inf on the diagonal
for i in range(3):
    df.iat[i, i] = np.inf
df
          A         B         C
0       inf  1.000000  1.000000
1  1.000000       inf  1.000000
2  1.000000  1.000000       inf
df.sum()
A    inf
B    inf
C    inf
dtype: float64
df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A    2
B    2
C    2
dtype: float64
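The two steps of this answer can be put together in one self-contained script. This is a sketch under assumptions: the sample Series values and the names mask and totals are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series containing both infs and an existing NaN.
s = pd.Series([1.0, np.inf, -np.inf, np.nan, 5.0])

# True only for +inf and -inf: not finite, but not already NaN.
mask = (~np.isfinite(s)) & s.notnull()
s.loc[mask] = np.nan

# Same 3x3 frame as in the answer: ones with inf on the diagonal.
df = pd.DataFrame(np.ones((3, 3)), columns=list("ABC"))
for i in range(3):
    df.iat[i, i] = np.inf

# Keep only the finite entries of each column, then sum per column.
totals = df.apply(lambda col: col[np.isfinite(col)]).sum()

print(s.isna().tolist())
print(totals.tolist())
```

The s.notnull() term does not change the result here, since overwriting a NaN with NaN is a no-op, but it makes the mask select exactly the infinite entries.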
ANSWER 4
Score 9
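The accepted answer's replace() call also converts infs in columns outside the target columns. To restrict the replacement to the target columns, pass a per-column mapping:
lst = [np.inf, -np.inf]
to_replace = {v: lst for v in ['col1', 'col2']}
df.replace(to_replace, np.nan)  # returns a new DataFrame; assign the result to keep it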
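To see that the per-column mapping leaves other columns alone, here is a small sketch. The sample frame, the non-target column col3, and the names targets and result are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; col3 is deliberately not a target column.
df = pd.DataFrame({
    "col1": [1.0, np.inf],
    "col2": [-np.inf, 2.0],
    "col3": [np.inf, 3.0],
})
targets = ["col1", "col2"]

# Map each target column to the list of values to replace in it.
to_replace = {col: [np.inf, -np.inf] for col in targets}

# replace() returns a new DataFrame; df is left untouched.
result = df.replace(to_replace, np.nan)

print(result)
```

In this sketch the infs in col1 and col2 become NaN in result, while the inf in col3 is preserved.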