Python pandas Filtering out nan from a data selection of a column of strings
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Dream Voyager Looping
--
Chapters
00:00 Python Pandas Filtering Out Nan From A Data Selection Of A Column Of Strings
00:47 Accepted Answer Score 339
01:39 Answer 2 Score 413
01:58 Answer 3 Score 10
02:07 Answer 4 Score 17
02:12 Thank you
--
Full question
https://stackoverflow.com/questions/2255...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Dream Voyager Looping
--
Chapters
00:00 Python Pandas Filtering Out Nan From A Data Selection Of A Column Of Strings
00:47 Accepted Answer Score 339
01:39 Answer 2 Score 413
01:58 Answer 3 Score 10
02:07 Answer 4 Score 17
02:12 Thank you
--
Full question
https://stackoverflow.com/questions/2255...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
ANSWER 1
Score 413
Simplest of all solutions:
filtered_df = df[df['name'].notnull()]
Thus, it filters out only rows that doesn't have NaN values in 'name' column.
For multiple columns:
filtered_df = df[df[['name', 'country', 'region']].notnull().all(1)]
ACCEPTED ANSWER
Score 339
Just drop them:
nms.dropna(thresh=2)
this will drop all rows where there are at least two non-NaN.
Then you could then drop where name is NaN:
In [87]:
nms
Out[87]:
movie name rating
0 thg John 3
1 thg NaN 4
3 mol Graham NaN
4 lob NaN NaN
5 lob NaN NaN
[5 rows x 3 columns]
In [89]:
nms = nms.dropna(thresh=2)
In [90]:
nms[nms.name.notnull()]
Out[90]:
movie name rating
0 thg John 3
3 mol Graham NaN
[2 rows x 3 columns]
EDIT
Actually looking at what you originally want you can do just this without the dropna call:
nms[nms.name.notnull()]
UPDATE
Looking at this question 3 years later, there is a mistake, firstly thresh arg looks for at least n non-NaN values so in fact the output should be:
In [4]:
nms.dropna(thresh=2)
Out[4]:
movie name rating
0 thg John 3.0
1 thg NaN 4.0
3 mol Graham NaN
It's possible that I was either mistaken 3 years ago or that the version of pandas I was running had a bug, both scenarios are entirely possible.
ANSWER 3
Score 17
df.dropna(subset=['columnName1', 'columnName2'])
ANSWER 4
Score 10
df = pd.DataFrame({'movie': ['thg', 'thg', 'mol', 'mol', 'lob', 'lob'],'rating': [3., 4., 5., np.nan, np.nan, np.nan],'name': ['John','James', np.nan, np.nan, np.nan,np.nan]})
for col in df.columns:
df = df[~pd.isnull(df[col])]