dropping rows from dataframe based on a "not in" condition
--------------------------------------------------
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Quiet Intelligence
--
Chapters
00:00 Dropping Rows From Dataframe Based On A &Quot;Not In&Quot; Condition
00:30 Answer 1 Score 66
00:59 Accepted Answer Score 330
01:25 Thank you
--
Full question
https://stackoverflow.com/questions/2796...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Quiet Intelligence
--
Chapters
00:00 Dropping Rows From Dataframe Based On A &Quot;Not In&Quot; Condition
00:30 Answer 1 Score 66
00:59 Accepted Answer Score 330
01:25 Thank you
--
Full question
https://stackoverflow.com/questions/2796...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 330
You can use pandas.Dataframe.isin.
pandas.Dateframe.isin will return boolean values depending on whether each element is inside the list a or not. You then invert this with the ~ to convert True to False and vice versa.
import pandas as pd
a = ['2015-01-01' , '2015-02-01']
df = pd.DataFrame(data={'date':['2015-01-01' , '2015-02-01', '2015-03-01' , '2015-04-01', '2015-05-01' , '2015-06-01']})
print(df)
# date
#0 2015-01-01
#1 2015-02-01
#2 2015-03-01
#3 2015-04-01
#4 2015-05-01
#5 2015-06-01
df = df[~df['date'].isin(a)]
print(df)
# date
#2 2015-03-01
#3 2015-04-01
#4 2015-05-01
#5 2015-06-01
ANSWER 2
Score 66
You can use Series.isin:
df = df[~df.datecolumn.isin(a)]
While the error message suggests that all() or any() can be used, they are useful only when you want to reduce the result into a single Boolean value. That is however not what you are trying to do now, which is to test the membership of every values in the Series against the external list, and keep the results intact (i.e., a Boolean Series which will then be used to slice the original DataFrame).
You can read more about this in the Gotchas.