The Python Oracle

How to drop rows from pandas data frame that contains a particular string in a particular column?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Puzzle2

--

Chapters
00:00 Question
00:38 Accepted answer (Score 299)
01:05 Answer 2 (Score 150)
01:25 Answer 3 (Score 52)
01:50 Answer 4 (Score 27)
02:17 Thank you

--

Full question
https://stackoverflow.com/questions/2867...

Answer 3 links:
[TypeError: bad operand type for unary ~: float]: https://stackoverflow.com/questions/5229...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas

#avk47



ACCEPTED ANSWER

Score 311


pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
Out[92]:
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
   A    C
0  5  foo
1  3  bar
3  6  bat



ANSWER 2

Score 152


If your string constraint is not just one string you can drop those corresponding rows with:

df = df[~df['your column'].isin(['list of strings'])]

The above will drop all rows containing elements of your list




ANSWER 3

Score 55


This will only work if you want to compare exact strings. It will not work in case you want to check if the column string contains any of the strings in the list.

The right way to compare with a list would be :

searchfor = ['john', 'doe']
df = df[~df.col.str.contains('|'.join(searchfor))]



ANSWER 4

Score 16


new_df = df[df.C != 'XYZ']

Reference: https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/