How to filter rows containing a string pattern from a Pandas dataframe
Full question
https://stackoverflow.com/questions/2797...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
ACCEPTED ANSWER
Score 553
In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids  vals
0  aball     1
1  bball     2
3  fball     4
ANSWER 2
Score 177
df[df['ids'].str.contains('ball', na=False)]  # valid for (at least) pandas version 0.17.1
Step-by-step explanation (from inner to outer):
df['ids'] selects the ids column of the data frame (technically, the object df['ids'] is of type pandas.Series).
df['ids'].str allows us to apply vectorized string methods (e.g., lower, contains) to the Series.
df['ids'].str.contains('ball') checks each element of the Series for the substring 'ball'. The result is a Series of Booleans, True or False, indicating whether 'ball' occurs in each element.
df[df['ids'].str.contains('ball')] applies the Boolean 'mask' to the dataframe and returns the rows for which the mask is True.
na=False treats NA / NaN values as non-matches; otherwise the mask may contain missing values and a ValueError may be raised.
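The steps above can be sketched end to end as follows. This is a minimal example, not the answerer's own code: the frame is a hypothetical variant of the question's data with one missing id added, so that the effect of na=False is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the example frame with one missing id added.
df = pd.DataFrame({
    "ids": ["aball", "bball", np.nan, "fball"],
    "vals": [1, 2, 3, 4],
})

ids = df["ids"]                            # step 1: a pandas.Series
mask = ids.str.contains("ball", na=False)  # steps 2-3: Boolean mask, missing id counts as no match
result = df[mask]                          # step 4: keep rows where the mask is True

print(mask.tolist())
print(result)
```

With na=False the mask holds only True and False, so it can be used for indexing even though the column has a missing entry.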
ANSWER 3
Score 35
>>> mask = df['ids'].str.contains('ball')    
>>> mask
0     True
1     True
2    False
3     True
Name: ids, dtype: bool
>>> df[mask]
     ids  vals
0  aball     1
1  bball     2
3  fball     4
ANSWER 4
Score 19
If you want to set the column you filter on as a new index, you could also consider using .filter; if you want to keep it as a separate column, then str.contains is the way to go.
Let's say you have
import pandas as pd
df = pd.DataFrame({'ids': ['aball', 'bball', 'cnut', 'fball', 'ballxyz'], 'vals': [1, 2, 3, 4, 5]})
       ids  vals
0    aball     1
1    bball     2
2     cnut     3
3    fball     4
4  ballxyz     5
and you want to keep all rows in which ids contains ball AND set ids as the new index; then you can do
df.set_index('ids').filter(like='ball', axis=0)
which gives
         vals
ids          
aball       1
bball       2
fball       4
ballxyz     5
But filter also allows you to pass a regex, so you could also filter only those rows where the column entry ends with ball. In this case you use
df.set_index('ids').filter(regex='ball$', axis=0)
       vals
ids        
aball     1
bball     2
fball     4
Note that now the entry with ballxyz is not included as it starts with ball and does not end with it.
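If ids should stay a regular column rather than become the index, the same anchored pattern can be passed to str.contains, which treats its argument as a regular expression by default. A minimal sketch, rebuilding the example frame so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "ids": ["aball", "bball", "cnut", "fball", "ballxyz"],
    "vals": [1, 2, 3, 4, 5],
})

# 'ball$' is interpreted as a regex, so only ids that end with 'ball' match.
ends_with_ball = df[df["ids"].str.contains("ball$")]
print(ends_with_ball)
```

Here ids remains a column of the result and the original integer index is preserved.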
If you want to get all entries that start with ball, you can simply use
df.set_index('ids').filter(regex='^ball', axis=0)
yielding
         vals
ids          
ballxyz     5
The same works with columns; all you then need to change is the axis=0 part. If you filter based on columns, it would be axis=1.
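As a rough illustration of the column case, the sketch below filters on column labels with axis=1. The frame and its column names (ball_count, nut_count, count_ball) are made up for this example and do not come from the question.

```python
import pandas as pd

# Hypothetical frame whose column labels contain the pattern of interest.
wide = pd.DataFrame({
    "ball_count": [1, 2],
    "nut_count": [3, 4],
    "count_ball": [5, 6],
})

# Keep columns whose label contains 'ball'.
with_ball = wide.filter(like="ball", axis=1)

# Keep columns whose label ends with 'ball'.
ends_ball = wide.filter(regex="ball$", axis=1)

print(list(with_ball.columns))
print(list(ends_ball.columns))
```

Note that filter matches on labels only; it never looks at the values stored in the columns.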