Selecting with complex criteria from pandas.DataFrame
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Ominous Technology Looping
--
Chapters
00:00 Selecting With Complex Criteria From Pandas.Dataframe
00:25 Accepted Answer Score 539
01:33 Answer 2 Score 84
02:06 Answer 3 Score 8
02:33 Answer 4 Score 57
03:13 Thank you
--
Full question
https://stackoverflow.com/questions/1531...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 539
Sure! Setup:
>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
                   'B': [randint(1, 9)*10 for x in range(10)],
                   'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
   A   B    C
0  9  40  300
1  9  70  700
2  5  70  900
3  8  80  900
4  7  50  200
5  9  30  900
6  2  80  700
7  2  80  400
8  5  80  300
9  7  70  800
We can apply column operations and get boolean Series objects:
>>> df["B"] > 50
0    False
1     True
2     True
3     True
4    False
5    False
6     True
7     True
8     True
9     True
Name: B
>>> (df["B"] > 50) & (df["C"] != 900)
or
>>> (df["B"] > 50) & ~(df["C"] == 900)
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
[Update, to switch to new-style .loc]:
And then we can use these to index into the object. For read access, you can chain indices:
>>> df["A"][(df["B"] > 50) & (df["C"] != 900)]
2    5
3    8
Name: A, dtype: int64
but you can get yourself into trouble because of the difference between a view and a copy doing this for write access.  You can use .loc instead:
>>> df.loc[(df["B"] > 50) & (df["C"] != 900), "A"]
2    5
3    8
Name: A, dtype: int64
>>> df.loc[(df["B"] > 50) & (df["C"] != 900), "A"].values
array([5, 8], dtype=int64)
>>> df.loc[(df["B"] > 50) & (df["C"] != 900), "A"] *= 1000
>>> df
      A   B    C
0     9  40  300
1     9  70  700
2  5000  70  900
3  8000  80  900
4     7  50  200
5     9  30  900
6     2  80  700
7     2  80  400
8     5  80  300
9     7  70  800
ANSWER 2
Score 84
Another solution is to use the query method:
import pandas as pd
from random import randint
df = pd.DataFrame({'A': [randint(1, 9) for x in xrange(10)],
                   'B': [randint(1, 9) * 10 for x in xrange(10)],
                   'C': [randint(1, 9) * 100 for x in xrange(10)]})
print df
   A   B    C
0  7  20  300
1  7  80  700
2  4  90  100
3  4  30  900
4  7  80  200
5  7  60  800
6  3  80  900
7  9  40  100
8  6  40  100
9  3  10  600
print df.query('B > 50 and C != 900')
   A   B    C
1  7  80  700
2  4  90  100
4  7  80  200
5  7  60  800
Now if you want to change the returned values in column A you can save their index:
my_query_index = df.query('B > 50 & C != 900').index
....and use .iloc to change them i.e:
df.iloc[my_query_index, 0] = 5000
print df
      A   B    C
0     7  20  300
1  5000  80  700
2  5000  90  100
3     4  30  900
4  5000  80  200
5  5000  60  800
6     3  80  900
7     9  40  100
8     6  40  100
9     3  10  600
ANSWER 3
Score 57
And remember to use parenthesis!
Keep in mind that & operator takes a precedence over operators such as > or < etc. That is why 
4 < 5 & 6 > 4
evaluates to False. Therefore if you're using pd.loc, you need to put brackets around your logical statements, otherwise you get an error. That's why do:
df.loc[(df['A'] > 10) & (df['B'] < 15)]
instead of
df.loc[df['A'] > 10 & df['B'] < 15]
which would result in
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
ANSWER 4
Score 8
You can use pandas it has some built in functions for comparison. So if you want to select values of "A" that are met by the conditions of "B" and "C" (assuming you want back a DataFrame pandas object)
df[['A']][df.B.gt(50) & df.C.ne(900)]
df[['A']] will give you back column A in DataFrame format.
pandas gt function will return the positions of column B that are greater than 50 and ne will return the positions not equal to 900.