The Python Oracle

Pandas - check if dataframe has negative value in any column

Full question
https://stackoverflow.com/questions/6329...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas




ACCEPTED ANSWER

Score 18


If speed is important, I ran a few tests on a large frame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 30000))

Test 1, slowest: pure pandas

(df < 0).any().any()
# 303 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Test 2, faster: compare in pandas, then switch to numpy with .values to test for the presence of a True entry

(df < 0).values.any()
# 269 ms ± 8.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Test 3, possibly faster still, though the difference from Test 2 is not significant: switching over to numpy for the whole thing

(df.values < 0).any()
# 267 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
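
To reproduce the comparison without the roughly 2.4 GB frame used above, here is a minimal sketch using the standard-library timeit module on a much smaller frame. The frame size, the seed, and the labels in the candidates dictionary are assumptions chosen for illustration, and the timings will differ from the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed size: far smaller than the 10000 x 30000 frame in the answer.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 300)))

# The three variants from the tests above, keyed by illustrative labels.
candidates = {
    "pure pandas": lambda: (df < 0).any().any(),
    "pandas compare, numpy any": lambda: (df < 0).values.any(),
    "numpy throughout": lambda: (df.values < 0).any(),
}

results = {}
for name, func in candidates.items():
    results[name] = bool(func())
    # Average over 20 calls; the repeat count is arbitrary.
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{name}: {seconds * 1e3:.3f} ms per call, result={results[name]}")
```

All three variants should agree on the result; only the time per call differs. In newer code, df.to_numpy() can be used in place of df.values, which is what the pandas documentation recommends.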



ANSWER 2

Score 5


You can chain two calls to .any():

df.lt(0).any().any()
Out[96]: True
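
As a small illustration of the chained form, the sketch below stops after the first .any() to see which columns contain a negative value before collapsing to a single flag. The frame and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical data: only column "b" holds a negative value.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, -5, 6], "c": [0, 0, 0]})

# df.lt(0) is the method form of df < 0.
same_mask = df.lt(0).equals(df < 0)

per_column = df.lt(0).any()             # one boolean per column
has_negative = bool(per_column.any())   # single flag for the whole frame
negative_columns = per_column[per_column].index.tolist()

print(per_column.tolist())
print(has_negative)
print(negative_columns)
```

Keeping the intermediate Series around is useful when you need to know where the negatives are, not only whether there are any.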



ANSWER 3

Score 5


This does the trick:

(df < 0).any().any()

To break it down, (df < 0) gives a DataFrame of boolean entries. The first .any() then returns a Series of booleans, one per column, testing whether that column contains a True value. The second .any() asks whether this Series itself contains any True value.

This returns a single boolean:

True
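
The breakdown above can be followed step by step by naming each intermediate object. This is only a sketch on a made-up frame; the column names and values are assumptions. It also shows that a NaN entry compares as False, so missing values are not counted as negative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "x" has a NaN, column "y" has one negative value.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [0.5, -2.0, 4.0]})

mask = df < 0             # DataFrame of booleans, same shape as df
by_column = mask.any()    # Series with one boolean per column
answer = by_column.any()  # single boolean for the whole frame

print(type(mask).__name__, mask.shape)
print(by_column.to_dict())
print(bool(answer))
```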