Pandas - check if dataframe has negative value in any column
Chapters
00:00 Pandas - Check If Dataframe Has Negative Value In Any Column
00:29 Answer 1 Score 5
00:37 Answer 2 Score 5
01:09 Accepted Answer Score 18
01:43 Thank you
--
Full question
https://stackoverflow.com/questions/6329...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 18
Actually, if speed is important, I did a few tests:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10000, 30000))
Test 1, slowest: pure pandas
(df < 0).any().any()
# 303 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Test 2, faster: switching over to numpy with .values for testing the presence of a True entry
(df < 0).values.any()
# 269 ms ± 8.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Test 3, possibly marginally faster, though the difference from Test 2 is not significant: switching over to numpy for the whole thing
(df.values < 0).any()
# 267 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
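As a quick sanity check that the three variants are interchangeable, here is a minimal sketch on a much smaller frame. The frame size, the seed, and the variable names are assumptions made for illustration, not part of the original tests. It also shows .to_numpy(), which the pandas documentation recommends over .values; no timing claim is made for it here.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical size, far smaller than the benchmark above)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 50)))

pure_pandas = (df < 0).any().any()        # Test 1: comparison and reduction in pandas
mixed = (df < 0).values.any()             # Test 2: comparison in pandas, reduction in numpy
pure_numpy = (df.values < 0).any()        # Test 3: comparison and reduction in numpy
via_to_numpy = (df.to_numpy() < 0).any()  # same as Test 3, using .to_numpy()

# All four expressions should agree on any purely numeric frame
print(bool(pure_pandas), bool(mixed), bool(pure_numpy), bool(via_to_numpy))

# A frame with no negative entries gives False
no_negatives = (df.abs().to_numpy() < 0).any()
print(bool(no_negatives))
```

To reproduce the timings on your own machine, each expression can be wrapped in IPython's %timeit; the absolute numbers will depend on hardware and library versions.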
ANSWER 2
Score 5
You can chain two .any() calls:
df.lt(0).any().any()
Out[96]: True
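For reference, df.lt(0) is the method form of df < 0. A minimal sketch on a made-up two-column frame (the data and names are assumptions for illustration) shows that both produce the same boolean mask:

```python
import pandas as pd

# Hypothetical example data with one negative entry
df = pd.DataFrame({"x": [1.0, -2.0], "y": [3.0, 4.0]})

# The method form and the operator form build the same boolean DataFrame
same_mask = df.lt(0).equals(df < 0)
print(same_mask)  # True

has_negative = bool(df.lt(0).any().any())
print(has_negative)  # True
```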
ANSWER 3
Score 5
This does the trick:
(df < 0).any().any()
To break it down, (df < 0) gives a dataframe with boolean entries. The first .any() then returns a series of booleans, one per column, each testing whether that column contains a True value. The second .any() asks whether this series itself contains any True value.
This returns a simple:
True
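The breakdown above can be followed step by step on a small frame. This is a sketch with made-up data; the column names and the intermediate variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example data: only column "b" holds a negative value
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, -5, 6], "c": [7, 8, 9]})

mask = df < 0              # boolean DataFrame with the same shape as df
per_column = mask.any()    # boolean Series, one entry per column
result = per_column.any()  # single boolean for the whole frame

# Columns that contain at least one negative value
negative_columns = per_column[per_column].index.tolist()
print(negative_columns)  # ['b']
print(bool(result))      # True
```

Keeping the intermediate per_column series is handy when you also want to know which columns contain the negative values, not just whether any exist.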