How to create dummy variables in pandas when columns can have mixed types?
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Beneath the City Looping
--
Chapters
00:00 How To Create Dummy Variables In Pandas When Columns Can Have Mixed Types?
00:48 Answer 1 Score 2
01:49 Accepted Answer Score 3
02:10 Thank you
--
Full question
https://stackoverflow.com/questions/3637...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 3
Two ways you could do this:
In [37]: pd.to_numeric(df.A, errors='coerce').notnull() & (df.A > 0)
Out[37]:
0 True
1 True
2 False
3 False
4 False
Name: A, dtype: bool
In [38]: df.A.apply(np.isreal) & (df.A > 0)
Out[38]:
0 True
1 True
2 False
3 False
4 False
Name: A, dtype: bool
A third option, which could be slower:
In [39]: df.A.str.isnumeric().isnull() & (df.A > 0)
Out[39]:
0 True
1 True
2 False
3 False
4 False
Name: A, dtype: bool
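For anyone who wants to reproduce this on Python 3, here is a minimal sketch of the same idea. The sample frame is an assumption, reconstructed from the output shown in Answer 2, and the variable names are only illustrative. The type check uses isinstance with numbers.Real, swapped in here for np.isreal, so it is a variation on the second approach rather than the exact code above.

```python
import numbers

import numpy as np
import pandas as pd

# Assumed sample data, reconstructed from the output shown in Answer 2.
df = pd.DataFrame({"A": [1, 2, -1, np.nan, "rh"]})

# Coerce first: anything non-numeric becomes NaN, and NaN > 0 is False,
# so no str-vs-int comparison ever happens.
by_coercion = pd.to_numeric(df["A"], errors="coerce") > 0

# Type check first: only values that are already real numbers are compared.
by_type = df["A"].map(lambda v: isinstance(v, numbers.Real) and v > 0)

print(by_coercion.tolist())
print(by_type.tolist())
```

The two masks agree on this data but can differ on numeric strings: coercion treats "3" as the number 3, while the type check treats it as non-numeric.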
ANSWER 2
Score 2
Update: @JohnGalt pointed out in the comments a better way would be to use pd.to_numeric with errors='coerce':
# Your condition here, instead of `> 0`, using the fact that NaN > 0 == false
[18]: df['dummy1'] = (pd.to_numeric(df.A, errors='coerce') > 0).astype('int')
[19]: df
Out[19]:
     A  dummy1
0    1       1
1    2       1
2   -1       0
3  NaN       0
4   rh       0
One general way to create such dummy variables is along these lines:
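def foo(a):
    try:
        tmp = int(a)
        return 1 if tmp > 0 else 0  # Your condition here.
    except (TypeError, ValueError, OverflowError):
        # NaN, infinities and non-numeric strings cannot be converted to int.
        return 0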
[12]: df.A.map(foo)
Out[12]:
0 1
1 1
2 0
3 0
4 0
Name: A, dtype: int64
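If the condition needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: make_dummy and its condition parameter are hypothetical names, not part of pandas, and the sample frame is assumed from the output shown earlier in this answer.

```python
import numpy as np
import pandas as pd


def make_dummy(series, condition):
    """Return 1 where `condition` holds on the numeric value, else 0.

    `condition` is a callable that takes a numeric Series and returns
    a boolean Series, e.g. ``lambda s: s > 0``.
    """
    numeric = pd.to_numeric(series, errors="coerce")
    # Guard with notna(): some conditions (such as `!= 0`) are True for NaN.
    mask = condition(numeric) & numeric.notna()
    return mask.astype(int)


# Assumed sample data.
df = pd.DataFrame({"A": [1, 2, -1, np.nan, "rh"]})

positive = make_dummy(df["A"], lambda s: s > 0)
nonzero = make_dummy(df["A"], lambda s: s != 0)
print(positive.tolist())
print(nonzero.tolist())
```

The explicit notna() guard is there because relying on NaN comparisons being False only works for some conditions; NaN != 0, for instance, evaluates to True.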
You are doing the operations in Python 2.7, where comparisons between str and int are (unfortunately) allowed. The operations fail on Python 3:
[5]: df.A > 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-890e73655a37> in <module>()
----> 1 df.A > 0
/home/utkarshu/miniconda3/envs/py35/lib/python3.5/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
724 other = np.asarray(other)
725
--> 726 res = na_op(values, other)
727 if isscalar(res):
728 raise TypeError('Could not compare %s type with Series'
/home/utkarshu/miniconda3/envs/py35/lib/python3.5/site-packages/pandas/core/ops.py in na_op(x, y)
646 result = lib.vec_compare(x, y, op)
647 else:
--> 648 result = lib.scalar_compare(x, y, op)
649 else:
650
pandas/lib.pyx in pandas.lib.scalar_compare (pandas/lib.c:14186)()
TypeError: unorderable types: str() > int()
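The root cause can be seen without pandas at all. The sketch below first triggers the error in plain Python 3, then shows that coercing the column beforehand avoids the comparison between str and int. The sample frame is an assumption based on the data shown in this answer.

```python
import numpy as np
import pandas as pd

# Plain Python 3 refuses to order a str against an int.
try:
    result = "rh" > 0
    message = "no error"
except TypeError as exc:
    message = str(exc)
print(message)

# Assumed sample data.
df = pd.DataFrame({"A": [1, 2, -1, np.nan, "rh"]})

# After coercion the column is float64, so the comparison is well defined.
numeric = pd.to_numeric(df["A"], errors="coerce")
safe = numeric > 0
print(safe.tolist())
```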