How to form tuple column from two columns in Pandas
Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Quiet Intelligence
--
Chapters
00:00 Question
01:04 Accepted answer (Score 350)
01:30 Answer 2 (Score 99)
02:03 Answer 3 (Score 29)
02:18 Answer 4 (Score 8)
02:47 Thank you
--
Full question
https://stackoverflow.com/questions/1603...
Answer 2 links:
[itertuples]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #dataframe #pandas #tuples
#avk47
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Quiet Intelligence
--
Chapters
00:00 Question
01:04 Accepted answer (Score 350)
01:30 Answer 2 (Score 99)
02:03 Answer 3 (Score 29)
02:18 Answer 4 (Score 8)
02:47 Thank you
--
Full question
https://stackoverflow.com/questions/1603...
Answer 2 links:
[itertuples]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #dataframe #pandas #tuples
#avk47
ACCEPTED ANSWER
Score 380
Get comfortable with zip. It comes in handy when dealing with column data.
df['new_col'] = list(zip(df.lat, df.long))
It's less complicated and faster than using apply or map. Something like np.dstack is twice as fast as zip, but wouldn't give you tuples.
ANSWER 2
Score 107
In [10]: df
Out[10]:
A B lat long
0 1.428987 0.614405 0.484370 -0.628298
1 -0.485747 0.275096 0.497116 1.047605
2 0.822527 0.340689 2.120676 -2.436831
3 0.384719 -0.042070 1.426703 -0.634355
4 -0.937442 2.520756 -1.662615 -1.377490
5 -0.154816 0.617671 -0.090484 -0.191906
6 -0.705177 -1.086138 -0.629708 1.332853
7 0.637496 -0.643773 -0.492668 -0.777344
8 1.109497 -0.610165 0.260325 2.533383
9 -1.224584 0.117668 1.304369 -0.152561
In [11]: df['lat_long'] = df[['lat', 'long']].apply(tuple, axis=1)
In [12]: df
Out[12]:
A B lat long lat_long
0 1.428987 0.614405 0.484370 -0.628298 (0.484370195967, -0.6282975278)
1 -0.485747 0.275096 0.497116 1.047605 (0.497115615839, 1.04760475074)
2 0.822527 0.340689 2.120676 -2.436831 (2.12067574274, -2.43683074367)
3 0.384719 -0.042070 1.426703 -0.634355 (1.42670326172, -0.63435462504)
4 -0.937442 2.520756 -1.662615 -1.377490 (-1.66261469102, -1.37749004179)
5 -0.154816 0.617671 -0.090484 -0.191906 (-0.0904840623396, -0.191905582481)
6 -0.705177 -1.086138 -0.629708 1.332853 (-0.629707821728, 1.33285348929)
7 0.637496 -0.643773 -0.492668 -0.777344 (-0.492667604075, -0.777344111021)
8 1.109497 -0.610165 0.260325 2.533383 (0.26032456699, 2.5333825651)
9 -1.224584 0.117668 1.304369 -0.152561 (1.30436900612, -0.152560909725)
ANSWER 3
Score 32
Pandas has the itertuples method to do exactly this:
list(df[['lat', 'long']].itertuples(index=False, name=None))
ANSWER 4
Score 7
I'd like to add df.values.tolist(). (as long as you don't mind to get a column of lists rather than tuples)
import pandas as pd
import numpy as np
size = int(1e+07)
df = pd.DataFrame({'a': np.random.rand(size), 'b': np.random.rand(size)})
%timeit df.values.tolist()
1.47 s ± 38.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit list(zip(df.a,df.b))
1.92 s ± 131 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)