Pandas convert dataframe to array of tuples
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Darkness Approaches Looping
--
Chapters
00:00 Pandas Convert Dataframe To Array Of Tuples
00:34 Answer 1 Score 262
00:49 Answer 2 Score 50
00:58 Accepted Answer Score 363
01:16 Answer 4 Score 55
02:54 Thank you
--
Full question
https://stackoverflow.com/questions/9758...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 363
list(data_set.itertuples(index=False))
As of pandas 0.17.1, the above will return a list of namedtuples.
If you want a list of ordinary tuples, pass name=None as an argument:
list(data_set.itertuples(index=False, name=None))
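A minimal sketch of the difference (assuming a small data_set with the question's data_date, data_1 and data_2 columns; the values here are only illustrative):
import pandas as pd

# hypothetical frame mirroring the question's columns
data_set = pd.DataFrame({
    'data_date': ['2012-02-17', '2012-02-16'],
    'data_1': [24.75, 25.0],
    'data_2': [25.03, 25.07],
})

list(data_set.itertuples(index=False))
# [Pandas(data_date='2012-02-17', data_1=24.75, data_2=25.03),
#  Pandas(data_date='2012-02-16', data_1=25.0, data_2=25.07)]

list(data_set.itertuples(index=False, name=None))
# [('2012-02-17', 24.75, 25.03), ('2012-02-16', 25.0, 25.07)]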
ANSWER 2
Score 262
How about:
subset = data_set[['data_date', 'data_1', 'data_2']]
tuples = [tuple(x) for x in subset.to_numpy()]
For pandas < 0.24, use
tuples = [tuple(x) for x in subset.values]
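One caveat worth a quick sketch (reusing the illustrative data_set from the example above): because the columns mix strings and floats, to_numpy() returns an object-dtype array, so the resulting tuples keep the original Python values:
subset = data_set[['data_date', 'data_1', 'data_2']]

arr = subset.to_numpy()             # object dtype, since the columns mix str and float
tuples = [tuple(x) for x in arr]
# [('2012-02-17', 24.75, 25.03), ('2012-02-16', 25.0, 25.07)]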
ANSWER 3
Score 55
Motivation
Many data sets are large enough that we need to concern ourselves with speed/efficiency. So I offer this solution in that spirit. It happens to also be succinct.
For the sake of comparison, let's drop the index column
df = data_set.drop('index', axis=1)
Solution
I'll propose the use of zip and map:
list(zip(*map(df.get, df)))
[('2012-02-17', 24.75, 25.03),
('2012-02-16', 25.0, 25.07),
('2012-02-15', 24.99, 25.15),
('2012-02-14', 24.68, 25.05),
('2012-02-13', 24.62, 24.77),
('2012-02-10', 24.38, 24.61)]
It also happens to be flexible if we want to deal with a specific subset of columns. We'll assume the columns we've already displayed are the subset we want.
list(zip(*map(df.get, ['data_date', 'data_1', 'data_2'])))
[('2012-02-17', 24.75, 25.03),
('2012-02-16', 25.0, 25.07),
('2012-02-15', 24.99, 25.15),
('2012-02-14', 24.68, 25.05),
('2012-02-13', 24.62, 24.77),
('2012-02-10', 24.38, 24.61)]
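To unpack how the one-liner works (a sketch, not part of the original answer): iterating over a DataFrame yields its column labels, df.get maps each label to its column Series, and zip(*...) transposes those columns into per-row tuples:
cols = [df.get(name) for name in df]    # iterating a DataFrame yields its column labels
rows = list(zip(*cols))                 # zip(*columns) pairs up values position by position
# rows == list(zip(*map(df.get, df)))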
What is Quicker?
Turns out records is the quickest, followed by the asymptotically converging zipmap and iter_tuples.
I'll use the library simple_benchmark that I got from this post
from simple_benchmark import BenchmarkBuilder
import pandas as pd
import numpy as np

b = BenchmarkBuilder()

# the candidate implementations
def tuple_comp(df): return [tuple(x) for x in df.to_numpy()]
def iter_namedtuples(df): return list(df.itertuples(index=False))
def iter_tuples(df): return list(df.itertuples(index=False, name=None))
def records(df): return df.to_records(index=False).tolist()
def zipmap(df): return list(zip(*map(df.get, df)))

funcs = [tuple_comp, iter_namedtuples, iter_tuples, records, zipmap]
for func in funcs:
    b.add_function()(func)

# build a random two-column frame of n rows
def creator(n):
    return pd.DataFrame({"A": np.random.randint(n, size=n),
                         "B": np.random.randint(n, size=n)})

@b.add_arguments('Rows in DataFrame')
def argument_provider():
    for n in (10 ** (np.arange(4, 11) / 2)).astype(int):
        yield n, creator(n)

r = b.run()
Check the results. Dividing each row by its row minimum normalizes the timings, so every value reads as a multiple of the fastest method at that size:
r.to_pandas_dataframe().pipe(lambda d: d.div(d.min(1), 0))
        tuple_comp  iter_namedtuples  iter_tuples   records    zipmap
100       2.905662          6.626308     3.450741  1.469471  1.000000
316       4.612692          4.814433     2.375874  1.096352  1.000000
1000      6.513121          4.106426     1.958293  1.000000  1.316303
3162      8.446138          4.082161     1.808339  1.000000  1.533605
10000     8.424483          3.621461     1.651831  1.000000  1.558592
31622     7.813803          3.386592     1.586483  1.000000  1.515478
100000    7.050572          3.162426     1.499977  1.000000  1.480131
r.plot()
ANSWER 4
Score 50
A generic way:
[tuple(x) for x in data_set.to_records(index=False)]
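A brief illustration (again using the toy data_set from earlier, not the original question's data): to_records(index=False) returns a NumPy record array, and each record converts cleanly to a plain tuple:
recs = data_set.to_records(index=False)   # numpy record array, one record per row
[tuple(x) for x in recs]
# [('2012-02-17', 24.75, 25.03), ('2012-02-16', 25.0, 25.07)]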
