cartesian product in pandas
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Techno Bleepage Open
--
Chapters
00:00 Cartesian Product In Pandas
00:22 Accepted Answer Score 194
01:02 Answer 2 Score 38
01:51 Answer 3 Score 112
02:11 Answer 4 Score 46
02:23 Thank you
--
Full question
https://stackoverflow.com/questions/1326...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #merge #cartesianproduct #crossjoin
#avk47
ACCEPTED ANSWER
Score 194
In recent versions of Pandas (>= 1.2) this is built into merge so you can do:
from pandas import DataFrame
df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'col3':[5,6]})    
df1.merge(df2, how='cross')
This is equivalent to the previous pandas < 1.2 answer but is easier to read.
For pandas < 1.2:
If you have a key that is repeated for each row, then you can produce a cartesian product using merge (like you would in SQL).
from pandas import DataFrame, merge
df1 = DataFrame({'key':[1,1], 'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'key':[1,1], 'col3':[5,6]})
merge(df1, df2,on='key')[['col1', 'col2', 'col3']]
Output:
   col1  col2  col3
0     1     3     5
1     1     3     6
2     2     4     5
3     2     4     6
See here for the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html
ANSWER 2
Score 112
Use pd.MultiIndex.from_product as an index in an otherwise empty dataframe, then reset its index, and you're done. 
a = [1, 2, 3]
b = ["a", "b", "c"]
index = pd.MultiIndex.from_product([a, b], names = ["a", "b"])
pd.DataFrame(index = index).reset_index()
out:
   a  b
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  2  c
6  3  a
7  3  b
8  3  c
ANSWER 3
Score 46
Minimal code needed for this one. Create a common 'key' to cartesian merge the two:
df1['key'] = 0
df2['key'] = 0
df_cartesian = df1.merge(df2, how='outer')
ANSWER 4
Score 38
This won't win a code golf competition, and borrows from the previous answers - but clearly shows how the key is added, and how the join works. This creates 2 new data frames from lists, then adds the key to do the cartesian product on.
My use case was that I needed a list of all store IDs on for each week in my list. So, I created a list of all the weeks I wanted to have, then a list of all the store IDs I wanted to map them against.
The merge I chose left, but would be semantically the same as inner in this setup. You can see this in the documentation on merging, which states it does a Cartesian product if key combination appears more than once in both tables - which is what we set up.
days = pd.DataFrame({'date':list_of_days})
stores = pd.DataFrame({'store_id':list_of_stores})
stores['key'] = 0
days['key'] = 0
days_and_stores = days.merge(stores, how='left', on = 'key')
days_and_stores.drop('key',1, inplace=True)