The Python Oracle

Pandas join issue: columns overlap but no suffix specified

--------------------------------------------------
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------

Take control of your privacy with Proton's trusted, Swiss-based, secure services.
Choose what you need and safeguard your digital life:
Mail: https://go.getproton.me/SH1CU
VPN: https://go.getproton.me/SH1DI
Password Manager: https://go.getproton.me/SH1DJ
Drive: https://go.getproton.me/SH1CT


Music by Eric Matyas
https://www.soundimage.org
Track title: The Builders

--

Chapters
00:00 Pandas Join Issue: Columns Overlap But No Suffix Specified
00:31 Accepted Answer Score 249
01:07 Answer 2 Score 67
01:29 Answer 3 Score 56
02:03 Answer 4 Score 3
02:18 Thank you

--

Full question
https://stackoverflow.com/questions/2664...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #join #pandas

#avk47



ACCEPTED ANSWER

Score 249


Your error on the snippet of data you posted is a little cryptic, in that because there are no common values, the join operation fails because the values don't overlap it requires you to supply a suffix for the left and right hand side:

In [173]:

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
Out[173]:
       mukey_left  DI  PI  mukey_right  niccdcd
index                                          
0          100000  35  14          NaN      NaN
1         1000005  44  14          NaN      NaN
2         1000006  44  14          NaN      NaN
3         1000007  43  13          NaN      NaN
4         1000008  43  13          NaN      NaN

merge works because it doesn't have this restriction:

In [176]:

df_a.merge(df_b, on='mukey', how='left')
Out[176]:
     mukey  DI  PI  niccdcd
0   100000  35  14      NaN
1  1000005  44  14      NaN
2  1000006  44  14      NaN
3  1000007  43  13      NaN
4  1000008  43  13      NaN



ANSWER 2

Score 67


The .join() function is using the index of the passed as argument dataset, so you should use set_index or use .merge function instead.

Please find the two examples that should work in your case:

join_df = LS_sgo.join(MSU_pi.set_index('mukey'), on='mukey', how='left')

or

join_df = df_a.merge(df_b, on='mukey', how='left')



ANSWER 3

Score 56


This error indicates that the two tables have one or more column names that have the same column name.

The error message translates to: "I can see the same column in both tables but you haven't told me to rename either one before bringing them into the same table"

You either want to delete one of the columns before bringing it in from the other on using del df['column name'], or use lsuffix to re-write the original column, or rsuffix to rename the one that is being brought in.

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')



ANSWER 4

Score 3


Mainly join is used exclusively to join based on the index,not on the attribute names,so change the attributes names in two different dataframes,then try to join,they will be joined,else this error is raised