Python Pandas merge only certain columns

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Over Ancient Waters Looping

--

Chapters
00:00 Question
00:51 Accepted answer (Score 108)
01:07 Answer 2 (Score 267)
01:30 Answer 3 (Score 43)
01:57 Answer 4 (Score 12)
02:44 Thank you

--

Full question
https://stackoverflow.com/questions/1797...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #merge #pandas

#avk47

ANSWER 1

Score 292

You want to use TWO brackets, so if you are doing a VLOOKUP sort of action:

df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')

This will give you everything in the original df + add that one corresponding column in df2 that you want to join.

ACCEPTED ANSWER

Score 109

You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])

ANSWER 3

Score 12

You can use .loc to select the specific columns with all rows and then pull that. An example is below:

pandas.merge(dataframe1, dataframe2.iloc[:, [0:5]], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2. You have chosen to do an outer left join on 'key'. However, for dataframe2 you have specified .iloc which allows you to specific the rows and columns you want in a numerical format. Using :, your selecting all rows, but [0:5] selects the first 5 columns. You could use .loc to specify by name, but if your dealing with long column names, then .iloc may be better.

ANSWER 4

Score 10

This is to merge selected columns from two tables.

If table_1 contains t1_a,t1_b,t1_c..,id,..t1_z columns, and table_2 contains t2_a, t2_b, t2_c..., id,..t2_z columns, and only t1_a, id, t2_a are required in the final table, then

mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file    
mergedCSV.to_csv('output.csv',index = False)