pandas: matching a DataFrame against strings while keeping the DataFrame's index

--

Full question
https://stackoverflow.com/questions/6283...

Accepted answer links:
[Series.isin]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.merge]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.reset_index]: http://pandas.pydata.org/pandas-docs/sta...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas

ACCEPTED ANSWER

Score 3

Because df0 is only filtered, its index values are unchanged when you use Series.isin with df1['string_line_1']; the only difference is that the rows stay in the order of the original df0 rather than the order of df1:

out = df0[df0['string_line_0'].isin(df1['string_line_1'])]
print (out)
     name_id_code string_line_0
idx                            
0        0.010000             A
3       29.800000             D
5       88.100001             F
6       66.400001             G
9      551.000000             J
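
For anyone who wants to run the filter, here is a minimal self-contained sketch. The question's full data is not reproduced here, so the values in df0 and df1 below are made-up stand-ins; only the column names and the index name idx come from the answer.

```python
import pandas as pd

# Hypothetical sample data (assumption): the real df0/df1 are not shown in full.
df0 = pd.DataFrame(
    {
        "name_id_code": [0.01, 5.5, 29.8, 88.1],
        "string_line_0": ["A", "B", "D", "F"],
    },
    index=pd.Index([0, 1, 3, 5], name="idx"),
)
df1 = pd.DataFrame({"string_line_1": ["F", "A", "D"]})

# Boolean mask: True where df0's string appears anywhere in df1's column.
mask = df0["string_line_0"].isin(df1["string_line_1"])

# Boolean indexing only drops rows, so the surviving index labels are df0's own.
out = df0[mask]

print(out.index.tolist())  # [0, 3, 5]
```

The row for "B" is dropped, and the remaining rows keep their idx labels and df0's row order.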

Alternatively, if you use DataFrame.merge, call DataFrame.reset_index on df0 first; otherwise merge discards df0.index:

out = (df1.rename(columns={'string_line_1':'string_line_0'})
          .merge(df0.reset_index(), on='string_line_0'))
print (out)
  string_line_0  idx  name_id_code
0             A    0      0.010000
1             F    5     88.100001
2             J    9    551.000000
3             G    6     66.400001
4             D    3     29.800000
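
A runnable sketch of the merge variant, again on made-up stand-in data (the values are assumptions, not the question's data). The final set_index call is an optional extra step that is not part of the answer; it turns the preserved idx column back into the index.

```python
import pandas as pd

# Hypothetical sample data (assumption): the real df0/df1 are not shown in full.
df0 = pd.DataFrame(
    {
        "name_id_code": [0.01, 5.5, 29.8, 88.1],
        "string_line_0": ["A", "B", "D", "F"],
    },
    index=pd.Index([0, 1, 3, 5], name="idx"),
)
df1 = pd.DataFrame({"string_line_1": ["F", "A", "D"]})

# reset_index() turns the idx index into an ordinary column so the merge keeps it.
merged = (
    df1.rename(columns={"string_line_1": "string_line_0"})
       .merge(df0.reset_index(), on="string_line_0")
)

# Optional extra step (not in the answer): make idx the index again.
restored = merged.set_index("idx")

print(restored)
```

With the default inner join, the rows follow the key order of the left frame (df1 here), which is why this variant returns the matches in df1's order instead of df0's.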

A similar solution without the rename; the output then contains both the string_line_1 and string_line_0 columns, holding the same values:

out = (df1.merge(df0.reset_index(), left_on='string_line_1', right_on='string_line_0'))
print (out)
  string_line_1  idx  name_id_code string_line_0
0             A    0      0.010000             A
1             F    5     88.100001             F
2             J    9    551.000000             J
3             G    6     66.400001             G
4             D    3     29.800000             D

ANSWER 2

Score 0

You can do:

out = df0.loc[(df0["string_line_0"].isin(df1["string_line_1"]))].copy()
out["string_line_0"] = pd.Categorical(out["string_line_0"], categories=df1["string_line_1"].unique())
out.sort_values(by=["string_line_0"], inplace=True)

The first line filters df0 to just the rows where string_line_0 is in the string_line_1 column of df1.

The second line converts string_line_0 in the output to a Categorical whose category order follows the order of the values in df1. The third line then sorts by that custom order, so the rows follow df1 while keeping the index of df0.
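
A self-contained sketch of this Categorical approach, using made-up stand-in data (the values are assumptions; only the column and index names come from the answers). It reassigns the sorted result instead of using inplace=True, which is a stylistic choice and not a change in behaviour.

```python
import pandas as pd

# Hypothetical sample data (assumption): the real df0/df1 are not shown in full.
df0 = pd.DataFrame(
    {
        "name_id_code": [0.01, 5.5, 29.8, 88.1],
        "string_line_0": ["A", "B", "D", "F"],
    },
    index=pd.Index([0, 1, 3, 5], name="idx"),
)
df1 = pd.DataFrame({"string_line_1": ["F", "A", "D"]})

# Step 1: keep only the df0 rows whose string occurs in df1 (index untouched).
out = df0.loc[df0["string_line_0"].isin(df1["string_line_1"])].copy()

# Step 2: categories listed in df1's order define the custom sort order.
out["string_line_0"] = pd.Categorical(
    out["string_line_0"], categories=df1["string_line_1"].unique()
)

# Step 3: sorting a categorical column follows the category order, not the alphabet.
out = out.sort_values(by="string_line_0")

print(out.index.tolist())  # [5, 0, 3]
```

Note that string_line_0 ends up with category dtype after this; the idx labels are still the ones from df0.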