The Python Oracle

pandas matching database with string keeping index of database


Full question
https://stackoverflow.com/questions/6283...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas




ACCEPTED ANSWER

Score 3


Because df0 is only filtered, its index values are unchanged when you use Series.isin with df1['string_line_1']. The only difference from the desired output is that the rows keep the order of the original df0 rather than the order of df1:

out = df0[df0['string_line_0'].isin(df1['string_line_1'])]
print (out)
     name_id_code string_line_0
idx                            
0        0.010000             A
3       29.800000             D
5       88.100001             F
6       66.400001             G
9      551.000000             J
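
For readers who want to run the filter end to end, here is a minimal self-contained sketch. The contents of df0 and df1 are assumptions: only the five matching rows appear in the output above, so the non-matching rows and the order of df1 are hypothetical filler.

```python
import pandas as pd

# Hypothetical sample data: only rows 0, 3, 5, 6 and 9 are taken from the
# answer's printed output; the other rows are made-up filler.
df0 = pd.DataFrame(
    {
        "name_id_code": [0.01, 1.5, 2.5, 29.8, 4.5, 88.1, 66.4, 7.5, 8.5, 551.0],
        "string_line_0": list("ABCDEFGHIJ"),
    }
).rename_axis("idx")
df1 = pd.DataFrame({"string_line_1": ["A", "F", "J", "G", "D"]})

# Boolean mask: True where the df0 string also occurs in df1.
mask = df0["string_line_0"].isin(df1["string_line_1"])

# Boolean indexing keeps the original index labels of df0.
out = df0[mask]
print(out.index.tolist())  # [0, 3, 5, 6, 9]
```

Because the mask is aligned on df0's index, no index bookkeeping is needed; the trade-off is that the row order of df1 is ignored.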

Alternatively, use DataFrame.merge. To avoid losing df0.index, call DataFrame.reset_index first so the index becomes a regular column that survives the merge:

out = (df1.rename(columns={'string_line_1':'string_line_0'})
          .merge(df0.reset_index(), on='string_line_0'))
print (out)
  string_line_0  idx  name_id_code
0             A    0      0.010000
1             F    5     88.100001
2             J    9    551.000000
3             G    6     66.400001
4             D    3     29.800000

A similar solution without renaming. The result keeps both the string_line_1 and string_line_0 columns, which hold the same values:

out = (df1.merge(df0.reset_index(), left_on='string_line_1', right_on='string_line_0'))
print (out)
  string_line_1  idx  name_id_code string_line_0
0             A    0      0.010000             A
1             F    5     88.100001             F
2             J    9    551.000000             J
3             G    6     66.400001             G
4             D    3     29.800000             D
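
If the original idx values should end up as the actual index again rather than as a column, one possible follow-up is to chain set_index after the merge. This is a sketch under assumptions, not part of the original answer: the sample data is hypothetical, and dropping string_line_1 is only one way to tidy the duplicate column.

```python
import pandas as pd

# Hypothetical sample data modelled on the printed output above.
df0 = pd.DataFrame(
    {
        "name_id_code": [0.01, 1.5, 2.5, 29.8, 4.5, 88.1, 66.4, 7.5, 8.5, 551.0],
        "string_line_0": list("ABCDEFGHIJ"),
    }
).rename_axis("idx")
df1 = pd.DataFrame({"string_line_1": ["A", "F", "J", "G", "D"]})

out = (
    # reset_index turns the idx index into a column so the merge keeps it.
    df1.merge(df0.reset_index(), left_on="string_line_1", right_on="string_line_0")
    # Move idx back into the index position.
    .set_index("idx")
    # string_line_1 duplicates string_line_0 after an inner merge.
    .drop(columns="string_line_1")
)
print(out.index.tolist())  # [0, 5, 9, 6, 3]
```

The default inner merge preserves the order of the left keys, so the rows follow df1 while the index values come from df0.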



ANSWER 2

Score 0


You can do:

out = df0.loc[(df0["string_line_0"].isin(df1["string_line_1"]))].copy()
out["string_line_0"] = pd.Categorical(out["string_line_0"], categories=df1["string_line_1"].unique())
out.sort_values(by=["string_line_0"], inplace=True)

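The first line filters df0 to just the rows where string_line_0 is in the string_line_1 column of df1, and copies the result so it can be modified safely.

The second line converts string_line_0 in out to a Categorical whose category order follows the order of the values in df1.

The third line sorts by that column, so the rows end up in df1's order while keeping the index of df0.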
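
The same steps as a runnable sketch. The sample data is hypothetical, and the sort is written with reassignment instead of inplace=True, which is a stylistic swap rather than a behavioral change.

```python
import pandas as pd

# Hypothetical sample data; df1's order is the order we want in the result.
df0 = pd.DataFrame(
    {
        "name_id_code": [0.01, 1.5, 2.5, 29.8, 4.5, 88.1, 66.4, 7.5, 8.5, 551.0],
        "string_line_0": list("ABCDEFGHIJ"),
    }
).rename_axis("idx")
df1 = pd.DataFrame({"string_line_1": ["A", "F", "J", "G", "D"]})

# Step 1: keep only rows of df0 whose string also occurs in df1.
out = df0.loc[df0["string_line_0"].isin(df1["string_line_1"])].copy()

# Step 2: make the column categorical, with categories in df1's order.
out["string_line_0"] = pd.Categorical(
    out["string_line_0"], categories=df1["string_line_1"].unique()
)

# Step 3: sorting a categorical column follows the category order.
out = out.sort_values(by="string_line_0")
print(out.index.tolist())  # [0, 5, 9, 6, 3]
```

Note that string_line_0 stays categorical in the result; converting it back with astype(str) is an option if plain strings are needed downstream.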