Pandas Return Separate DataFrame Values Based on Function
--
Full question
https://stackoverflow.com/questions/5948...
Question links:
[cartesian product]: https://en.wikipedia.org/wiki/Cartesian_...
[iterrows()]: https://pandas.pydata.org/pandas-docs/st...
Accepted answer links:
[pd.cut]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe #distance
ACCEPTED ANSWER
Score 3
You can use the pd.cut function to assign each latitude to an interval, then merge the two dataframes on that interval to obtain the result:
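import pandas as pd

# One (lat - 1, lat + 1) pair per latitude in df1, flattened into a list of bin edges.
# pd.cut needs increasing edges, so the latitudes must be sorted and at least 2 apart.
bins = [(i - 1, i + 1) for i in df1['Lat']]
bins = [item for subbins in bins for item in subbins]
df1['Interval'] = pd.cut(df1['Lat'], bins=bins)
df2['Interval'] = pd.cut(df2['Station_Lat'], bins=bins)
# With no "on" argument, merge joins on the columns both frames share (here 'Interval').
pd.merge(df1, df2)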
In my timings this solution is slightly faster than yours: 10.2 ms ± 201 µs per loop versus 12.2 ms ± 1.34 ms per loop.
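To make the pd.cut approach concrete, here is a minimal sketch on made-up data. The values in df1 and df2 are hypothetical (the question's real data is not reproduced here), and the sketch assumes the latitude windows do not overlap, since pd.cut rejects bin edges that are not increasing.

```python
import pandas as pd

# Hypothetical sample data; the real df1 and df2 come from the question.
df1 = pd.DataFrame({"Lat": [10.0, 20.0, 30.0, 40.0]})
df2 = pd.DataFrame({
    "Station": ["ABC", "DEF", "GHI", "JKL"],
    "Station_Lat": [10.5, 23.0, 35.0, 39.2],
})

# One (lat - 1, lat + 1) window per df1 latitude, flattened into a list of edges.
# The windows must be sorted and non-overlapping for the edges to be increasing.
edges = [edge for lat in sorted(df1["Lat"]) for edge in (lat - 1, lat + 1)]

df1["Interval"] = pd.cut(df1["Lat"], bins=edges)
df2["Interval"] = pd.cut(df2["Station_Lat"], bins=edges)

# Values outside every bin become NaN; drop them before merging.
matches = pd.merge(df1, df2.dropna(subset=["Interval"]), on="Interval")
print(matches[["Lat", "Station", "Station_Lat"]])
```

Note that pd.cut builds right-closed intervals by default, so a station exactly 1 degree below a latitude falls into the neighbouring bin and is not matched.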
ANSWER 2
Score 1
This may be faster. First sort df2 by station latitude:
df2 = df2.sort_values("Station_Lat")
After sorting, you can use searchsorted:
df1["idx"] = df2.Station_Lat.searchsorted(df1.Lat)
"idx" is the position at which each latitude would be inserted into the sorted station latitudes, so the nearest station is one of the rows adjacent to that position. You may need to duplicate the last row of df2 (see the searchsorted documentation) to avoid indexing past the end. Then use apply with this custom function:
def dist(row):
    if abs(row.Lat - df2.loc[row.idx].Station_Lat) <= 1:
        return df2.loc[row.idx].Station
    elif abs(row.Lat - df2.loc[row.idx + 1].Station_Lat) <= 1:
        return df2.loc[row.idx + 1].Station
    return False

df1.apply(dist, axis=1)
0 ABC
1 False
2 False
3 JKL
dtype: object
Edit: Because dist() assumes that df2.index is ordered and monotonically increasing (see row.idx+1), the first line of code must be corrected:
df2 = df2.sort_values("Station_Lat").reset_index(drop=True)
dist() is also somewhat faster when written this way (though it still doesn't beat the Cartesian product method):
def dist(row):
    idx = row.idx
    lat1, lat2 = df2.loc[idx:idx + 1, "Station_Lat"]
    if abs(row.Lat - lat1) <= 1:
        return df2.loc[idx, "Station"]
    elif abs(row.Lat - lat2) <= 1:
        return df2.loc[idx + 1, "Station"]
    return False
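As a rough vectorised sketch of the same searchsorted idea, the version below avoids apply and handles both ends of the array by clipping. With the default side="left", searchsorted returns a position pos such that the station latitude at pos - 1 is below the query and the one at pos is at or above it, so the two candidates checked here are pos - 1 and pos. The sample data is hypothetical, and this sketch keeps only the single nearest station per row, which is an assumption about what is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df1 and df2 come from the question.
df1 = pd.DataFrame({"Lat": [10.0, 20.0, 30.0, 40.0]})
df2 = pd.DataFrame({
    "Station": ["ABC", "DEF", "GHI", "JKL"],
    "Station_Lat": [10.5, 23.0, 35.0, 39.2],
})

# Sort and reset the index so positions and labels agree.
df2 = df2.sort_values("Station_Lat").reset_index(drop=True)
station_lats = df2["Station_Lat"].to_numpy()
lats = df1["Lat"].to_numpy()

# Insertion position of each latitude in the sorted station latitudes.
pos = np.searchsorted(station_lats, lats)

# Neighbours on either side, clipped so both are valid positions.
last = len(station_lats) - 1
left = np.clip(pos - 1, 0, last)
right = np.clip(pos, 0, last)

# Keep whichever neighbour is closer, and its distance.
left_gap = np.abs(lats - station_lats[left])
right_gap = np.abs(lats - station_lats[right])
nearest = np.where(left_gap <= right_gap, left, right)
gap = np.minimum(left_gap, right_gap)

# Station name where the nearest station is within 1 degree, NaN otherwise.
names = pd.Series(df2["Station"].to_numpy()[nearest], index=df1.index)
df1["Station"] = names.where(gap <= 1)
print(df1)
```

Rows with no station within 1 degree get a missing value here rather than False, which keeps the column easy to filter with dropna().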
ANSWER 3
Score 0
How about a lambda?
df3[df3.apply(lambda x, col1='Lat', col2='Station_Lat': x[col1]-x[col2] >= -1 and x[col1]-x[col2] <= 1, axis=1)]['Station']
Output:
0 ABC
15 JKL
Edit: Here's a second solution. (Note: this one uses abs(), since checking >= -1 and <= 1 separately is redundant.)
for i in df1.index:
    for j in df2.index:
        if abs(df1.loc[i, 'Lat'] - df2.loc[j, 'Station_Lat']) <= 1:
            print(df2.loc[j, 'Station'])
Or, in list comprehension form:
df2.loc[[j for i in df1.index for j in df2.index if abs(df1.loc[i, 'Lat'] - df2.loc[j, 'Station_Lat']) <= 1], 'Station']
Output:
ABC
JKL
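The lambda and the nested loops above both test every pair of rows. A minimal vectorised sketch of that pairwise check follows, assuming df3 is the Cartesian product of df1 and df2 (the answer does not show how df3 was built, so this is a guess) and using hypothetical sample data. It relies on merge(how="cross"), which requires pandas 1.2 or later.

```python
import pandas as pd

# Hypothetical sample data; the real df1 and df2 come from the question.
df1 = pd.DataFrame({"Lat": [10.0, 20.0, 30.0, 40.0]})
df2 = pd.DataFrame({
    "Station": ["ABC", "DEF", "GHI", "JKL"],
    "Station_Lat": [10.5, 23.0, 35.0, 39.2],
})

# Every (df1 row, df2 row) pair: 4 x 4 = 16 rows here.
df3 = df1.merge(df2, how="cross")

# Boolean mask computed on whole columns instead of row by row.
close = (df3["Lat"] - df3["Station_Lat"]).abs() <= 1
result = df3.loc[close, "Station"]
print(result)
```

The mask replaces the row-wise apply, so the comparison runs once over the columns. The cross product still grows with len(df1) * len(df2), so memory can become the limit on large inputs.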