The Python Oracle

pandas DataFrame.groupby with a tolerance

--


Full question
https://stackoverflow.com/questions/3578...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #pandas



ACCEPTED ANSWER

Score 2


I'd suggest using the intervaltree package on PyPI, instead of a pandas/numpy-esque solution.

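The idea is to add one interval per row, spanning length - tolerance to length + tolerance, to the interval tree, with each interval mapping to its associated object. Then iterate over the lengths and query the tree at each one. Each query returns every object whose tolerance interval contains the queried length. Note that intervaltree intervals include the lower bound but exclude the upper bound, so a length sitting exactly on another interval's upper edge is not matched.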

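from intervaltree import IntervalTree

# `data` is the DataFrame from the question; `tolerance` is the allowed
# difference in length (it must be positive, since empty intervals are rejected).
t = IntervalTree()
for length, obj in zip(data['Length'], data['Object']):
    # Half-open interval [length - tolerance, length + tolerance) maps to obj.
    t[length - tolerance:length + tolerance] = obj

result = {}
for length in data['Length']:
    # A point query returns a set of Interval objects; sort for a stable order.
    result[length] = sorted(iv.data for iv in t[length])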

The result dictionary is as follows:

{10.1: ['objA', 'objB'], 5.99: ['objD', 'objE'], 10.02: ['objA', 'objB'], 6.24: ['objD'], 7.4: ['objC']}
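
To check the result without installing intervaltree, the same containment test can be sketched with plain Python. This is a brute-force stand-in for the interval tree, not the method from the answer, and it is slower on large inputs. The sample lengths and the tolerance of 0.25 are assumptions inferred from the result above, not values taken from the question, and within_tolerance is a hypothetical helper name.

```python
# Assumed sample data, inferred from the result dictionary (not from the question).
lengths = {"objA": 10.1, "objB": 10.02, "objC": 7.4, "objD": 6.24, "objE": 5.99}
tolerance = 0.25  # assumed value


def within_tolerance(lengths, tolerance):
    """Map each length to the objects whose half-open interval
    [length - tolerance, length + tolerance) contains it."""
    result = {}
    for query in lengths.values():
        result[query] = sorted(
            obj
            for obj, length in lengths.items()
            # Lower bound inclusive, upper bound exclusive, as in intervaltree.
            if length - tolerance <= query < length + tolerance
        )
    return result


result = within_tolerance(lengths, tolerance)
```

Under these assumptions the sketch reproduces the dictionary above. It also shows why 5.99 matches objD while 6.24 does not match objE: 5.99 sits on the inclusive lower edge of objD's interval, whereas 6.24 sits on the exclusive upper edge of objE's.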

The dictionary is not quite in the format you specified, but converting it to that format should be straightforward.
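
The format requested in the question is not reproduced here, so the following is only one possible reshaping: a minimal sketch that collapses identical object lists into a single group each. The helper name distinct_groups is hypothetical, and the input is the result dictionary copied from above.

```python
# Result dictionary copied from the answer above.
result = {
    10.1: ['objA', 'objB'],
    5.99: ['objD', 'objE'],
    10.02: ['objA', 'objB'],
    6.24: ['objD'],
    7.4: ['objC'],
}


def distinct_groups(result):
    """Collapse identical object lists into one entry each, in first-seen order."""
    seen = set()
    groups = []
    for objs in result.values():
        key = tuple(sorted(objs))  # hashable, order-independent key
        if key not in seen:
            seen.add(key)
            groups.append(list(key))
    return groups


groups = distinct_groups(result)
```

Overlapping groups such as ['objD', 'objE'] and ['objD'] are kept separate here. Whether they should be merged depends on how the tolerance is meant to chain, which the answer does not specify.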