The Python Oracle

pandas DataFrame.groupby with a tolerance



Full question
https://stackoverflow.com/questions/3578...

Accepted answer links:
[intervaltree]: https://pypi.python.org/pypi/intervaltre...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #pandas




ACCEPTED ANSWER

Score 2


I'd suggest using the intervaltree package from PyPI instead of a pandas/numpy-style solution.

The idea is to add one interval per row, from length - tolerance to length + tolerance, to the interval tree, with each interval mapped to its associated object. Then iterate over the lengths and query the tree at each one. Each query returns every object whose tolerance interval contains the queried length. Note that intervaltree treats intervals as half-open: the lower bound is included and the upper bound is not.

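from intervaltree import IntervalTree

# `data` (with 'Length' and 'Object' columns) and `tolerance` come from the question.
t = IntervalTree()
for length, obj in zip(data['Length'], data['Object']):
    # Store the half-open interval [length - tolerance, length + tolerance) with its object.
    t[length - tolerance:length + tolerance] = obj

result = {}
for length in data['Length']:
    # t[length] returns a set of intervals containing the point; sort for stable output.
    result[length] = sorted(iv.data for iv in t[length])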

The result dictionary is as follows:

{10.1: ['objA', 'objB'], 5.99: ['objD', 'objE'], 10.02: ['objA', 'objB'], 6.24: ['objD'], 7.4: ['objC']}
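For readers who want to check the lookup rule without installing anything, the same query can be approximated with the standard library. This is only a sketch: it swaps the interval tree for a sorted list and `bisect`, the function name `group_within_tolerance` and the sample lengths and tolerance are made up for illustration (they are not the question's data), and results at exact interval boundaries may differ from intervaltree because of floating-point rounding.

```python
from bisect import bisect_right


def group_within_tolerance(lengths, objects, tolerance):
    """Map each length to the sorted objects whose half-open interval
    [l - tolerance, l + tolerance) contains that length.

    Hypothetical helper, not part of intervaltree or pandas.
    """
    pairs = sorted(zip(lengths, objects))
    sorted_lengths = [l for l, _ in pairs]
    result = {}
    for length in lengths:
        # l - tol <= length < l + tol  is the same as  length - tol < l <= length + tol
        lo = bisect_right(sorted_lengths, length - tolerance)
        hi = bisect_right(sorted_lengths, length + tolerance)
        result[length] = sorted(obj for _, obj in pairs[lo:hi])
    return result


# Illustrative data only (assumed, not taken from the question).
sample_lengths = [10.0, 10.05, 6.0, 6.1, 7.4]
sample_objects = ['objA', 'objB', 'objD', 'objE', 'objC']
sample_result = group_within_tolerance(sample_lengths, sample_objects, 0.2)
print(sample_result)
```

The sorted-list version only works because every interval has the same width, so the matching lengths form one contiguous slice. With per-object tolerances, the interval tree is the more natural fit.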

The intervaltree result is not quite in the format you specified, but reshaping it into whatever format you need should be straightforward.
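If the target format is a list of distinct groups (an assumption, since the requested format isn't reproduced here), one way to reshape the dictionary is to collapse identical object lists. A minimal sketch using the result dictionary shown above:

```python
# Result dictionary copied from the answer above.
result = {
    10.1: ['objA', 'objB'],
    5.99: ['objD', 'objE'],
    10.02: ['objA', 'objB'],
    6.24: ['objD'],
    7.4: ['objC'],
}

# Lists are unhashable, so convert each one to a tuple before deduplicating.
groups = sorted({tuple(objs) for objs in result.values()})
print(groups)
# [('objA', 'objB'), ('objC',), ('objD',), ('objD', 'objE')]
```

Overlapping groups such as ('objD',) and ('objD', 'objE') are kept separate here; whether they should be merged depends on the grouping rule you want.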