Interpolate unstructured X,Y,Z data on best grid based on nearest neighbour distance for each points

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Ominous Technology Looping

--

Chapters
00:00 Interpolate Unstructured X,Y,Z Data On Best Grid Based On Nearest Neighbour Distance For Each Points
02:39 Accepted Answer Score 1
03:22 Answer 2 Score 2
03:47 Thank you

--

Full question
https://stackoverflow.com/questions/3481...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #scipy #nearestneighbor #spatialinterpolation

#avk47

ANSWER 1

Score 2

The problem you want to solve is called the "all-nearest-neighbors problem". See this article for example: http://link.springer.com/article/10.1007/BF02187718

I believe solutions to this are O(N log N), so on the same order as KDTree.query, but in practice much, much faster than a bunch of separate queries. I'm sorry, I don't know of a python implementation of this.

ACCEPTED ANSWER

Score 1

I suggest you to go with KDTree.query.

You are searching of a carachteristic distance to scale your binning: I suggest you to take only a random subset of your points, and to use the Manhattan distance, becasue KDTree.query is very slow (and yet it is a n*log(n) complexity).

Here is my code:

# CreateTree
tree=scipy.spatial.KDTree(numpy.array(points)) # better give it a copy?
# Create random subsample of points
n_repr=1000
shuffled_points=numpy.array(points)
numpy.random.shuffle(shuffled_points)
shuffled_points=shuffled_points[:n_repr]
# Query the tree
(dists,points)=tree.query(shuffled_points,k=2,p=1)
# Get _extimate_ of average distance:
avg_dists=numpy.average(dists)
print('average distance Manhattan with nearest neighbour is:',avg_dists)

I suggest you to use the Manhattan distance ( https://en.wikipedia.org/wiki/Taxicab_geometry ) because it is was faster to compute than the euclidean distance. And since you need only an estimator of the average distance it should be sufficient.