Saturday, 3 November 2018

How do I cluster a list of geographic points by distance?

I have a list of points P = [p1, ..., pN], where pi = (latitude_i, longitude_i).

Using Python 3, I would like to find the smallest set of clusters (disjoint subsets of P that together cover P) such that every member of a cluster is within 20 km of every other member of that cluster.

Distance between two points is computed using the Vincenty method.
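For readers without geopy at hand, the pairwise distance check can be approximated as follows. This is only a sketch under a stated assumption: it substitutes the haversine formula (spherical Earth, mean radius 6371.0088 km) for Vincenty's ellipsoidal method, so results can differ by a fraction of a percent. `haversine_km` is a hypothetical helper name, not part of geopy.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumption of a spherical Earth


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees.

    Stand-in for the Vincenty distance, NOT the same method.
    """
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# One degree of longitude on the equator is roughly 111.2 km.
print(haversine_km((0.0, 0.0), (0.0, 1.0)))
```

Because the 20 km threshold is a hard cutoff, pairs whose distance is very close to 20 km may be classified differently by the spherical and ellipsoidal formulas.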

To make this a little more concrete, suppose I have a set of points such as

import numpy as np

# Each row is (latitude, longitude) in degrees
points = np.array([[33.    , 41.    ],
                   [33.9693, 41.3923],
                   [33.6074, 41.277 ],
                   [34.4823, 41.919 ],
                   [34.3702, 41.1424],
                   [34.3931, 41.078 ],
                   [34.2377, 41.0576],
                   [34.2395, 41.0211],
                   [34.4443, 41.3499],
                   [34.3812, 40.9793]])

Then I am trying to define this function:

from geopy.distance import vincenty
def clusters(points, distance):
    """Returns smallest list of clusters [C1,C2...Cn] such that
       for x,y in Ci, vincenty(x,y).km <= distance """
    return [points]  # Incorrect but gives the form of the output
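To make the constraint concrete, here is a minimal greedy sketch. It is not a solution to the question as posed: it only guarantees that every pair inside a cluster is within the threshold, not that the number of clusters is minimal (finding the true minimum amounts to a minimum clique cover, which is NP-hard on general graphs). The names `greedy_clusters` and `haversine_km` are hypothetical, and the distance function is a haversine stand-in for Vincenty, which is an assumption.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Spherical-Earth distance in km; an assumed stand-in for Vincenty."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * asin(sqrt(h))


def greedy_clusters(points, max_km, dist=haversine_km):
    """Greedy heuristic: put each point into the first cluster whose
    members are ALL within max_km of it, else start a new cluster.

    Every returned cluster satisfies the pairwise constraint, but the
    number of clusters is not guaranteed to be the smallest possible.
    """
    clusters = []
    for p in points:
        for c in clusters:
            if all(dist(p, q) <= max_km for q in c):
                c.append(p)
                break
        else:  # no existing cluster can accept p
            clusters.append([p])
    return clusters


# Points on the equator, 0.1 degree of longitude is about 11.1 km.
demo = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2), (0.0, 1.0)]
print(greedy_clusters(demo, 20))
# → [[(0.0, 0.0), (0.0, 0.1)], [(0.0, 0.2)], [(0.0, 1.0)]]
```

The result depends on the order in which points are visited, so different orderings can yield different cluster counts.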

NOTE: Many existing questions cluster on both geolocation and attributes; mine is about location only, using lat/lon distance rather than Euclidean distance. Other questions give partial answers, but none answers this one (and many are unanswered).
