Tuesday, 8 October 2019

Clustering lines based on start- and end point without knowing the number of clusters

I use the Hough transform to detect lines on a soccer field. Below is an example of the detected lines (61 lines in total):

[Image: detected lines on the soccer field]

All lines are the output of the cv2.HoughLinesP function and are represented as a NumPy array in the following format:

 [x1, y1, x2, y2]
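For reference, cv2.HoughLinesP returns the segments as an array of shape (N, 1, 4), which is why code that consumes it often loops twice. A minimal sketch of flattening it, using a synthetic array in place of real detector output (the values and the names `lines` and `segments` are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for cv2.HoughLinesP output: shape (N, 1, 4).
# The coordinates are invented for illustration only.
lines = np.array([[[10, 20, 110, 40]],
                  [[15, 60, 120, 85]]], dtype=np.int32)

# Drop the singleton middle axis so each row is one [x1, y1, x2, y2] segment.
segments = lines.reshape(-1, 4)

# Transposing yields one array per coordinate, each of length N.
x1, y1, x2, y2 = segments.T
```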

There are 3 distinct line groups that I want to classify: goal_line, goal_area_line and penalty_area_line. The 4th line visible in the image is caused by the billboards, and I want to ignore it. I am struggling to choose the number of clusters, k, because it changes as the camera moves: when the ball moves from the middle line towards the goal area, at first only 1 line is visible, namely penalty_area_line. When the ball moves further to the left, the camera will likely follow and more lines come into view.

I use the following to calculate the gradient and intercept of each line:

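import math

def gradient_intercept(x1, y1, x2, y2):
   dx = x2 - x1
   dy = y2 - y1
   # Angle of the segment in radians, wrapped to [0, 2*pi).
   # dy is negated (the image y axis points down).
   angle = math.atan2(-dy, dx)
   angle %= 2 * math.pi
   # Note: "gradient" is an angle in degrees here, not a slope dy/dx.
   gradient = -math.degrees(angle)
   if gradient <= -180:
      gradient = gradient + 180
   gradient = gradient + 90
   intercept = y1 - gradient * x1

   return gradient, intercept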
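In `gradient_intercept`, `gradient` is an angle in degrees, so `y1 - gradient * x1` is not the usual y-intercept of a line. For comparison, here is a minimal sketch of the conventional slope-intercept form y = m*x + b. The function name `slope_intercept` and the treatment of vertical lines are assumptions for illustration, not part of the code above:

```python
import math

def slope_intercept(x1, y1, x2, y2):
    """Return (slope, intercept) of the line through two points.

    Hypothetical helper: a vertical line has no finite slope, so
    (math.inf, math.nan) is returned in that case.
    """
    dx = x2 - x1
    dy = y2 - y1
    if dx == 0:
        return math.inf, math.nan
    slope = dy / dx
    # Solve y1 = slope * x1 + intercept for the intercept.
    intercept = y1 - slope * x1
    return slope, intercept
```

Whether the angle or the slope is the better clustering feature depends on the camera geometry. The sketch only shows how the intercept follows once a true slope is available.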

Next, I create a 2D array for all the lines and use it to calculate a distance matrix.

import numpy as np
import scipy.spatial as ss

def distance_matrix(lines):
   xy = np.empty((0, 2), int)

   try:
      for line in lines:
         for x1, y1, x2, y2 in line:
            gradient, intercept = gradient_intercept(x1, y1, x2, y2)
            # Only the gradient is used as a feature; the first column is a constant 0.
            xy = np.append(xy, [[0, gradient]], axis=0)
   except Exception as e:
      print(str(e))

   if len(xy) > 2:
      return ss.distance_matrix(xy, xy)
   return None  # too few lines to build a distance matrix

The function yields the following output:

[[0.        6.10702413 5.12577724 ... 1.11858265 0.02889456 2.02399679]
[6.10702413 0.         0.98124689 ... 7.22560678 6.07812957 4.08302733]
[5.12577724 0.98124689 0.         ... 6.24435989 5.09688267 3.10178044]
...
[1.11858265 7.22560678 6.24435989 ... 0.         1.14747721 3.14257944]
[0.02889456 6.07812957 5.09688267 ... 1.14747721 0.         1.99510223]
[2.02399679 4.08302733 3.10178044 ... 3.14257944 1.99510223 0.        ]]
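Because the first column of `xy` is always 0, each entry of this matrix is simply the absolute difference between two gradients (for example, 6.10702413 - 5.12577724 = 0.98124689 in the rows above). A minimal sketch of the same computation with NumPy broadcasting, using made-up gradient values:

```python
import numpy as np

# Hypothetical gradient values, invented for illustration.
gradients = np.array([1.0, 7.0, 6.0])

# Pairwise absolute differences via broadcasting: entry (i, j) is |g_i - g_j|.
dist = np.abs(gradients[:, None] - gradients[None, :])
```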

Next, I need to cluster the lines together based on their intercept. I assume that (in the case of 4 visible clusters)

  intercept_billboards > intercept_goal_line > intercept_goal_area_line > intercept_penalty_area_line

Searching through SO, I believe that DBSCAN is suitable for my case:

from sklearn.cluster import DBSCAN
db = DBSCAN(eps=0.2,min_samples=2)  # minimum of two lines in order to be considered a cluster
db.fit_predict(distance_matrix)
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
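One thing to note about this call: with the default metric, scikit-learn's DBSCAN treats each row of its input as a feature vector, so a distance matrix has to be passed with metric="precomputed" to be read as distances. In that case eps is the maximum distance between two samples for them to count as neighbours, in the same units as the matrix entries. A minimal sketch under those assumptions, with made-up 1-D feature values and an eps chosen only for this toy data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 1-D feature per line (e.g. an intercept), invented for illustration.
values = np.array([10.0, 10.4, 10.8, 50.0, 50.4, 90.0])

# Pairwise absolute differences form the distance matrix.
dist = np.abs(values[:, None] - values[None, :])

# eps is expressed in the same units as the entries of dist.
db = DBSCAN(eps=1.0, min_samples=2, metric="precomputed")
labels = db.fit_predict(dist)

# Label -1 marks noise (points with no neighbour within eps).
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

A reasonable eps for real data would have to come from the spread of the actual feature values, for example by inspecting the sorted gaps between them.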

Inspecting n_clusters_, I sometimes see values of 5 or 6.

First question: What exactly is eps? What scale is it on? What would be an appropriate value in my case?

Second question: Is my methodology (intercepts -> distance_matrix) a correct one?

Thanks in advance


