Number of rows in my dataset is 500000+. I need Hausdorff distance of every id between itself and others. and repeat it for the whole dataset
I have a huge data set. Here is the small part:
df =
id_easy ordinal latitude longitude epoch day_of_week
0 aaa 1.0 22.0701 2.6685 01-01-11 07:45 Friday
1 aaa 2.0 22.0716 2.6695 01-01-11 07:45 Friday
2 aaa 3.0 22.0722 2.6696 01-01-11 07:46 Friday
3 bbb 1.0 22.1166 2.6898 01-01-11 07:58 Friday
4 bbb 2.0 22.1162 2.6951 01-01-11 07:59 Friday
5 ccc 1.0 22.1166 2.6898 01-01-11 07:58 Friday
6 ccc 2.0 22.1162 2.6951 01-01-11 07:59 Friday
I want to calculate Haudorff Distance:
import pandas as pd
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from scipy.spatial.distance import pdist, squareform
u = np.array([(2.6685,22.0701),(2.6695,22.0716),(2.6696,22.0722)]) # coordinates of `id_easy` of taxi `aaa`
v = np.array([(2.6898,22.1166),(2.6951,22.1162)]) # coordinates of `id_easy` of taxi `bbb`
directed_hausdorff(u, v)[0]
Output is 0.05114626086039758
Now I want to calculate this distance for the whole dataset. For all id_easys. Desired output is matrix with 0 on diagonal (because distance between the aaa and aaa is 0):
aaa bbb ccc
aaa 0 0.05114 ...
bbb ... 0
ccc 0
from Hausdorff distance for large dataset in a fastest way
No comments:
Post a Comment