I have two datasets, border and df.
PART 1:
df =
id_easy ordinal latitude longitude epoch day_of_week
0 e35f652a 68 22.1111 7.2222 1465084811 Sunday
1 e35f652a 69 22.1111 7.2222 1465084870 Sunday
2 e35f652a 70 22.1111 7.2222 1465084930 Sunday
3 e35f652a 71 22.1111 7.2222 1465084990 Sunday
4 e35f652a 72 22.1111 7.2222 1465085050 Sunday
turin = df.loc[df['ordinal'] == 1]
crs = {'init':'epsg:4326'}
geometry = [Point(xy) for xy in zip(turin.longitude,turin.latitude)]
turin_point = gpd.GeoDataFrame(turin,crs=crs,geometry=geometry) #to get geometry
PART 2:
border.shape = (931, 674). The first number in each column name is the zone. For example, 12_longitude_1 means zone 12, longitude, 1st column pair. The zone numbers are arbitrary, as you can see (12, 14, 23, and so on).
Here is a sample data frame:
border =
12_longitude_1 12_latitude_1 14_longitude_2 14_latitude_2 23_longitude_3 23_latitude_3
11 12 13 14 15 16
11 12 13 14 15 16
11 12 13 14 15 16
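The naming convention above can be split programmatically. This is only a minimal sketch that assumes every column name has exactly the form zone_kind_index; parse_border_column is a hypothetical helper name, not something from my code:

```python
def parse_border_column(name):
    """Split a name such as '12_longitude_1' into (zone, kind, index)."""
    zone, kind, index = name.split("_")
    return zone, kind, int(index)


# Column names taken from the sample frame above.
columns = ["12_longitude_1", "12_latitude_1", "14_longitude_2", "14_latitude_2"]

# Pair each longitude column with the latitude column that follows it.
pairs = list(zip(columns[0::2], columns[1::2]))

# Zone label of each pair, read from the longitude column name.
zones = [parse_border_column(lon)[0] for lon, _ in pairs]
```

Pairing by position relies on the columns always alternating longitude, latitude, as they do in the sample.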
FINAL PART:
I want to check which points of turin_point fall within zone 12. I am doing the following operation with the first 2 columns:
Code for 12_longitude_1,12_latitude_1:
border = border[['longitude_1','latitude_1']].dropna()
border.longitude_1 = border.longitude_1.replace(r'[()]', '', regex=True)
border.latitude_1 = border.latitude_1.replace(r'[()]', '', regex=True)
border.longitude_1 = pd.to_numeric(border.longitude_1, errors='coerce')
border.latitude_1 = pd.to_numeric(border.latitude_1, errors='coerce')
geometry2 = [Point(xy) for xy in zip(border.longitude_1,border.latitude_1)]
border_point = gpd.GeoDataFrame(border,crs=crs,geometry=geometry2)
turin_final = Polygon([[p.x, p.y] for p in border_point.geometry])
within_turin = turin_point[turin_point.geometry.within(turin_final)]
long_lat_1 = len(within_turin)
Finally, long_lat_1 gives me 1697.
How can I automate this process for the whole dataset (for every longitude/latitude column pair)?
Desired output:
Libraries to use:
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon
TRY:
pd_out = pd.DataFrame({'zone': [], 'number': []})
for col_num in range(0, len(border.columns)-1, 2):
    curr_lon_name = border.columns[col_num]
    curr_lat_name = border.columns[col_num + 1]
    num = curr_lon_name.split("_")[-1]
    border = border[[curr_lon_name, curr_lat_name]].dropna()
    border[curr_lon_name] = border[curr_lon_name].replace(r'[()]', '', regex=True)
    border[curr_lat_name] = border[curr_lat_name].replace(r'[()]', '', regex=True)
    border[curr_lon_name] = pd.to_numeric(border[curr_lon_name], errors='coerce')
    border[curr_lat_name] = pd.to_numeric(border[curr_lat_name], errors='coerce')
    geometry2 = [Point(xy) for xy in zip(border[curr_lon_name],border[curr_lat_name])]
    border_point = gpd.GeoDataFrame(border,crs=crs,geometry=geometry2)
    turin_final = Polygon([[p.x, p.y] for p in border_point.geometry])
    within_turin = turin_point[turin_point.geometry.within(turin_final)]
    curr_len = len(within_turin)
    pd_out = pd_out.append({'zone': "long_lat_{}".format(num), 'number': curr_len}, ignore_index=True)
This gives me only 1 row:
zone number
0 long_lat_1 1697.0
I want all rows and names as indicated in the photo.
P.S. The values in the data sets were changed.
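One likely cause of the single row is that the loop reassigns border to a two-column slice, so the later column pairs no longer exist after the first pass. The sketch below is one possible way around that, not a definitive fix: it slices into a separate variable, collects the rows in a list, and builds the frame once at the end (DataFrame.append was removed in pandas 2.0). The function name count_points_per_zone and the variable pair are hypothetical, and plain shapely containment is used in place of the GeoDataFrame step to keep the example short.

```python
import pandas as pd
from shapely.geometry import Point, Polygon


def count_points_per_zone(border, points):
    """Count how many of `points` fall inside each zone polygon in `border`.

    `border` holds one (longitude, latitude) column pair per zone;
    `points` is any iterable of shapely Points, e.g. turin_point.geometry.
    """
    points = list(points)
    rows = []
    for col_num in range(0, len(border.columns) - 1, 2):
        lon_name = border.columns[col_num]
        lat_name = border.columns[col_num + 1]
        num = lon_name.split("_")[-1]
        # Work on a copy so `border` keeps all of its columns for later pairs.
        pair = border[[lon_name, lat_name]].copy()
        for name in (lon_name, lat_name):
            # Strip parentheses, then coerce anything unparseable to NaN.
            cleaned = pair[name].astype(str).str.replace(r"[()]", "", regex=True)
            pair[name] = pd.to_numeric(cleaned, errors="coerce")
        pair = pair.dropna()
        # Assumes at least three valid vertices remain for this zone.
        zone_polygon = Polygon(list(zip(pair[lon_name], pair[lat_name])))
        count = int(sum(p.within(zone_polygon) for p in points))
        rows.append({"zone": "long_lat_{}".format(num), "number": count})
    return pd.DataFrame(rows, columns=["zone", "number"])
```

It would be called as pd_out = count_points_per_zone(border, turin_point.geometry). The zone label here keeps the trailing index from the column name, as in the attempt above; taking split("_")[0] instead would label rows by the zone number.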
from How to repeate the process and store result in the new data frame pandas
