Wednesday, 6 November 2019

How to repeat the process and store the result in a new pandas data frame

I have two datasets, border and df.

PART 1:

df = 

     id_easy    ordinal latitude longitude      epoch   day_of_week
0   e35f652a         68  22.1111    7.2222 1465084811   Sunday
1   e35f652a         69  22.1111    7.2222 1465084870   Sunday
2   e35f652a         70  22.1111    7.2222 1465084930   Sunday
3   e35f652a         71  22.1111    7.2222 1465084990   Sunday
4   e35f652a         72  22.1111    7.2222 1465085050   Sunday

turin = df.loc[df['ordinal'] == 1]

crs = {'init':'epsg:4326'}
geometry = [Point(xy) for xy in zip(turin.longitude,turin.latitude)]
turin_point = gpd.GeoDataFrame(turin,crs=crs,geometry=geometry) #to get geometry

PART 2:

border.shape is (931, 674). The first number in each column name is the zone name. For example, 12_longitude_1 means zone 12, longitude, 1st. The zone numbers are arbitrary, as you can see (12, 14, 23, and so on).
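To make the naming convention concrete, here is a minimal sketch of splitting such a column name into its three parts. parse_border_column is a hypothetical helper name, and the sketch assumes every column follows the zone_kind_index pattern described above.

```python
def parse_border_column(name):
    """Split a name such as '12_longitude_1' into (zone, kind, index).

    Assumes the pattern '<zone>_<longitude|latitude>_<index>'.
    """
    zone, kind, index = name.split("_")
    return int(zone), kind, int(index)


# Example with one of the column names from the question
zone, kind, index = parse_border_column("12_longitude_1")
```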

Here is a sample data frame:

border = 

12_longitude_1  12_latitude_1   14_longitude_2  14_latitude_2   23_longitude_3  23_latitude_3
            11             12               13             14               15            16
            11             12               13             14               15            16
            11             12               13             14               15            16

FINAL PART:

I want to check which points of turin_point fall within zone 12. I am doing the following with the first 2 columns:

Code for 12_longitude_1, 12_latitude_1:

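# Work on a copy of the zone 12 columns so that border keeps all of its columns
border_12 = border[['12_longitude_1','12_latitude_1']].dropna()
# Column names that start with a digit need bracket access, not attribute access
border_12['12_longitude_1'] = border_12['12_longitude_1'].replace(r'[()]', '', regex=True)
border_12['12_latitude_1'] = border_12['12_latitude_1'].replace(r'[()]', '', regex=True)
border_12['12_longitude_1'] = pd.to_numeric(border_12['12_longitude_1'], errors='coerce')
border_12['12_latitude_1'] = pd.to_numeric(border_12['12_latitude_1'], errors='coerce')
geometry2 = [Point(xy) for xy in zip(border_12['12_longitude_1'],border_12['12_latitude_1'])]
border_point = gpd.GeoDataFrame(border_12,crs=crs,geometry=geometry2)
# Build the zone polygon from the border points and count the points inside it
turin_final = Polygon([[p.x, p.y] for p in border_point.geometry])
within_turin = turin_point[turin_point.geometry.within(turin_final)]
long_lat_1 = len(within_turin)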

Finally, long_lat_1 gives me 1697.


How can I automate this process for the whole dataset (for every pair of columns)?


Desired output:

[image: desired output table]

Libraries to use:

import numpy as np
import pandas as pd

import geopandas as gpd
from shapely.geometry import Point, Polygon

TRY:

pd_out = pd.DataFrame({'zone': [], 'number': []})

for col_num in range(0, len(border.columns)-1, 2):
    curr_lon_name = border.columns[col_num]
    curr_lat_name = border.columns[col_num + 1]
    num = curr_lon_name.split("_")[-1]
    border = border[[curr_lon_name, curr_lat_name]].dropna()
    border[curr_lon_name] = border[curr_lon_name].replace(r'[()]', '', regex=True)
    border[curr_lat_name] = border[curr_lat_name].replace(r'[()]', '', regex=True)
    border[curr_lon_name] = pd.to_numeric(border[curr_lon_name], errors='coerce')
    border[curr_lat_name] = pd.to_numeric(border[curr_lat_name], errors='coerce')
    geometry2 = [Point(xy) for xy in zip(border[curr_lon_name],border[curr_lat_name])]
    border_point = gpd.GeoDataFrame(border,crs=crs,geometry=geometry2)
    turin_final = Polygon([[p.x, p.y] for p in border_point.geometry])
    within_turin = turin_point[turin_point.geometry.within(turin_final)]
    curr_len = len(within_turin)
    pd_out = pd_out.append({'zone': "long_lat_{}".format(num), 'number': curr_len}, ignore_index=True)

This gives me only 1 row:

    zone         number
0   long_lat_1  1697.0

I want all rows, with the names as shown in the image.

P.S. The values in the data sets have been changed.
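One likely cause, assuming the attempt is run exactly as posted: the loop reassigns border to a two-column slice on its first pass, so the remaining column pairs are gone by the second pass. Below is a minimal sketch that leaves border untouched and collects one row per pair. count_points_per_zone is a hypothetical helper name. The sketch tests points with shapely directly rather than building a GeoDataFrame per zone, and it builds the result from a list of rows because DataFrame.append was removed in pandas 2.0. It assumes the columns alternate longitude, latitude and that each pair has at least three valid vertices.

```python
import pandas as pd
from shapely.geometry import Polygon


def count_points_per_zone(border, points):
    """Count how many of `points` fall inside each zone polygon in `border`.

    `points` is any iterable of shapely Point objects.
    """
    rows = []
    # Assumption: columns come in (longitude, latitude) pairs, in order.
    for col_num in range(0, len(border.columns) - 1, 2):
        lon_name = border.columns[col_num]
        lat_name = border.columns[col_num + 1]
        num = lon_name.split("_")[-1]

        # Fresh two-column copy; `border` itself is never reassigned.
        pair = border[[lon_name, lat_name]].copy()
        for name in (lon_name, lat_name):
            # Strip stray parentheses, then convert to numbers.
            cleaned = pair[name].astype(str).str.replace(r"[()]", "", regex=True)
            pair[name] = pd.to_numeric(cleaned, errors="coerce")
        pair = pair.dropna()  # drop missing or unparseable vertices

        zone = Polygon(list(zip(pair[lon_name], pair[lat_name])))
        inside = int(sum(p.within(zone) for p in points))
        rows.append({"zone": "long_lat_{}".format(num), "number": inside})

    return pd.DataFrame(rows, columns=["zone", "number"])
```

With the data from the question, this could be called as count_points_per_zone(border, turin_point.geometry), since iterating a GeoSeries yields shapely geometries. The zone label follows the long_lat_{} pattern from the attempt and can be swapped for whatever naming the desired output uses.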


