Wednesday, 2 October 2019

Nested dict from for loop adding same values to all nested keys

I have address data and shapefiles containing polygons. I'm trying to find the shortest distance (in miles) from each address to each polygon, and then build a nested dict holding all of the results, in this format:

nested_dict = {poly_1: {address_1: distance, address_2: distance},
               poly_2: {address_1: distance, address_2: distance},
               ...}
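A small sketch of that target shape in plain Python follows. The polygon and address names are hypothetical, and fake_distance is a made-up placeholder for the real calculation. Because the inner dict comprehension runs once per polygon, every outer key ends up with its own separate inner dict.

```python
# Hypothetical polygon and address names, used only to show the shape.
polygons = ["poly_1", "poly_2"]
addresses = ["address_1", "address_2"]

def fake_distance(poly, address):
    # Placeholder for the real distance calculation (illustration only).
    return float(polygons.index(poly) * 10 + addresses.index(address))

# The inner comprehension is evaluated once per polygon, so each outer key
# gets a brand-new inner dict rather than a shared one.
nested_dict = {
    poly: {address: fake_distance(poly, address) for address in addresses}
    for poly in polygons
}

print(nested_dict)
# {'poly_1': {'address_1': 0.0, 'address_2': 1.0}, 'poly_2': {'address_1': 10.0, 'address_2': 11.0}}
```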

Here is the relevant code I'm using:

import pandas as pd
from shapely.geometry import mapping, Polygon, LinearRing, Point
import geopandas as gpd
from math import radians, cos, sin, asin, sqrt

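# addresses_geo and sf_geo are GeoDataFrames loaded earlier (not shown):
# one row per address point and one row per polygon, respectively.
address_dict = {k: [] for k in addresses_geo.input_string}
sludge_dtc = {k: [] for k in sf_geo.unique_name}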

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 3956 # Radius of earth in miles. Use 6371 for kilometers
    return c * r

# Here's the key loop that isn't working correctly
for unique_name, i in zip(sf_geo.unique_name, sf_geo.index):
    for address, pt in zip(addresses_geo.input_string, addresses_geo.index):
        pol_ext = LinearRing(sf_geo.iloc[i].geometry.exterior.coords)
        d = pol_ext.project(addresses_geo.iloc[pt].geometry)
        p = pol_ext.interpolate(d)
        closest_point_coords = list(p.coords)[0]
        # print(closest_point_coords)
        dist = haversine(addresses_geo.iloc[pt].geometry.x,
                         addresses_geo.iloc[pt].geometry.y,
                         closest_point_coords[0], closest_point_coords[1])
        address_dict[address] = dist
    sludge_dtc[unique_name] = address_dict
# Test results on a single address
addresses_with_sludge_distance = pd.DataFrame(sludge_dtc)
print(addresses_with_sludge_distance.iloc[[1]].T)
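One way to rule out the haversine helper as the source of the problem is to check it against a known distance: one degree of latitude along a meridian is roughly 69 miles. The function below is a self-contained copy of the one in the question, using the same 3956-mile Earth radius.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    return c * 3956  # Earth radius in miles, as in the question

# One degree of latitude along a meridian should come out at about 69 miles.
print(round(haversine(0.0, 0.0, 0.0, 1.0), 2))  # 69.05
```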

If I pull this code out and calculate the distances for a single polygon, it works fine. However, when I build the DataFrame and check an address, it shows the same distance for every polygon.
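This symptom matches every polygon key holding a reference to the same dict object. In the loop above, address_dict is created once, before the loops, and sludge_dtc[unique_name] = address_dict stores that same object under every key. Each new polygon overwrites its values, so after the loop every key shows the distances from the last polygon processed. Here is a minimal reproduction with plain dicts; the names and numbers are hypothetical.

```python
# Minimal reproduction: one inner dict is created outside the loops and
# reused for every outer key, like address_dict in the question.
results = {}
inner = {}  # created once

for poly, offset in [("poly_1", 0.0), ("poly_2", 100.0)]:
    for address, base in [("123 Main Street", 1.0), ("456 South Street", 2.0)]:
        inner[address] = base + offset  # overwrites values in the shared dict
    results[poly] = inner  # stores a reference, not a copy

print(results)
# {'poly_1': {'123 Main Street': 101.0, '456 South Street': 102.0}, 'poly_2': {'123 Main Street': 101.0, '456 South Street': 102.0}}
print(results["poly_1"] is results["poly_2"])  # True: both keys point at one dict
```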

For example, the inner-dict key '123 Main Street' has 5.25 miles under every polygon key in the outer dict, and '456 South Street' has 6.13 miles under every polygon key. (These are made-up values.)

I suspect I'm doing something wrong in how I've set up the for loops, but I can't figure out what. I've tried swapping the order of the for statements and changing the indentation, and I get the same result every time.
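Swapping the loops or re-indenting the assignment doesn't help, because the code still creates only one address_dict (before the loops) and assigns that same object to every polygon key. There are two common fixes, sketched below with plain dicts and hypothetical values. The first creates a new inner dict at the top of the outer loop; in the question's code, that means putting address_dict = {} as the first line inside the for unique_name, i in ... loop. The second keeps one dict but stores a copy of it.

```python
pairs = [("poly_1", 0.0), ("poly_2", 100.0)]
addresses = [("123 Main Street", 1.0), ("456 South Street", 2.0)]

# Fix 1: create a fresh inner dict at the start of each outer iteration.
fresh = {}
for poly, offset in pairs:
    inner = {}  # a new object for this polygon
    for address, base in addresses:
        inner[address] = base + offset
    fresh[poly] = inner

# Fix 2: reuse one dict but store a shallow copy of it. This only works
# because every address is recomputed for every polygon; otherwise stale
# values from the previous polygon would leak into the copy.
copied = {}
shared = {}
for poly, offset in pairs:
    for address, base in addresses:
        shared[address] = base + offset
    copied[poly] = shared.copy()  # snapshot of the current values

print(fresh == copied)  # True
print(fresh["poly_1"]["123 Main Street"], fresh["poly_2"]["123 Main Street"])  # 1.0 101.0
```

Fix 1 is usually the clearer choice, because it cannot carry values over from one polygon to the next.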

To be clear, this is what I want to happen:

  • Take a single polygon.
  • For each address in the address data, find its distance from that polygon and add it to address_dict, with the address as the key and the distance as the value.
  • When all addresses have been calculated, add the entire address_dict as the value for that polygon's key in sludge_dtc.
  • Move on to the next polygon and repeat.
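The steps above can be sketched without shapely or geopandas, assuming each polygon is just a list of (lon, lat) exterior vertices. This mirrors LinearRing.project followed by interpolate in the question's code: it finds the nearest boundary point in planar degree coordinates and then measures the gap with haversine. The helper names (closest_point_on_segment, closest_point_on_ring, distances_by_polygon) and the sample data are hypothetical. The key detail is that per_address is created inside the outer loop. As with the question's exterior-based approach, an address inside a polygon gets its distance to the boundary rather than 0.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * asin(sqrt(a)) * 3956

def closest_point_on_segment(p, a, b):
    """Closest point to p on segment a-b, computed in planar lon/lat degrees."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a  # degenerate segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return (ax + t * dx, ay + t * dy)

def closest_point_on_ring(p, ring):
    """Closest point to p on a closed ring of (lon, lat) vertices."""
    best, best_d2 = None, float("inf")
    for a, b in zip(ring, ring[1:] + ring[:1]):  # includes the closing edge
        q = closest_point_on_segment(p, a, b)
        d2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = q, d2
    return best

def distances_by_polygon(polygons, addresses):
    """Build {polygon_name: {address: miles}} with a fresh inner dict per polygon."""
    result = {}
    for name, ring in polygons.items():
        per_address = {}  # created inside the outer loop: one dict per polygon
        for address, (lon, lat) in addresses.items():
            cx, cy = closest_point_on_ring((lon, lat), ring)
            per_address[address] = haversine(lon, lat, cx, cy)
        result[name] = per_address
    return result

# Hypothetical data: two unit squares and two address points, all (lon, lat).
polygons = {
    "poly_1": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "poly_2": [(5.0, 0.0), (6.0, 0.0), (6.0, 1.0), (5.0, 1.0)],
}
addresses = {"address_1": (2.0, 0.5), "address_2": (4.0, 0.5)}

distances = distances_by_polygon(polygons, addresses)
for poly, inner in distances.items():
    print(poly, {a: round(d, 1) for a, d in inner.items()})
```

Each address now gets a different distance per polygon. Here address_1 is closer to poly_1 and address_2 is closer to poly_2.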

Any ideas what I'm missing?


