Sunday, 24 September 2023

Filter xarray ZARR dataset with GeoDataFrame

I am reading a ZARR file from an S3 bucket with xarray. I have managed to filter by time and latitude/longitude:

    from typing import Any

    import pandas as pd
    import s3fs
    import xarray as xr

    def read_zarr(self, dataset: str, region: Region) -> Any:
        # Read ZARR from s3 bucket
        fs = s3fs.S3FileSystem(key="KEY", secret="SECRET")
        mapper = fs.get_mapper(f"{self.S3_PATH}{dataset}")
        zarr_ds = xr.open_zarr(mapper, decode_times=True)

        # Filter by time
        time_period = pd.date_range("2013-01-01", "2023-01-31")
        zarr_ds = zarr_ds.sel(time=time_period)

        # Filter by latitude/longitude
        region_gdf = region.geo_data_frame
        latitude_slice = slice(region_gdf.bounds.miny[0], region_gdf.bounds.maxy[0])
        longitude_slice = slice(region_gdf.bounds.minx[0], region_gdf.bounds.maxx[0])
        return zarr_ds.sel(latitude=latitude_slice, longitude=longitude_slice)

The problem is that this returns a rectangle of data (actually a cuboid, if we consider the time dimension). For geographical regions that are long and thin, this is hugely wasteful: I first download years of data and then discard most of it. California is one such region.

I would like to intersect the ZARR coordinates with the region's geometry, so that only the grid points inside the region are selected. How can I achieve this?
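
To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed solution: build a boolean mask of the grid points that fall inside the region polygon and pass it to `Dataset.where(..., drop=True)`. The helper name `mask_to_geometry`, the toy dataset, the variable name `t2m`, and the triangle are all hypothetical stand-ins; in the real code the polygon would presumably come from something like `region_gdf.geometry.iloc[0]`. The sketch assumes regular 1-D `latitude`/`longitude` coordinates and tests only the grid point itself, not the cell area.

```python
import numpy as np
import xarray as xr
from shapely.geometry import Point, Polygon
from shapely.prepared import prep


def mask_to_geometry(ds, geometry, lat_name="latitude", lon_name="longitude"):
    """Keep only grid points whose (lon, lat) position lies inside `geometry`."""
    prepared = prep(geometry)  # prepared geometries speed up repeated tests
    lats = ds[lat_name].values
    lons = ds[lon_name].values
    # One containment test per grid point; simple, but slow on large grids.
    inside = np.array(
        [[prepared.contains(Point(lon, lat)) for lon in lons] for lat in lats]
    )
    mask = xr.DataArray(
        inside,
        dims=(lat_name, lon_name),
        coords={lat_name: lats, lon_name: lons},
    )
    # Points outside the geometry become NaN; rows and columns that are
    # entirely outside are dropped.
    return ds.where(mask, drop=True)


# Toy data standing in for the ZARR dataset: 3 time steps on a 5x5 grid.
toy = xr.Dataset(
    {"t2m": (("time", "latitude", "longitude"), np.ones((3, 5, 5)))},
    coords={
        "time": np.arange(3),
        "latitude": np.arange(5.0),
        "longitude": np.arange(5.0),
    },
)
# A triangle standing in for the region polygon.
triangle = Polygon([(-0.5, -0.5), (5.0, -0.5), (-0.5, 5.0)])
masked = mask_to_geometry(toy, triangle)
```

Note that this only marks the unwanted points as NaN. Whether it reduces the amount of data fetched from S3 depends on how the store is chunked, since Zarr reads whole chunks; combining it with the bounding-box `sel` above would at least limit the reads to the rectangle first. For large grids, a vectorized test such as `shapely.contains_xy` (Shapely 2.0) would likely be much faster than the per-point loop.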


