I am reading a ZARR file from an S3 bucket with xarray. I managed to filter by time and latitude/longitude:
from typing import Any

import pandas as pd
import s3fs
import xarray as xr

# Method of a class that defines S3_PATH; Region is the asker's own type
def read_zarr(self, dataset: str, region: Region) -> Any:
    # Read ZARR from s3 bucket
    fs = s3fs.S3FileSystem(key="KEY", secret="SECRET")
    mapper = fs.get_mapper(f"{self.S3_PATH}{dataset}")
    zarr_ds = xr.open_zarr(mapper, decode_times=True)
    # Filter by time
    time_period = pd.date_range("2013-01-01", "2023-01-31")
    zarr_ds = zarr_ds.sel(time=time_period)
    # Filter by latitude/longitude (bounding box of the region)
    region_gdf = region.geo_data_frame
    latitude_slice = slice(region_gdf.bounds.miny[0], region_gdf.bounds.maxy[0])
    longitude_slice = slice(region_gdf.bounds.minx[0], region_gdf.bounds.maxx[0])
    return zarr_ds.sel(latitude=latitude_slice, longitude=longitude_slice)
The problem is that this returns a rectangle of data (actually a cuboid, if we consider the time dimension). For geographical regions that are long and thin, this is hugely wasteful: I first download years of data, only to discard most of it. California is one example of such a region.
I would like to intersect the ZARR coordinates with the region ones. How can I achieve it?
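One possible way to intersect the grid with the region, sketched here under assumptions rather than taken from the original post, is to build a boolean mask that marks which (latitude, longitude) grid points fall inside the region's outline. The sketch below uses a plain ray-casting point-in-polygon test with no third-party dependencies. The names `point_in_ring` and `region_mask` and the diagonal band of coordinates are hypothetical and only serve as an illustration; a real region would supply its own outline, and multi-part regions or regions with holes would need extra handling.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_ring(lon: float, lat: float, ring: Sequence[Point]) -> bool:
    """Ray-casting test: is (lon, lat) strictly inside the ring of vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def region_mask(
    latitudes: Sequence[float],
    longitudes: Sequence[float],
    ring: Sequence[Point],
) -> List[List[bool]]:
    """Boolean grid of shape (len(latitudes), len(longitudes))."""
    return [
        [point_in_ring(lon, lat, ring) for lon in longitudes]
        for lat in latitudes
    ]


# Hypothetical long, thin region: a diagonal band (illustrative, not real data)
ring = [(0.0, 0.0), (1.0, 0.0), (4.0, 3.0), (3.0, 3.0)]
latitudes = [0.5, 1.5, 2.5]
longitudes = [1.0, 2.0, 3.0]

mask = region_mask(latitudes, longitudes, ring)
kept = sum(cell for row in mask for cell in row)
print(f"{kept} of {len(latitudes) * len(longitudes)} grid points are inside the region")
```

With xarray, such a mask could be wrapped in an `xr.DataArray` with `latitude` and `longitude` dimensions and passed to `zarr_ds.where(mask, drop=True)`, which sets values outside the mask to NaN and drops rows and columns that are entirely masked. Whether this reduces the amount of data actually read from S3 depends on how the ZARR store is chunked, so it is worth checking on the real dataset.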