Writing xarray datasets to AWS S3 takes a surprisingly long time, even when no data is actually written because of compute=False.
Here's an example:
import fsspec
import xarray as xr

# open a small example dataset and build (but don't execute) the write
x = xr.tutorial.open_dataset("rasm")
target = fsspec.get_mapper("s3://bucket/target.zarr")
task = x.to_zarr(target, compute=False)
Even without actually computing anything, to_zarr takes around 6 seconds from an EC2 instance in the same region as the S3 bucket.
Looking at the debug logs, there seems to be quite a bit of redirecting going on, as the default region in aiobotocore is set to us-east-2 while the bucket is in eu-central-1.
If I first manually set the default region in the environment variables with

import os
os.environ['AWS_DEFAULT_REGION'] = 'eu-central-1'

then the required time drops to around 3.5 seconds.
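To check how much the environment tweak helps, one can time the graph construction around it. A minimal sketch follows; the bucket URL is a placeholder and the actual S3 call is left commented out, since it needs s3fs and AWS credentials:

```python
import os
import time

# Assumption: setting the region before the S3 filesystem is first created
# avoids the us-east-2 -> eu-central-1 redirects seen in the debug logs.
os.environ["AWS_DEFAULT_REGION"] = "eu-central-1"

start = time.perf_counter()
# Placeholder for the real call, which needs s3fs + AWS credentials:
# task = x.to_zarr(fsspec.get_mapper("s3://bucket/target.zarr"), compute=False)
elapsed = time.perf_counter() - start
print(f"to_zarr(compute=False) took {elapsed:.2f} s")
```

Running this once with and once without the environment variable set (in fresh processes, so no cached session carries over) makes the ~6 s vs. ~3.5 s difference reproducible.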
So my questions are:

- Is there any way to pass the region to fsspec (or s3fs)? I've tried adding s3_additional_kwargs={"region": "eu-central-1"} to the get_mapper call, but that didn't do anything.
- Is there any better way to interface with zarr on S3 from xarray than the above (with fsspec)?
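One avenue worth trying for the first question: s3fs accepts a client_kwargs dict that it forwards to the underlying (aio)botocore client, so the region can be pinned per-filesystem rather than via the environment. A hedged sketch (the bucket name is a placeholder; the S3 calls are commented out because they need credentials):

```python
# Assumption: s3fs forwards client_kwargs to the aiobotocore client, so
# region_name here pins the region for this filesystem instance only.
storage_options = {"client_kwargs": {"region_name": "eu-central-1"}}

# fsspec.get_mapper passes extra keyword arguments on to the filesystem
# constructor (here: S3FileSystem). Needs s3fs + AWS credentials:
# import fsspec
# target = fsspec.get_mapper("s3://bucket/target.zarr", **storage_options)
# task = x.to_zarr(target, compute=False)

# Newer xarray versions (later than the 0.17 used here) can also take the
# URL plus a storage_options argument and build the mapper themselves:
# task = x.to_zarr("s3://bucket/target.zarr",
#                  storage_options=storage_options, compute=False)
```

Note that client_kwargs is distinct from s3_additional_kwargs: the latter adds parameters to individual S3 API calls (e.g. put_object), which is why passing the region there has no effect on where the client connects.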
versions:
xarray: 0.17.0
zarr: 2.6.1
fsspec: 0.8.4
from Zarr: improve xarray writing performance to S3