I have an xarray Dataset with two variables (meteorological data) over the time, y and x coordinates. The x and y coordinates are in a projected coordinate system (EPSG:3035) and are aligned so that each cell covers almost exactly one cell of the standard 1 km LAEA reference grid.
I want to prepare the data for further use in pandas and/or database tables, so I want to add the LAEA grid cell number/label, which can be calculated directly from x and y with the following (pseudo) function:
def func(cell):
    # cell['x'], cell['y']: projected EPSG:3035 coordinates in metres
    return '1kmN{}E{}'.format(int(cell['y'] // 1000), int(cell['x'] // 1000))  # e.g. 1kmN2782E4850
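The same labelling can be written as a vectorized NumPy function. This is a minimal sketch, assuming y and x are arrays (or scalars) of EPSG:3035 metres that broadcast against each other; the helper names `_axis_labels` and `laea_labels` are hypothetical. Each input is formatted element by element and the two parts are then concatenated, so for 1-D y and x only ny + nx strings are formatted, not one per grid cell.

```python
import numpy as np


def _axis_labels(coords, prefix):
    """Format floor(coords / 1000) with a prefix, keeping the input shape."""
    km = (np.asarray(coords) // 1000).astype(np.int64)
    # Building from a list of Python str lets NumPy pick the minimal '<U' width
    return np.array([f"{prefix}{v}" for v in km.ravel()]).reshape(km.shape)


def laea_labels(y, x):
    """Vectorized version of func(): y and x must broadcast against each other."""
    return np.char.add(_axis_labels(y, "1kmN"), _axis_labels(x, "E"))


ys = np.array([2782500.0, 2781500.0])                # northings of two grid rows
xs = np.array([4850500.0, 4851500.0, 4852500.0])     # eastings of three grid columns
grid = laea_labels(ys[:, None], xs[None, :])         # shape (2, 3)
print(grid[0, 0])  # 1kmN2782E4850
```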
However, as far as I can tell, there is no way to apply this function to a DataArray or Dataset so that it has access to these coordinate variables (at least xr.apply_ufunc() wasn't really working for me).
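One way to give xr.apply_ufunc() access to x and y is to pass the coordinate DataArrays themselves as the inputs. apply_ufunc broadcasts its inputs by dimension name, so ds["y"] (dim y) and ds["x"] (dim x) reach the function as arrays that NumPy broadcasts to a (y, x) result. A minimal sketch, assuming 1-D y and x coordinates in metres; the variable name `tas`, the coordinate name `cell` and the helper `laea_labels` are illustrative, and the small Dataset stands in for the real data.

```python
import numpy as np
import xarray as xr


def laea_labels(y, x):
    """Vectorized func(): builds '1kmN<y_km>E<x_km>' for broadcastable y, x arrays."""
    def axis(a, prefix):
        km = (np.asarray(a) // 1000).astype(np.int64)
        return np.array([f"{prefix}{v}" for v in km.ravel()]).reshape(km.shape)
    return np.char.add(axis(y, "1kmN"), axis(x, "E"))


# Small synthetic stand-in for the real (time, y, x) data on the 1 km grid
ds = xr.Dataset(
    {"tas": (("time", "y", "x"), np.zeros((2, 2, 3)))},
    coords={
        "time": [0, 1],
        "y": [2782500.0, 2781500.0],
        "x": [4850500.0, 4851500.0, 4852500.0],
    },
)

# Passing the coordinates (not the data variables) gives the function x and y;
# apply_ufunc broadcasts the 1-D inputs into a 2-D (y, x) result.
cell = xr.apply_ufunc(laea_labels, ds["y"], ds["x"])
ds = ds.assign_coords(cell=cell)
print(ds["cell"].sel(y=2781500.0, x=4852500.0).item())  # 1kmN2781E4852
```

Because the labels depend only on y and x, they are stored once per grid cell as a 2-D coordinate rather than repeated along time. Dimension coordinates are held in memory as pandas indexes even when the data variables are Dask-backed, so this step should not need a dask= option.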
I can calculate this in pandas later on, but some of my datasets consist of 60 to 120 million cells (rows), and pandas (even with Numba) seems to struggle with that amount. With xarray, I can process this on 32 cores via Dask.
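If the labels are only needed on the pandas side, they can still be built without formatting one string per row: the MultiIndex produced by Dataset.to_dataframe() stores each distinct y and x value once in .levels, plus integer .codes per row. A sketch, assuming the index levels are named y and x; `add_cell_column` is a hypothetical helper. It still materializes one label string per row, so at 100+ million rows memory rather than CPU is likely to remain the limit.

```python
import numpy as np
import xarray as xr


def add_cell_column(df, y_name="y", x_name="x"):
    """Add a 'cell' column in place, formatting each distinct y/x value only once."""
    idx = df.index  # MultiIndex as built by Dataset.to_dataframe()
    iy, ix = idx.names.index(y_name), idx.names.index(x_name)
    # One label per distinct coordinate value (the index levels) ...
    north = np.array([f"1kmN{int(v // 1000)}" for v in idx.levels[iy]])
    east = np.array([f"E{int(v // 1000)}" for v in idx.levels[ix]])
    # ... gathered per row via the integer codes, then concatenated
    df["cell"] = np.char.add(north[idx.codes[iy]], east[idx.codes[ix]])
    return df


ds = xr.Dataset(
    {"tas": (("time", "y", "x"), np.zeros((2, 2, 3)))},
    coords={
        "time": [0, 1],
        "y": [2782500.0, 2781500.0],
        "x": [4850500.0, 4851500.0, 4852500.0],
    },
)
df = add_cell_column(ds.to_dataframe())
```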
I would be grateful for any advice on how to get this working.
EDIT: Some more details about the data I'm working with:
This is the largest one, with 500 million cells, but I can downsample it to 1 km (one square kilometre per cell) resolution, which leaves about 160 million cells.
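When the fine grid nests inside the 1 km grid, downsampling with Dataset.coarsen() keeps the labelling valid: coarsen averages the coordinates by default (coord_func="mean"), so each coarse x and y lands at the centre of a 1 km cell. A sketch, assuming a 100 m source grid with cell-centre coordinates (the actual source resolution is not stated here), hence a factor of 10 per axis; all names and values are illustrative.

```python
import numpy as np
import xarray as xr

# Hypothetical 100 m grid covering 2 x 2 km, coordinates at cell centres
x = 4850050.0 + 100.0 * np.arange(20)  # 4850050 ... 4851950
y = 2783950.0 - 100.0 * np.arange(20)  # north-up: 2783950 ... 2782050
ds = xr.Dataset(
    {"tas": (("time", "y", "x"), np.ones((1, 20, 20)))},
    coords={"time": [0], "y": y, "x": x},
)

# 10 x 10 blocks of 100 m cells -> 1 km cells; coordinates become block means
ds_1km = ds.coarsen(y=10, x=10).mean()

# Label each 1 km cell from the (now shorter) 1-D coordinates
north = np.array([f"1kmN{int(v // 1000)}" for v in ds_1km["y"].values])
east = np.array([f"E{int(v // 1000)}" for v in ds_1km["x"].values])
labels = np.char.add(north[:, None], east[None, :])
ds_1km = ds_1km.assign_coords(cell=(("y", "x"), labels))
```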
If the dataset is small enough, I can export it as a pandas DataFrame and calculate the labels there, but that's slow and not very robust, as the kernel crashes quite often.
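To avoid building one huge DataFrame, the export can be done in blocks of grid rows instead: slice along y with isel(), label and convert each block with to_dataframe(), and pass each frame to the database writer (for example DataFrame.to_sql) before the next block is loaded. A sketch under these assumptions; `iter_frames` and `rows_per_block` are hypothetical, and the block size would need tuning to the available memory.

```python
import numpy as np
import xarray as xr


def iter_frames(ds, rows_per_block):
    """Yield one labelled pandas DataFrame per block of grid rows (along y)."""
    for start in range(0, ds.sizes["y"], rows_per_block):
        block = ds.isel(y=slice(start, start + rows_per_block))
        # Labels are formatted only for this block's y and x values
        north = np.array([f"1kmN{int(v // 1000)}" for v in block["y"].values])
        east = np.array([f"E{int(v // 1000)}" for v in block["x"].values])
        cell = np.char.add(north[:, None], east[None, :])
        block = block.assign_coords(cell=(("y", "x"), cell))
        # Only this block is loaded into memory (Dask-backed data is computed here)
        yield block.to_dataframe()


ds = xr.Dataset(
    {"tas": (("time", "y", "x"), np.arange(30, dtype=float).reshape(2, 5, 3))},
    coords={
        "time": [0, 1],
        "y": 2786500.0 - 1000.0 * np.arange(5),
        "x": 4850500.0 + 1000.0 * np.arange(3),
    },
)

total = 0
for frame in iter_frames(ds, rows_per_block=2):
    total += len(frame)  # in a real pipeline, e.g. frame.to_sql(...) here
```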
from Calculate xarray dataarray from coordinate labels