Wednesday, 31 March 2021

Calculate xarray dataarray from coordinate labels

I have an DataArray with two variables (meteorological data) over time,y,x coordinates. The x and y coordinates are in a projected coordinate system (EPSG:3035) and aligned so that each cell covers pretty much exactly a standard cell of the 1km LAEA reference grid

I want to prepare the data for further use in Pandas and/or database tables, so I want to add the LAEA Gridcell Number/Label which can be calculated from x and y directly via the following (pseudo) function

def func(cell):
    return r'1km{}{}'.format(int(cell['y']/1000), int(cell['x']/1000))      # e.g. 1kmN2782E4850

But as far as I can see there seems to be no possibility, to apply this function to a DataArray or DataSet in a way so that I have access to these coordinate variables (at least .apply_ufunc() wasn't really working for me.

I am able to calc this on Pandas later on, but some of my datasets consists of 60 up to 120 Mio. Cells/Rows/datasets and pandas (even with Numba) seems to have troubles with that amount. On the xarray I am able to process this on 32 Cores via Dask.

I would be grateful on any advice on how to get this working.

EDIT: Some more insights of the data I`m working with:

This one is quite the largest with 500 Mio cells, but I am able to downsample this to squarekilometer resolution which ends up with about 160 Mio. cells

Xarray "vci" with monthly Vegetation Condition Indices over some decades

If the dataset is small enough, I am able to export it as a pandas dataframe and calculate there, but thats slow and not very robust as the kernel is crashing quite often

same calc in pandas



from Calculate xarray dataarray from coordinate labels

No comments:

Post a Comment