xarray.DataArray storage solutions

18 Views Asked by user180146 At 05 March 2024 at 13:19

I have a large multidimensional xarray.DataArray. It has 4 dimensions of which one is time.

Time is measured in seconds, and for many of these times all values (in all dimensions) are zero. Since this data array quickly gets very large, I was hoping to avoid storing all the zeros (to make it sparse). Below is how the dataarray is built:

dataarray = xr.DataArray(data=data, coords=[time, coord1, coord2, coord3], dims=['time', 'coord1', 'coord2', 'coord3'])

data is a NumPy array which is initialized as:

data = np.zeros((len(time), len(coord1), len(coord2), len(coord3)))

I have found a solution that removes all the timesteps containing only zeros (it seems to work in my preliminary tests). It does at least decrease the memory size of the dataarray by a factor of 10. However, it is extremely slow, to the point that it is not workable, because it would have to run many times:

times_to_drop = [timestamp for timestamp in dataarray.time.values[2:(len(dataarray.time.values)-1)] if not np.any(dataarray.sel(time=timestamp).values)]
dataarray = dataarray.drop_sel(time=times_to_drop)

I deliberately keep the first two timesteps and the last one, so that I can use them to infer the timestep, start time and end time.

My question is: can this be done faster (a lot faster), either by improving my own solution or by employing a completely different one? I am building on existing software, so I would rather consider solutions that build on this xarray.DataArray implementation than complete overhauls.
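
One direction that might be worth trying, as a minimal sketch rather than a tested solution for the real data: instead of calling sel once per timestamp, build a single boolean mask over the whole array and index once. The function name drop_all_zero_timesteps and its keep_head / keep_tail parameters are hypothetical, and the sketch assumes the time dimension is named 'time' as in the constructor above.

```python
import numpy as np
import xarray as xr


def drop_all_zero_timesteps(dataarray, keep_head=2, keep_tail=1):
    """Drop timesteps whose values are all zero (hypothetical helper).

    The first `keep_head` and last `keep_tail` timesteps are always kept,
    mirroring the slice [2:len-1] used in the list comprehension above.
    """
    # Reduce over every dimension except time in one vectorized pass.
    other_dims = [d for d in dataarray.dims if d != "time"]
    keep = (dataarray != 0).any(dim=other_dims).values.copy()

    # Always keep the boundary timesteps, even if they are all zero.
    keep[:keep_head] = True
    if keep_tail:
        keep[-keep_tail:] = True

    # Index once with integer positions instead of dropping label by label.
    return dataarray.isel(time=np.flatnonzero(keep))


# Small usage example with made-up data: only timestep 3 is non-zero.
time = np.arange(6)
data = np.zeros((6, 2, 2, 2))
data[3, 0, 1, 0] = 5.0
dataarray = xr.DataArray(
    data=data,
    coords=[time, [0, 1], [0, 1], [0, 1]],
    dims=["time", "coord1", "coord2", "coord3"],
)
reduced = drop_all_zero_timesteps(dataarray)
print(reduced.time.values)  # timesteps 0, 1, 3 and 5 remain
```

The per-timestamp sel calls are replaced by one reduction and one isel, so the cost should scale with a single pass over the data rather than with the number of timesteps times the label-lookup overhead. Whether that is fast enough for repeated use would need to be measured on the actual arrays.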
