How to optimize a big 4D array containing "nan" values


I have a big array data_neighbors with shape=(1, 3, 1000000, 112) that contains a lot of nan values:

array([[[[ 88.769226,  80.62714 ,  75.95856 ]],

    [[ 88.749695,  79.52362 ,  76.456604]],

    [[ 89.07196 ,  82.84393 ,  77.12067 ]],

    ...,

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]]],


   [[[ 88.769226,  80.62714 ,  75.95856 ]],

    [[ 88.749695,  79.52362 ,  76.456604]],

    [[ 89.07196 ,  82.84393 ,  77.12067 ]],

    ...,

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]]],


   [[[ 88.769226,  80.62714 ,  75.95856 ]],

    [[ 88.749695,  79.52362 ,  76.456604]],

    [[ 89.07196 ,  82.84393 ,  77.12067 ]],

    ...,

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]]],


   ...,


   [[[116.88446 , 119.25018 , 125.77301 ]],

    [[117.02118 , 118.58612 , 124.601135]],

    [[116.82587 , 118.84979 , 125.46051 ]],

    ...,

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]]],


   [[[117.02118 , 118.58612 , 124.601135]],

    [[116.98212 , 119.34784 , 125.89996 ]],

    [[116.91376 , 118.957214, 125.606995]],

    ...,

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]]],


   [[[117.099304, 119.45526 , 126.03668 ]],

    [[117.10907 , 118.81073 , 125.2359  ]],

    [[117.030945, 119.09393 , 125.79254 ]],

    ...,

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]],

    [[       nan,        nan,        nan]]]], dtype=float32)

How can I remove all the nan values from this array to reduce its memory use? It's important to note that the number of nan values varies along the last dimension, so after stripping them each row would keep a different number of values. For example, data_neighbors[0,0,0] would end up with shape (3,) while data_neighbors[0,0,1] would keep shape (112,). So the result cannot be a regular array. Maybe lists inside an array?
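To illustrate the kind of structure I have in mind, here is a minimal sketch on a small dummy 2-D slice (dummy and ragged are just illustrative names): dropping the nans row by row gives rows of different lengths, so the result is a plain Python list of 1-D arrays rather than a rectangular array.

    import numpy as np

    # dummy 2-D slice standing in for one slab of data_neighbors
    dummy = np.array([[1.0, np.nan, 3.0],
                      [np.nan, np.nan, 2.0]], dtype=np.float32)

    # keep only the finite values of each row -> rows of different lengths,
    # stored as a plain Python list of 1-D arrays
    ragged = [row[~np.isnan(row)] for row in dummy]
    # ragged -> [array([1., 3.], dtype=float32), array([2.], dtype=float32)]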

EDIT: The main objective of the script is to perform multi-point regridding. For each point of a grid A, I assign x values of another grid B lying within a radius of x kilometres around that point. Which values are taken is determined by ind_regrid (shape 1000000*112), a variable that contains, for each index of A, the indices of the points of B to aggregate. Depending on the index of A, ind_regrid may contain nan (masked) values among the 112 potential indices to regrid.
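As a rough sketch of what the regridding is supposed to do for a single point of A (the name values_B and the mean aggregation are assumptions on my side; the real script works on the full arrays at once):

    import numpy as np

    def regrid_one_point(i_A, ind_regrid, values_B):
        # ind_regrid[i_A] holds up to 112 indices into grid B, with masked
        # entries where the point of A has fewer neighbours within the radius
        valid = ind_regrid[i_A].compressed().astype(int)
        # aggregate the neighbouring values of B (here simply a mean)
        return values_B[valid].mean() if valid.size else np.nan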

from netCDF4 import Dataset
import numpy as np

data_neighbors_list = []

# read the regridding indices (masked where a point of A has fewer neighbours)
nc_conf        = Dataset(fic_regril, 'r')
print('-> Read regrid file ' + str(fic_regril))
ind_regrid     = nc_conf.variables['inds_regrid'][:]
nc_conf.close()
masked_indices = np.ma.getmaskarray(ind_regrid)
# gather the neighbouring values of B (index 0 used as a dummy where masked) ...
data_neighbors = data[:, :, :, np.where(~masked_indices, ind_regrid, 0)]
# ... and overwrite the dummy entries with nan
data_neighbors[masked_indices] = np.nan
data_neighbors_list.append(data_neighbors)  # pt, regrid, param, time