I have a list of one-dimensional arrays of different lengths, and I want to compute the mean across them position by position. Because the arrays have different lengths, this "tolerant" mean is based on a different number of samples at each position.
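To make the goal concrete, here is a tiny sketch with made-up values (three arrays of lengths 3, 2 and 1):

import numpy as np

# Made-up toy arrays, just to illustrate the tolerant mean
toyArrays = [np.array([1., 2., 3.]), np.array([4., 5.]), np.array([6.])]

# Mean per position, using only the arrays long enough to contribute there:
# position 0: (1 + 4 + 6) / 3, position 1: (2 + 5) / 2, position 2: 3 / 1
tolerantMean = [np.mean([a[i] for a in toyArrays if i < len(a)]) for i in range(3)]
print(tolerantMean)  # [3.666..., 3.5, 3.0]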
To do this I built a new (n, m) array (with n the number of one-dimensional arrays and m the maximal array length) by padding each one-dimensional array with zeros so that they all have the same length. Then I used a NumPy masked array (np.ma) to exclude the zero padding from the mean calculation.
Defining the NumPy masked array seems to be fine, but when I run np.mean or np.std the program crashes (the kernel dies in the Jupyter notebook). I suspect this is a memory issue, because the array is large (shape (1964035, 2574)). Still, I was surprised that np.mean crashes while defining the array does not.
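A rough back-of-the-envelope estimate supports the memory suspicion (assuming float64 values and NumPy's one-byte boolean mask):

# Rough size of the full padded array and its mask
nElements = 1964035 * 2574            # about 5.06e9 elements
dataGiB = nElements * 8 / 1024**3     # float64 data:  ~37.7 GiB
maskGiB = nElements * 1 / 1024**3     # boolean mask:   ~4.7 GiB
print(dataGiB, maskGiB)

On top of that, reductions on masked arrays can allocate temporary copies of comparable size, which might explain why defining the array succeeds while computing the mean kills the kernel.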
Here is my code to define the masked array:
import numpy as np

# One million 1-D arrays of random lengths (only for this example)
myArrays = [np.random.randn(np.random.randint(2000)) for i in range(1000000)]
maxLen = max(myArray.shape[0] for myArray in myArrays)

# Pad each array with zeros up to maxLen, and mask the padded entries
# so they are excluded from the statistics
myMasked = np.ma.array(
    [np.concatenate([myArray, np.zeros(maxLen - myArray.shape[0])])
     for myArray in myArrays],
    mask=[np.concatenate([np.zeros(myArray.shape[0], dtype=bool),
                          np.ones(maxLen - myArray.shape[0], dtype=bool)])
          for myArray in myArrays])
Then running np.mean crashes the program:
myMean = np.mean(myMasked, axis=0)
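For comparison, a scaled-down version of the same construction (the toy values from above) shows what I expect np.mean to return, and at this size it should run without any problem:

import numpy as np

# Same padding-plus-mask construction at toy scale
smallMasked = np.ma.array([[1., 2., 3.],
                           [4., 5., 0.],
                           [6., 0., 0.]],
                          mask=[[False, False, False],
                                [False, False, True],
                                [False, True,  True]])
print(np.mean(smallMasked, axis=0))  # expected: [3.666..., 3.5, 3.0]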