I have the code shown below. The HDF5 dataset/NetCDF variable is a k × N array of 3-D points, where k is the number of sets and N is the number of points per set (so k*N*3 floats when flattened). Currently, I read it as a 1-D array and then split it into per-set arrays to build an array of arrays. The code is as follows:
import ucar.nc2.dataset.NetcdfDatasets
val f = NetcdfDatasets.openFile(fname, null)
val ptList1D = f.findVariable("points").read().copyTo1DJavaArray.asInstanceOf[Array[Float]]
// Each set occupies nPoints * 3 consecutive floats (nPoints read from the dimension, like nSets)
val ptList: Array[Array[Float]] = (0 until nSets).map(i => ptList1D.slice(i * nPoints * 3, (i + 1) * nPoints * 3)).toArray
The problem is that both k and N can be very large, so this becomes a rate-limiting step. I can speed up the second statement by replacing map with par.map, but that is not going to improve performance much. Is there a smarter way of doing this?
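For reference, the splitting step itself can be expressed as a single grouped pass over the flat array instead of nSets separate slices. This is a minimal, library-free sketch on synthetic data (the object name ReshapeSketch and the tiny sizes are made up for illustration; the real data is 10k × 50k × 3):

```scala
// Sketch of the reshape step on synthetic data; no NetCDF involved.
// Each set occupies nPoints * 3 consecutive floats in the flat array,
// so grouped() splits it into nSets chunks in one pass.
object ReshapeSketch {
  def split(flat: Array[Float], nSets: Int, nPoints: Int): Array[Array[Float]] = {
    require(flat.length == nSets * nPoints * 3, "flat array length must be nSets * nPoints * 3")
    flat.grouped(nPoints * 3).toArray
  }
}
```

Note this still copies every element once; the underlying cost of materializing k*N*3 floats twice (once in the 1-D read, once in the split) remains, which is why avoiding the intermediate 1-D copy altogether may matter more than how the split is written.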
When loading the file, NetCDF-Java reports the following relevant information:
variables:
float points(sets=10000, points=50000, dim=3);
PS: nSets is read separately prior to this step. Here nSets = 10,000 and each set contains 50,000 points (dim = 3).
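One direction I have considered (a sketch, not verified against this file) is to skip the full 1-D copy and read each set as a section via `Variable.read(origin, shape)`, which for a (sets, points, dim) variable takes a per-set origin and shape. The NetCDF call itself needs the library and a file, so below only the pure index arithmetic is shown; the hypothetical helper names originFor/shapeFor are mine:

```scala
// Index math for reading set i as a section of a (nSets, nPoints, 3) variable.
// The arrays returned here would be passed to ucar.nc2.Variable.read(origin, shape),
// e.g. v.read(originFor(i), shapeFor(nPoints)) -- not exercised in this sketch.
object SectionMath {
  // Start of set i: first point, first coordinate of the i-th set.
  def originFor(i: Int): Array[Int] = Array(i, 0, 0)
  // Extent of one set: 1 set, all nPoints points, all 3 coordinates.
  def shapeFor(nPoints: Int): Array[Int] = Array(1, nPoints, 3)
  // Number of floats in one set's section.
  def sectionLength(nPoints: Int): Int = nPoints * 3
}
```

Reading section by section keeps peak memory at one set (nPoints * 3 floats) instead of the whole variable, at the cost of nSets separate read calls.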