HDF5/NetCDF: Reading a Java array in parallel using Scala


I have the code shown below. The HDF5 dataset/NetCDF variable is an array of size k*N, where k is the number of sets and N is the number of elements (3-D points) per set. Currently, I read it as a 1-D array and then split it into sub-arrays to create an array of arrays. The code is as follows:

import ucar.nc2.dataset.NetcdfDatasets
val f = NetcdfDatasets.openFile(fname, null)

val ptList1D = f.findVariable("points").read.copyTo1DJavaArray.asInstanceOf[Array[Float]]
val ptList: Array[Array[Float]] = (0 until nSets).map(i => ptList1D.slice(i * 3, (i + 1) * 3)).toArray

The problem is that both k and N can be very large, so this becomes the rate-limiting step. I can speed up the second statement by replacing map with par.map, but that is not going to improve performance much. Is there a smarter way of doing this?

When loading the file, netCDF-Java shows the following relevant information:

  variables:
    float points(sets=10000, points=50000, dim=3);

PS: nSets is read separately prior to this step. Shown here is nSets = 10,000, with each set containing 50,000 points (dimension = 3).
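
For reference, the splitting step on its own can be sketched without netCDF at all. The following is only a minimal sketch: `SplitFlat`, `split`, `stride` and `nTasks` are hypothetical names, not part of netCDF-Java. It copies fixed-length rows out of a flat array in a few contiguous batches that run concurrently, instead of one `slice` per row inside `map`. The `slice(i * 3, (i + 1) * 3)` call in the question corresponds to `stride = 3`; with the shape shown above, one whole set would presumably be `stride = points * 3`, which is an assumption about the intended layout.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SplitFlat {
  /** Split `flat` into consecutive rows of `stride` elements each.
    * Rows are copied in at most `nTasks` contiguous batches that run concurrently.
    * `nTasks` is a hypothetical tuning knob; 8 is an arbitrary default. */
  def split(flat: Array[Float], stride: Int, nTasks: Int = 8): Array[Array[Float]] = {
    require(stride > 0 && nTasks > 0, "stride and nTasks must be positive")
    require(flat.length % stride == 0, "length must be a multiple of stride")
    val nRows = flat.length / stride
    val out   = new Array[Array[Float]](nRows)
    // Number of rows handled by each task (rounded up).
    val batch = math.max(1, (nRows + nTasks - 1) / nTasks)
    val tasks = (0 until nRows by batch).map { start =>
      Future {
        val end = math.min(start + batch, nRows)
        var i = start
        while (i < end) {
          // Each task writes only to its own index range of `out`.
          out(i) = java.util.Arrays.copyOfRange(flat, i * stride, (i + 1) * stride)
          i += 1
        }
      }
    }
    // Block until every batch has finished copying.
    Await.result(Future.sequence(tasks), Duration.Inf)
    out
  }
}
```

Usage would look like `val ptList = SplitFlat.split(ptList1D, stride = 3)`. This only addresses the split, not the read itself. netCDF-Java's `Variable.read(origin, shape)` can read one hyperslab at a time, which might avoid materialising the full 1-D array first, but I have not verified whether that is faster here or whether concurrent reads on one file handle are safe.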
