How can I read a sparse matrix that I saved from Python as a *.npz file into R? I already came across two answers on Stack Overflow, but neither seems to do the job in my case.
The data set was created with Python from a Pandas data frame via:
import scipy.sparse
scipy.sparse.save_npz(
    "data.npz",
    scipy.sparse.csr_matrix(DataFrame.values)
)
It seems like the first steps for importing the data set in R are as follows.
library(reticulate)
np = import("numpy")
npz1 <- np$load("data.npz")
However, this does not yield a data frame yet.
I cannot access your dataset, so I can only speak from experience. When I try loading a sparse CSR matrix with numpy, it does not work; the class of the object is numpy.lib.npyio.NpzFile, which I can't use in R.
The way I found to import the matrix into an R object, as has been said in a post you've linked, is to use scipy.sparse: import it through reticulate and read the file with its load_npz function, storing the result in a variable called, say, csr_matrix.
csr_matrix, which was a scipy.sparse.csr_matrix object in Python (Compressed Sparse Row matrix), is automatically converted into a dgRMatrix from the R package Matrix. Note that if you had used scipy.sparse.csc_matrix in Python, you would get a dgCMatrix (Compressed Sparse Column matrix). The actual function doing the hard work of converting the Python object into something R can use is py_to_r.scipy.sparse.csr.csr_matrix, from the reticulate package.
If you want to convert the dgRMatrix into a data frame, you can simply use as.data.frame(as.matrix(csr_matrix)), although this might not be the best thing to do memory-wise if your dataset is big.
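The steps described above can be sketched roughly as follows. This is a minimal sketch, not a tested recipe for your data: it assumes that the Python environment reticulate uses has scipy installed, that the Matrix package is available in R, and that data.npz sits in the working directory. The names scipy_sparse, npz_path and df are only illustrative.

```r
library(reticulate)
library(Matrix)  # provides the dgRMatrix class and as.matrix() for it

# Assumption: the Python environment that reticulate uses has scipy installed.
scipy_sparse <- import("scipy.sparse")

# Assumption: this is the file written by scipy.sparse.save_npz() in the question.
npz_path <- "data.npz"

# load_npz() is the counterpart of save_npz(); reticulate converts the
# returned scipy CSR matrix into an R sparse matrix from the Matrix package.
csr_matrix <- scipy_sparse$load_npz(npz_path)
class(csr_matrix)

# Densify first, then convert to a data frame.
# This allocates every cell, zeros included, so it can be costly for large data.
df <- as.data.frame(as.matrix(csr_matrix))
dim(df)
```

If the dense copy is too large, it may be preferable to keep working with the sparse object directly, since the Matrix package supports the usual matrix operations on it.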
I hope this helped!