Efficiently convert numpy matrix to Vaex DataFrame


I'm trying to turn my wide (100K+ columns) 2-D NumPy array into a Vaex DataFrame. Reading through the documentation, I see two relevant functions:

from_items

from_arrays

but both give me an entire column x, where each row is a numpy array. What I expected was for Vaex to intelligently recognize that I want each column of data from the numpy array to be its own separate column in the Vaex DataFrame.

vaex.from_arrays(x=2d_numpy_matrix) gives me:

x
---
0 np.array(1,2,3)
1 np.array(4,5,6)

when I wanted:

0 | 1 | 2 (Column header)
---
1 | 2 | 3
4 | 5 | 6
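A minimal NumPy-only sketch of where the two shapes come from, assuming (as the output above suggests) that from_arrays stores the 2-D array it receives under the single name x, one row per element. Iterating a 2-D array walks its first axis and yields rows. Iterating its transpose yields the 1-D columns that should each become a separate DataFrame column. The name `matrix` is illustrative only.

```python
import numpy as np

# A small stand-in for the asker's matrix: 2 rows, 3 columns
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Iterating a 2-D array walks axis 0, i.e. it yields rows;
# these per-row arrays are what ended up inside column x
rows = [row.tolist() for row in matrix]

# Iterating the transpose yields columns instead: one 1-D array
# per desired DataFrame column
columns = [col.tolist() for col in matrix.T]

print(rows)     # [[1, 2, 3], [4, 5, 6]]
print(columns)  # [[1, 4], [2, 5], [3, 6]]
```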

My workaround is vaex.from_pandas(pd.DataFrame(2d_numpy_matrix)), but this is embarrassingly slow. Is there a more CPU-efficient way to do this?

Best answer, by Rupert:

You can build a dictionary that maps each column name to one 1-D column of the matrix and unpack it into vaex.from_arrays as keyword arguments:

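import numpy as np
import vaex

# Same shape as in the question: 2 rows, 3 columns
data = np.array([[1, 2, 3], [4, 5, 6]])

# Column names "0", "1", "2", matching the output below
headers = [str(i) for i in range(data.shape[1])]

# Iterate over data.T so that each item is one column of the matrix;
# each header becomes a keyword argument of vaex.from_arrays
df = vaex.from_arrays(**{header: column for header, column in zip(headers, data.T)})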

This yields:

>>> df
#    0    1    2
0    1    2    3
1    4    5    6
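For the wide (100K+ columns) case, the dictionary can be built by a small helper. The sketch below is only an illustration: `matrix_to_column_dict` is a hypothetical name, and the default names "0", "1", ... mimic what pd.DataFrame would produce. It makes one up-front copy into column-major (Fortran) order with np.asfortranarray, so each column slice is a contiguous block of memory instead of a strided view into a row-major array. Whether Vaex actually runs faster with contiguous columns is an untested assumption.

```python
import numpy as np


def matrix_to_column_dict(matrix, names=None):
    """Split a 2-D array into {name: 1-D column} (hypothetical helper).

    The result is meant to be unpacked into vaex.from_arrays(**...).
    """
    matrix = np.asarray(matrix)
    if matrix.ndim != 2:
        raise ValueError("expected a 2-D array")
    if names is None:
        # Default names "0", "1", ..., like pandas' default column labels
        names = [str(i) for i in range(matrix.shape[1])]
    if len(names) != matrix.shape[1]:
        raise ValueError("need exactly one name per column")
    # One copy into column-major order, so every fortran[:, i] below
    # is a contiguous view rather than a strided one
    fortran = np.asfortranarray(matrix)
    return {name: fortran[:, i] for i, name in enumerate(names)}


wide = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]
cols = matrix_to_column_dict(wide)
print(list(cols))           # ['0', '1', '2']
print(cols["2"].tolist())   # [2, 5]
```

With Vaex installed, the result is passed the same way as in the answer: df = vaex.from_arrays(**matrix_to_column_dict(wide)).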