Concatenating DataFrames, and is there an 'in-place' TfidfVectorizer?


I have been given the task of building a sentiment prediction model based on movie reviews. Along with the movie_review feature, the train dataset contains other features such as movie_name and release_date, plus the sentiment label (positive or negative).

I have vectorized the movie_review feature using sklearn's TfidfVectorizer(). Now I am trying to concatenate two dataframes:

  1. The train dataset which is of shape (156311, 5)
  2. The dataframe I got as the output of TfidfVectorizer() after vectorizing the movie_review column of the train dataset. Its shape is (156311, 65220)

To concatenate the two dataframes, I use the following call:

pd.concat([train, review_vectorized], axis=1)

The problem is that every time I try to run this, the RAM runs out and Google Colab crashes.
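For scale, here is a rough back-of-the-envelope estimate of what a dense copy of the vectorized reviews would need. It assumes review_vectorized holds dense float64 values (float64 is TfidfVectorizer's default dtype; how the dataframe was actually built is not shown here, so this is only a sketch):

```python
# Shape of the vectorized reviews, as reported above.
n_rows, n_cols = 156311, 65220

# Assumption: every cell is stored densely as float64 (8 bytes).
bytes_per_value = 8

dense_bytes = n_rows * n_cols * bytes_per_value
dense_gib = dense_bytes / 2**30

print(f"dense matrix: {dense_bytes} bytes, roughly {dense_gib:.0f} GiB")
```

Under that assumption the dense matrix alone is far larger than the RAM of a typical hosted notebook, before pd.concat makes any copy of it.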

Is there a more efficient way of concatenating dataframes? Or, even better, is there a way to vectorize the text column 'in place', so that we wouldn't need to create a separate dataframe with the vectorized text and then concatenate it with the original dataframe?
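One possible direction, sketched here under assumptions rather than as a confirmed fix: keep the TF-IDF output as the SciPy sparse matrix that fit_transform returns, and stack the remaining numeric features onto it with scipy.sparse.hstack instead of going through pd.concat. The toy data and the runtime_minutes column are hypothetical stand-ins; only movie_review and sentiment come from the description above.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for the train dataset (hypothetical values).
train = pd.DataFrame({
    "movie_review": ["great film", "terrible plot", "great acting, weak plot"],
    "runtime_minutes": [120, 95, 110],  # hypothetical numeric feature
    "sentiment": [1, 0, 1],
})

vectorizer = TfidfVectorizer()
# fit_transform returns a sparse matrix; it is never converted to dense here.
review_tfidf = vectorizer.fit_transform(train["movie_review"])

# Wrap the other numeric features as a sparse matrix, then stack column-wise.
other = sparse.csr_matrix(train[["runtime_minutes"]].to_numpy(dtype="float64"))
X = sparse.hstack([other, review_tfidf], format="csr")
y = train["sentiment"].to_numpy()

print(X.shape, sparse.issparse(X))
```

Many scikit-learn estimators (LogisticRegression, for example) accept a sparse X directly, so the combined feature matrix may never need to become a dataframe. Non-numeric columns such as movie_name would need their own encoding first.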
