I have a dataframe with two columns (review and sentiment). I am using PyTorch and the torchtext library to preprocess the data. Is it possible to use a dataframe as the data source in torchtext? I am looking for something similar to the following, but reading from a dataframe rather than from a path:
data.TabularDataset.splits(path='./data')
I have already performed some operations on the data (cleaning, converting to the required format), and the final data is in a dataframe.
If not torchtext, what other package would you suggest for preprocessing text data held in a dataframe? I could not find anything online. Any help would be great.
Thanks Geoffrey.
From looking at the source code for torchtext.data.field
https://pytorch.org/text/_modules/torchtext/data/field.html
it looks like the 'train' parameter needs to be either a Dataset already or some iterable source of text data. But given that we haven't created a dataset at this point, I am guessing you have passed in just the column of text from the dataframe.
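
To make "some iterable source of text data" a bit more concrete, here is a minimal sketch in plain Python of building a vocabulary from a column of text and turning each review into token ids. This is not torchtext's implementation: `tokenize`, `build_vocab`, `numericalize` and the sample rows are all hypothetical names and data made up for illustration, and the whitespace tokenizer is an assumption. The `rows` list stands in for what `df.to_dict("records")` would return for a dataframe with review and sentiment columns.

```python
from collections import Counter

# Hypothetical stand-in for df.to_dict("records") on a dataframe
# with "review" and "sentiment" columns.
rows = [
    {"review": "great movie great cast", "sentiment": "pos"},
    {"review": "terrible plot", "sentiment": "neg"},
]


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocab(texts, specials=("<unk>", "<pad>")):
    """Map each token to an integer id from any iterable of strings.

    Special tokens come first; the rest are ordered by descending
    frequency, with ties broken alphabetically.
    """
    counts = Counter(tok for text in texts for tok in tokenize(text))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, _ in ordered]
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(text, stoi):
    """Convert one string to a list of ids; unknown tokens map to <unk>."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in tokenize(text)]


# Just the text column; with pandas this would be df["review"].tolist().
reviews = [row["review"] for row in rows]
labels = [row["sentiment"] for row in rows]

stoi = build_vocab(reviews)
encoded = [numericalize(review, stoi) for review in reviews]
```

The point is only that the vocabulary step needs nothing more than an iterable of strings, so a single dataframe column is enough as input. The resulting `encoded` lists and `labels` could then be wrapped in whatever dataset class you end up using.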