Example.fromJSON(data.read(), fields)

import os  # needed by load_fname below
from joblib import Parallel, delayed
from collections import OrderedDict
from torchtext.data import Dataset, Example, RawField, Field, NestedField

self.raw_content = RawField()
self.id = RawField()
self.raw_abstract = RawField(is_target=True)
self.content = NestedField(Field(fix_length=80), fix_length=50)
self.abstract = NestedField(Field())
self.abstract.is_target = True

# Maps each JSON key to the (attribute name, field) pairs it populates.
self.fields = {"article": [("raw_content", self.raw_content), ("content", self.content)],
               "abstract": [("raw_abstract", self.raw_abstract), ("abstract", self.abstract)],
               "id": [("id", self.id)]}


def load_fname(fname, reading_path, fields):
    fpath = os.path.join(reading_path, fname)
    with open(fpath, "r") as data:
        ex = Example.fromJSON(data.read(), fields)
    return (ex, fpath)

What is the equivalent of Example.fromJSON(data.read(), fields) in the Hugging Face libraries (https://github.com/huggingface)? I need to replace some LSTMs in a machine learning model with transformers, so the data now has to be preprocessed with the Hugging Face tooling instead of torchtext.

EDIT

>>> from datasets import load_dataset
>>> dataset = load_dataset('json', data_files='my_file.json', field='data')

Source: https://huggingface.co/docs/datasets/loading_datasets.html

I think I will have to use the code above, but I am still not sure.
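
To make the question more concrete, here is a minimal sketch of what Example.fromJSON does with the fields mapping, written with only the standard library: it parses one JSON document and copies each source key to the attribute names listed for it. The KEY_TO_COLUMNS mapping, the helper names record_from_json and to_columns, and the sample JSON are all assumptions made for illustration; tokenization and padding (what Field and NestedField do) are not covered here.

```python
import json

# Hypothetical mapping mirroring self.fields: JSON key -> output column names.
KEY_TO_COLUMNS = {
    "article": ["raw_content", "content"],
    "abstract": ["raw_abstract", "abstract"],
    "id": ["id"],
}


def record_from_json(json_text, key_to_columns=KEY_TO_COLUMNS):
    """Parse one JSON document and copy each source key to its target columns."""
    obj = json.loads(json_text)
    record = {}
    for key, columns in key_to_columns.items():
        for column in columns:
            record[column] = obj[key]
    return record


def to_columns(records):
    """Transpose a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Made-up sample document; the real files may store lists of sentences instead.
sample = '{"article": "some text", "abstract": "short", "id": "42"}'
columns = to_columns([record_from_json(sample)])
```

A column dict of this shape is what datasets.Dataset.from_dict accepts, so it could presumably be passed there and then tokenized with Dataset.map and a tokenizer from transformers. I have not verified this against my own data.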
