(Huggingface) Using a fine-tuned model for inference over a dataset

I am very new to this, so I apologise if this is a dumb question, but for some reason it seems to be pretty hard for me to just use an already-trained transformer model to run inference.

So basically, what I am trying to do is to apply ClimateBERT to a dataset which I got from another author (ECOLEX_Legislation.csv).

I have encountered quite a few issues in the process, but I managed to (more or less) solve them. However, after executing the code below, it has now been running for a few hours and I am not sure why, so I would appreciate it if someone could help me with this.

from transformers import pipeline
from datasets import load_dataset
from transformers.pipelines.pt_utils import KeyDataset

# Load the CSV as a Hugging Face Dataset
df = load_dataset("csv", data_files="ECOLEX_Legislation.csv", delimiter=",", split="train")
# Build the classification pipeline from the ClimateBERT checkpoint
pipe = pipeline("text-classification", model="climatebert/distilroberta-base-climate-sentiment")
# Stream the "Policy_Content" column through the pipeline
for out in pipe(KeyDataset(df, "Policy_Content"), truncation=True, max_length=512):
    print(out)

Note that the model seems to work fine if I pipe a single sentence through it; it only becomes a problem when I pipe it over the entire dataset. Secondary to this, I was wondering if there are any complete tutorials I could refer to (from loading the dataset to analysing and visualising the results). Lastly, just to note, I am not trying to train the model; I only want to run inference with an already-trained model.
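In case it helps, here is a minimal sketch of what I think batched inference over the dataset should look like. The batch_size=16 value, the results list, and the predicted_label column are my own additions (not taken from any tutorial), so please treat them as assumptions:

from transformers import pipeline
from datasets import load_dataset
from transformers.pipelines.pt_utils import KeyDataset

# Same dataset and model as above
df = load_dataset("csv", data_files="ECOLEX_Legislation.csv", delimiter=",", split="train")
pipe = pipeline("text-classification", model="climatebert/distilroberta-base-climate-sentiment")

# Collect the predictions instead of printing them one by one;
# batch_size controls how many texts go through the model at once
results = []
for out in pipe(KeyDataset(df, "Policy_Content"), batch_size=16, truncation=True, max_length=512):
    results.append(out)

# Attach the predicted labels back to the dataset for later analysis
df = df.add_column("predicted_label", [r["label"] for r in results])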

Thank you so much for reading this and if you guys need more information, I am happy to elaborate on it.

Warm regards, Yanith

I have looked through tutorials on various websites. However, they only cover basic inference (i.e., inference on a single sentence). For example,

from transformers import pipeline

pipe = pipeline("text-classification", model="climatebert/distilroberta-base-climate-sentiment")

pipe("Climate mitigation is an opportunity.")

It seems to become much harder when a whole dataset is involved (for reasons I have not been able to pin down).
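For what it is worth, the smallest step up from a single sentence that I have managed to get working is passing a plain Python list of texts to the same pipeline (the second example sentence below is just a placeholder of my own):

from transformers import pipeline

pipe = pipeline("text-classification", model="climatebert/distilroberta-base-climate-sentiment")

# The pipeline also accepts a list of strings and returns one result per input
texts = [
    "Climate mitigation is an opportunity.",
    "Rising sea levels threaten coastal infrastructure.",
]
print(pipe(texts, truncation=True, max_length=512))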
