I am tokenizing my dataset and have created the following function:
max_length = 1026

def generate_and_tokenize_prompt(prompt):
    result = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=max_length,
        padding="max_length",
    )
    return result

train_dataset = df_train['prompt']
val_dataset = df_test['prompt']
tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_val_dataset = val_dataset.map(generate_and_tokenize_prompt)
Here you can see we are passing return_tensors="pt", but I am not sure why. Even without this parameter, I am able to tokenize my dataset.
"pt" means return pytorch tensor. See documentation https://huggingface.co/docs/transformers/main_classes/tokenizer