"Couldn't cast because column names don't match" error when creating a dataset with the datasets package

25 Views Asked by Chaitanya S At 13 March 2024 at 04:00

The above image shows the structure of my data.

from sklearn.model_selection import train_test_split
from datasets import Features, ClassLabel, Value, Dataset, DatasetDict

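# movie_df (a pandas DataFrame) and labels (the class names) are defined earlier, not shown here.
# Stratified 80/10/10 split: first hold out 20%, then halve it into validation and test.
df_train, df_tmp = train_test_split(
    movie_df, stratify=movie_df["label"], test_size=0.2)

df_val, df_test = train_test_split(
    df_tmp, stratify=df_tmp["label"], test_size=0.5)

# Target schema: a string column "text" and a categorical column "label".
ds_features = Features({"text": Value("string"), "label": ClassLabel(names=labels)})

dataset = DatasetDict({
    "train": Dataset.from_pandas(df_train.reset_index(drop=True), features=ds_features),
    "valid": Dataset.from_pandas(df_val.reset_index(drop=True), features=ds_features),
    "test": Dataset.from_pandas(df_test.reset_index(drop=True), features=ds_features)})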

dataset

This code raised a ValueError: "Couldn't cast because column names don't match".

I was expecting output similar to this, though not necessarily with the same row counts:

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 13267
    })
    valid: Dataset({
        features: ['text', 'label'],
        num_rows: 1658
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1659
    })
})

Can anyone tell me what I am doing wrong?
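
Going by its wording, the error suggests that the column names in the DataFrame are not exactly the names declared in ds_features ("text" and "label"). A quick way to check is to compare the two sets of names. The sketch below is only a diagnostic aid, not a confirmed fix: column_mismatch is a hypothetical helper, and example_columns is made-up data standing in for list(movie_df.columns), since the real column names are not shown in the question.

```python
def column_mismatch(actual, expected):
    """Return (missing, extra).

    missing: names in `expected` that are absent from `actual`
    extra:   names in `actual` that are not in `expected`
    """
    actual_names = {str(name) for name in actual}
    expected_names = {str(name) for name in expected}
    missing = sorted(expected_names - actual_names)
    extra = sorted(actual_names - expected_names)
    return missing, extra


# Hypothetical column list; replace with movie_df.columns on the real data.
example_columns = ["Unnamed: 0", "text", "label"]

missing, extra = column_mismatch(example_columns, ["text", "label"])
print("missing:", missing)
print("extra:", extra)
```

On the real data this could be called as column_mismatch(movie_df.columns, ds_features), since iterating over a Features object yields its column names. If it reports extra columns, one thing to try is keeping only the declared ones before splitting, for example movie_df = movie_df[["text", "label"]]. If it reports missing columns (for instance because of different capitalization), renaming the DataFrame columns to match ds_features would be the thing to try.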
