Kaggle's packages are missing many essential methods; for instance, Kaggle's `Dataset` class has no `from_generator()` method


I have been working on an NLP project for a month and have been running into error after error. I built a small model on my potato PC and it works perfectly. When I scaled it up on Kaggle, I ran into multiple errors, which have frustrated the hell out of me! I went through the individual packages one by one, and lo and behold, many methods are missing from Kaggle's packages!

A perfect example is the Dataset class from the datasets package. Both my PC and Kaggle report version 2.17.1.

But the class on Kaggle is missing many essential methods, such as from_generator()! You can see for yourself: install the datasets package, run the following on your local machine and on Kaggle, and compare the output:

    from datasets import Dataset
    dir(Dataset)
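Scanning the full dir() listing by eye is error-prone, so a more targeted check can help. The following is only a sketch: has_method is a hypothetical helper name (not part of datasets), and it simply reports whether a class exposes a given callable attribute in the current environment.

```python
import importlib


def has_method(module_name: str, class_name: str, method_name: str) -> bool:
    """Return True if module_name.class_name has a callable attribute method_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed (or cannot be imported) in this environment.
        return False
    cls = getattr(module, class_name, None)
    # getattr falls back to None when the class or the method is missing.
    return callable(getattr(cls, method_name, None))


# Run this both locally and on Kaggle and compare the two answers.
print("from_generator available:", has_method("datasets", "Dataset", "from_generator"))
```

Running the same cell in both environments gives a single True/False per machine instead of two long lists to compare.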

The missing methods are what led to most of my errors. How and why is this happening? Is there a way to enable all the essential methods on Kaggle, like from_generator()?

Luc On

I had the same issue, and someone else probably will too, so here is how to fix it:

  1. Run one of these commands:

    !pip install datasets
    # may leave an older, already installed version in place; add --upgrade to force the newest release
    
    !pip install datasets==2.18.0
    # the newest release when this was written; check PyPI for the current version
    
    !pip install git+https://github.com/huggingface/datasets.git
    # the development version (might have bugs, but is definitely the newest)
    
  2. RESTART the kernel in Kaggle. I am not sure of the exact reason, but most likely the running session keeps the old copy of the library loaded until it is restarted.

  3. Check the version:

    import datasets; print(datasets.__version__)
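To see whether the restart in step 2 is still pending, one option is to compare the version already loaded in the running kernel with the version pip has installed on disk. This is a rough sketch, not an official recipe: loaded_vs_installed is a hypothetical helper, and it assumes the import name and the pip distribution name are the same (true for datasets).

```python
import sys
from importlib import metadata


def loaded_vs_installed(package: str):
    """Return (loaded_version, installed_version) for a package.

    loaded_version is None if the package has not been imported in this
    process (or has no __version__); installed_version is None if no
    installed distribution with that name is found.
    """
    module = sys.modules.get(package)
    loaded = getattr(module, "__version__", None) if module else None
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


loaded, installed = loaded_vs_installed("datasets")
print("loaded in kernel: ", loaded)
print("installed on disk:", installed)
if loaded and installed and loaded != installed:
    print("Versions differ: restart the kernel so the new install is picked up.")
```

If the two values differ after a pip install, the session is still using the old import, which would explain why the restart matters.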