I can't download the dataset from Huggingface

from datasets import load_dataset

dataset = load_dataset(path='seamew/THUCNewsTitle', split='train')

My network connection is fine, but the console always shows:

Traceback (most recent call last):
  File "/Users/yuanyang_lee/Desktop/HuggingFace/demo2.py", line 11, in <module>
    dataset = load_dataset(path='seamew/THUCNewsTitle', split='train')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/load.py", line 2153, in load_dataset
    builder_instance.download_and_prepare(
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 1717, in _download_and_prepare
    super()._download_and_prepare(
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 1027, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yuanyang_lee/.cache/huggingface/modules/datasets_modules/datasets/seamew--THUCNewsTitle/b3df30999854cbe65ae45110e895b2fa88c14975f2185c1f43d9b7ca85b5f679/THUCNewsTitle.py", line 30, in _split_generators
    train_path = dl_manager.download_and_extract(_TRAIN_DOWNLOAD_URL)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 565, in download_and_extract
    return self.extract(self.download(url_or_urls))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 428, in download
    downloaded_path_or_paths = map_nested(
                               ^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 456, in map_nested
    return function(data_struct)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 454, in _download
    return cached_path(url_or_filename, download_config=download_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 182, in cached_path
    output_path = get_from_cache(
                  ^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 596, in get_from_cache
    raise FileNotFoundError(f"Couldn't find file at {url}")
FileNotFoundError: Couldn't find file at https://drive.google.com/u/0/uc?id=1xnicHROZsgtxKodf8sZiRiXoWJ7fpQt2&export=download

I tried switching to a different Wi-Fi connection, but that did not help. I can successfully open the Hugging Face website in my browser.

Answer by Eran H.

Hugging Face datasets can contain custom code that runs when you load the dataset. In your case that code is the loading script THUCNewsTitle.py in the dataset repository, which appears in your traceback. The script tries to download a file from a Google Drive link, and that download fails, so the problem is the link rather than your network or Hugging Face itself. There are several possible causes, for example the link now requires authentication or the file was deleted.
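To confirm that the Google Drive link is the failing part, you could pull the URL out of the error message and request it directly. This is only a minimal sketch using the standard library: the helper names extract_failed_url and probe_url are made up for illustration, and the regular expression assumes the message keeps the "Couldn't find file at <url>" shape shown in the traceback.

```python
import re
import urllib.error
import urllib.request
from typing import Optional


def extract_failed_url(error_message: str) -> Optional[str]:
    """Return the URL from a "Couldn't find file at <url>" message, if any.

    Hypothetical helper: it relies on the message format seen in the
    traceback above, which may differ in other versions of `datasets`.
    """
    match = re.search(r"Couldn't find file at (\S+)", error_message)
    return match.group(1) if match else None


def probe_url(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and describe the outcome as a short string."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 403 or 404).
        return f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, DNS failures, refused connections and timeouts.
        return f"unreachable: {exc}"
```

Wrapping the load_dataset call in try/except FileNotFoundError and passing str(err) to extract_failed_url, then the result to probe_url, should show whether the link itself is rejected. An error status while the Hugging Face site opens normally would point at the link, not your connection.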

You can try contacting the maintainer of this dataset. One way to do that is to open a discussion on the dataset's Hugging Face page: go to the Community tab and click New discussion.

It's worth noting that some version of the dataset seems to exist in the repository as .arrow files. You can try downloading those files directly and writing your own code to load them.
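A rough sketch of that last approach, under several assumptions: I have not checked the actual file names in the repository, so the code simply picks every .arrow file whose name contains the split name; it also assumes the files were written by the datasets library so that Dataset.from_file can read them. The helper names pick_arrow_files and load_split_from_arrow are hypothetical.

```python
from typing import Iterable, List


def pick_arrow_files(filenames: Iterable[str], split: str) -> List[str]:
    """Keep .arrow files whose base name mentions the split (assumed naming)."""
    return sorted(
        name
        for name in filenames
        if name.endswith(".arrow") and split in name.rsplit("/", 1)[-1]
    )


def load_split_from_arrow(repo_id: str, split: str):
    """Download the matching .arrow files and load them without the script."""
    # Imported lazily so the helper above works without these packages.
    from datasets import Dataset, concatenate_datasets
    from huggingface_hub import hf_hub_download, list_repo_files

    repo_files = list_repo_files(repo_id, repo_type="dataset")
    arrow_files = pick_arrow_files(repo_files, split)
    if not arrow_files:
        raise FileNotFoundError(f"No .arrow file for split {split!r} in {repo_id}")

    parts = [
        Dataset.from_file(
            hf_hub_download(repo_id=repo_id, filename=name, repo_type="dataset")
        )
        for name in arrow_files
    ]
    return concatenate_datasets(parts)
```

With this, load_split_from_arrow("seamew/THUCNewsTitle", "train") would bypass the Google Drive download entirely, provided the repository really contains a matching file. If the names follow a different pattern, adjust pick_arrow_files after looking at the repository's file list.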