from datasets import load_dataset
dataset = load_dataset(path='seamew/THUCNewsTitle', split='train')
My network connection is fine, but the console always shows:
Traceback (most recent call last):
File "/Users/yuanyang_lee/Desktop/HuggingFace/demo2.py", line 11, in <module>
dataset = load_dataset(path='seamew/THUCNewsTitle', split='train')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/datasets/load.py", line 2153, in load_dataset
builder_instance.download_and_prepare(
File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 1717, in _download_and_prepare
super()._download_and_prepare(
File "/opt/homebrew/lib/python3.11/site-packages/datasets/builder.py", line 1027, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yuanyang_lee/.cache/huggingface/modules/datasets_modules/datasets/seamew--THUCNewsTitle/b3df30999854cbe65ae45110e895b2fa88c14975f2185c1f43d9b7ca85b5f679/THUCNewsTitle.py", line 30, in _split_generators
train_path = dl_manager.download_and_extract(_TRAIN_DOWNLOAD_URL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 565, in download_and_extract
return self.extract(self.download(url_or_urls))
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 428, in download
downloaded_path_or_paths = map_nested(
^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 456, in map_nested
return function(data_struct)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/datasets/download/download_manager.py", line 454, in _download
return cached_path(url_or_filename, download_config=download_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 182, in cached_path
output_path = get_from_cache(
^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/datasets/utils/file_utils.py", line 596, in get_from_cache
raise FileNotFoundError(f"Couldn't find file at {url}")
FileNotFoundError: Couldn't find file at https://drive.google.com/u/0/uc?id=1xnicHROZsgtxKodf8sZiRiXoWJ7fpQt2&export=download
I tried switching to a different Wi-Fi network, but it did not help. I can also open the Hugging Face website in my browser without any problem.
Hugging Face datasets can include custom code (a loading script) that runs when you load the dataset. For example, the code for the dataset you are loading is here (THUCNewsTitle.py in the dataset repository). It appears that this script tries to download a file from a Google Drive link, and that download fails. This can happen for several reasons, for example the file requiring authentication or having been deleted.
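To check that the failure comes from the Google Drive link itself rather than from `datasets` or your network, you can request the URL from the traceback directly. This is a minimal sketch using only the standard library; the `probe` helper and the browser-like `User-Agent` header are illustrative choices, not part of `datasets`:

```python
import urllib.error
import urllib.request

# URL copied from the FileNotFoundError in the traceback above.
URL = "https://drive.google.com/u/0/uc?id=1xnicHROZsgtxKodf8sZiRiXoWJ7fpQt2&export=download"


def probe(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: fetch `url` directly and describe what happened."""
    # A browser-like User-Agent is an assumption; some servers treat scripts differently.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # urlopen follows redirects, so geturl() is where we actually ended up.
            content_type = response.headers.get("Content-Type", "")
            return f"reachable: final URL {response.geturl()}, Content-Type {content_type}"
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it must be caught first.
        return f"HTTP error {err.code}: {err.reason}"
    except urllib.error.URLError as err:
        return f"failed: {err.reason}"
    except TimeoutError:
        return "failed: timed out"


if __name__ == "__main__":
    print(probe(URL))
```

If this also fails, or ends up on an HTML page (Content-Type text/html) instead of a file download, the problem is most likely the link itself rather than your network or your `datasets` installation.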
You could try contacting the maintainer of this dataset. One way to do so is to open a discussion on the dataset's Hugging Face page: in the link provided above, go to the Community tab and press "New discussion".
It's worth noting that some version of the dataset seems to exist in the repository as .arrow files. You could try downloading them from there and writing your own code to load the dataset.