I'm learning Python web scraping. I get an AttributeError when I run scrapy crawl on a spider


I'm learning Python scraping with Scrapy. I did exactly what the tutorial teaches, but I got an error. Please help!

My Python code:

import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        books = response.css("article.product_pod")

        for book in books:
            yield {
                "name": book.css("h3 a::text").get(),
                "price": book.css(".product_price .price_color::text").get(),
                "url": book.css("h3 a").attrib["href"],
            }

The terminal shows:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\python\venv\bookscraper\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 161, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 114, in _run_print_help
    func(*a, **kw)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 169, in _run_command
    cmd.run(args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\commands\crawl.py", line 30, in run
    self.crawler_process.start()
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\crawler.py", line 390, in start
    install_shutdown_handlers(self._signal_shutdown)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\utils\ossignal.py", line 19, in install_shutdown_handlers
    reactor._handleSignals()
    ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncioSelectorReactor' object has no attribute '_handleSignals'

The ossignal.py file:

import signal

signal_names = {}
for signame in dir(signal):
    if signame.startswith("SIG") and not signame.startswith("SIG_"):
        signum = getattr(signal, signame)
        if isinstance(signum, int):
            signal_names[signum] = signame


def install_shutdown_handlers(function, override_sigint=True):
    """Install the given function as a signal handler for all common shutdown
    signals (such as SIGINT, SIGTERM, etc). If override_sigint is ``False`` the
    SIGINT handler won't be install if there is already a handler in place
    (e.g.  Pdb)
    """
    from twisted.internet import reactor

    reactor._handleSignals()
    signal.signal(signal.SIGTERM, function)
    if signal.getsignal(signal.SIGINT) == signal.default_int_handler or override_sigint:
        signal.signal(signal.SIGINT, function)
    # Catch Ctrl-Break in windows
    if hasattr(signal, "SIGBREAK"):
        signal.signal(signal.SIGBREAK, function)

Best answer, by Builditluc:

As pointed out in my comment, the issue you describe is already being tracked by the Scrapy project. It comes from one of Scrapy's dependencies, Twisted: a new version, 23.8.0, was released a day ago and seems to cause the problem. As your traceback shows, the reactor has no _handleSignals attribute, which Scrapy's ossignal.py still calls.

Another user fixed the issue by installing the previous version of Twisted:

pip install Twisted==22.10.0

Until the issue is fixed and a new version is released, I suggest using the previous version.
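
If you want to confirm which Twisted version your virtual environment actually has, here is a minimal sketch using only the standard library. The helper names (is_affected, installed_twisted_version) and the FIRST_AFFECTED threshold are my own for illustration. The threshold assumes that 23.8.0 is the first release that triggers the error, and that your Scrapy version still calls reactor._handleSignals().

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption based on the answer above: Twisted 23.8.0 is the first release
# that triggers the AttributeError, and 22.10.0 is known to work.
FIRST_AFFECTED = (23, 8)


def is_affected(twisted_version: str) -> bool:
    """Return True if the version string is 23.8 or newer (hypothetical helper)."""
    major, minor = (int(part) for part in twisted_version.split(".")[:2])
    return (major, minor) >= FIRST_AFFECTED


def installed_twisted_version():
    """Return the installed Twisted version string, or None if it is not installed."""
    try:
        return version("Twisted")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_twisted_version()
    if found is None:
        print("Twisted is not installed in this environment.")
    elif is_affected(found):
        print(f"Twisted {found} is likely affected; try: pip install Twisted==22.10.0")
    else:
        print(f"Twisted {found} predates 23.8.0 and should not raise this error.")
```

Run it with the same interpreter that runs scrapy (that is, with the bookscraper venv activated), otherwise it may report the version from a different environment.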