scrapy-playwright runs but get empty json file


When I crawl this site, an empty JSON file is created. I am following along in a Udemy course and have asked about this there, but have not received a reply. I have checked my code but don't see anything obvious. I was initially doing this on a PC with Windows 11, but found a note in the scrapy-playwright documentation saying that it doesn't work on Windows and recommending Linux. I switched over to a MacBook running Ventura and now get only one error, as opposed to many on the PC, but I am still getting an empty JSON file. I don't know what to check. I have shown below: 1. my code, 2. the end of the terminal output, 3. the relevant lines from my settings.py.

scrapy crawl positions -O positions.json

import scrapy
from scrapy_playwright.page import PageMethod


class PositionsSpider(scrapy.Spider):
    name = "positions"
    allowed_domains = ["trafigura.com"]  # was "traf.com", which does not match the start URL
    start_urls = ["https://careers.trafigura.com/TrafiguraCareerSite/search"]

    def start_requests(self):
        yield scrapy.Request(
            self.start_urls[0],
            meta=dict(
                playwright=True,
                playwright_page_methods=[
                    # wait until the job list has been rendered by JavaScript
                    PageMethod("wait_for_selector", 'section#results div[role="list"]')
                ],
            ),
        )

    async def parse(self, response):
        for job in response.css('section#results div[role="list"] div[role="listitem"]'):
            yield {
                "title": job.css("a::text").get()
            }

From settings.py:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
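For what it's worth, these optional scrapy-playwright settings (taken from its README; the values are my guesses, not from the course) would let me watch what the browser actually does while the spider runs:

```python
# Optional scrapy-playwright settings for debugging: launch the
# browser headful and slow actions down so the page can be watched.
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": False,  # show the browser window
    "slow_mo": 500,     # milliseconds of delay between Playwright actions
}
# Give slow pages more time to load (milliseconds; the default is 30s).
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 60_000
```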

The end of the terminal output:

2023-08-16 10:13:49 [scrapy.core.scraper] ERROR: Error downloading <GET https://careers.trafigura.com/TrafiguraCareerSite/search>
Traceback (most recent call last):
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1693, in _inlineCallbacks
    result = context.run(
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/twisted/python/failure.py", line 518, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/scrapy/core/downloader/middleware.py", line 52, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/twisted/internet/defer.py", line 1065, in adapt
    extracted = result.result()
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 297, in _download_request
    result = await self._download_request_with_page(request, page, spider)
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/scrapy_playwright/handler.py", line 359, in _download_request_with_page
    server_addr = await response.server_addr()
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/playwright/async_api/_generated.py", line 559, in server_addr
    return mapping.from_impl_nullable(await self._impl_obj.server_addr())
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/playwright/_impl/_network.py", line 538, in server_addr
    return await self._channel.send("serverAddr")
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 482, in wrap_api_call
    return await cb()
  File "/Users/Steve/PythonProjects/scrapy-with-js-2/venv/lib/python3.11/site-packages/playwright/_impl/_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed
