I'm running a PDF Scraper:
!pip install -q tabula-py==2.7.0
#[PDF Scraper]
from io import BytesIO

import tabula

try:
    df = tabula.io.read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=3, stream=True)[0]
except Exception as e:
    # If an IndexOutOfBoundsException occurs, indicating that page 3 is not found, try reading page 2 instead
    df = tabula.io.read_pdf(BytesIO(pdf_content), pandas_options={'header': None}, pages=2, stream=True)[0]
#[PDF Scraper]
It has been running successfully for months and nothing has changed, but it suddenly failed with this error:
Error from tabula-java: Picked up JAVA_TOOL_OPTIONS: -Djdk.jar.maxSignatureFileSize=2147483639 Error: Error: End-of-File, expected line
This error occurred for me when I passed in a corrupted PDF.
When I provided a non-corrupted PDF, it worked as expected.
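One way to tell whether the downloaded bytes are the problem is to inspect them before handing them to tabula. The following is a minimal sketch, not a full validator: `looks_like_complete_pdf` is a hypothetical helper (it is not part of tabula-py), and it only checks for the `%PDF-` header near the start and the `%%EOF` marker near the end. A file can pass this check and still be damaged, but a truncated download or an HTML error page saved as a PDF will usually fail it.

```python
def looks_like_complete_pdf(data: bytes) -> bool:
    """Heuristic: a PDF starts with a %PDF- header and ends with an %%EOF marker."""
    # Assumption: searching the first and last 1024 bytes is enough for this check.
    has_header = b"%PDF-" in data[:1024]
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer


# Illustrative byte strings, not real PDF files.
complete = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\ntrailer\n<< >>\n%%EOF\n"
truncated = b"%PDF-1.7\n1 0 obj\n<< >>\nendo"
html_error_page = b"<html><body>503 Service Unavailable</body></html>"

print(looks_like_complete_pdf(complete))
print(looks_like_complete_pdf(truncated))
print(looks_like_complete_pdf(html_error_page))
```

In the scraper from the question, this could be called on `pdf_content` before `read_pdf`, so that a bad download is reported as such instead of surfacing as a tabula-java error.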