Haystack PDFToTextConverter: getText() got an unexpected keyword argument 'textpage'


I worked through the Haystack beginner tutorial and it runs fine. Now I am trying to use a local PDF on my PC instead of the Game of Thrones Wikipedia articles, and I always get an error.

This is the code:

from haystack.nodes import PDFToTextConverter
from pathlib import Path


def haystack():
    converter = PDFToTextConverter(
        remove_numeric_tables=True,
        valid_languages=["de"]
    )

    docs = converter.convert(file_path=Path("C:/Users/Franzi/Documents/myPDF.pdf"), meta=None)


if __name__ == '__main__':
    haystack()

Traceback (most recent call last):
  File "C:\Users\Franzi\PycharmProjects\pythonProject2\main.py", line 15, in <module>
    haystack()
  File "C:\Users\Franzi\PycharmProjects\pythonProject2\main.py", line 11, in haystack
    docs = converter.convert(file_path=Path("C:/Users/Franzi/Documents/myPDF.pdf"), meta=None)
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\site-packages\haystack\nodes\file_converter\pdf.py", line 171, in convert
    pages = self._read_pdf(
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\site-packages\haystack\nodes\file_converter\pdf.py", line 301, in _read_pdf
    for page in results:
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\Franzi\AppData\Local\Programs\Python\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
TypeError: getText() got an unexpected keyword argument 'textpage'

I am using Python 3.8 and PyCharm 2023.2. I have tried different PDFs and have also tried:

from haystack.utils import convert_files_to_docs
convert_files_to_docs()

but it gives me the same error. Any ideas what I am doing wrong here?
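Since the TypeError is raised inside the library's PDF reading step rather than in the script itself, it may be useful to include the installed package versions. The following is a minimal sketch for collecting them. It assumes that Haystack was installed under the distribution name `farm-haystack` and that the PDF backend is `PyMuPDF`; both names are assumptions, since the traceback does not state them, and `installed_versions` is a hypothetical helper written for illustration.

```python
from importlib import metadata


def installed_versions(package_names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them if your environment differs.
    for name, version in installed_versions(["farm-haystack", "PyMuPDF"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the PDF library turns out to be noticeably older than the Haystack release, a version mismatch would be one plausible explanation for `getText()` rejecting the `textpage` keyword, though this is only a guess from the error message.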
