I am trying to use tabula-py to extract a table from a PDF. I used the Tabula app to generate a template, and in the app the extracted output looks correct, as shown below:
I then used the area from the Tabula template in my code:
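import tabula

# Selection from the Tabula template, in PDF points
y1 = 77.0376069164276            # top
x1 = 23.662381164550744          # left
y2 = y1 + 732.1944360351562      # bottom = top + height
x2 = x1 + 546.9135269165039      # right = left + width

dfs2 = tabula.read_pdf(
    input_path="C:\\Users\\pedro\\Downloads\\BDI_00_20231228.pdf",
    pandas_options={'header': None},
    pages=[612],
    guess=False,
    area=[[y1, x1, y2, x2]],     # each area is [top, left, bottom, right]
)
dfs2[0]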
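For reference, here is a minimal sketch of how I derived the area values from the template instead of typing them by hand. It assumes the exported template JSON is a list of selections with `x1`, `y1`, `width` and `height` keys; the helper name `area_from_template` and the inline sample are hypothetical stand-ins for my real template file.

```python
import json


def area_from_template(template_json):
    """Turn Tabula template selections into [top, left, bottom, right] lists."""
    selections = json.loads(template_json)
    areas = []
    for sel in selections:
        top = sel["y1"]
        left = sel["x1"]
        bottom = top + sel["height"]   # bottom edge = top + height
        right = left + sel["width"]    # right edge = left + width
        areas.append([top, left, bottom, right])
    return areas


# Hypothetical inline sample; assumed to mirror the keys in the exported template
sample = json.dumps([{
    "page": 612,
    "x1": 23.662381164550744,
    "y1": 77.0376069164276,
    "width": 546.9135269165039,
    "height": 732.1944360351562,
}])

areas = area_from_template(sample)
print(areas)
```

The resulting list has the same shape as the `area` argument passed to `tabula.read_pdf` above. tabula-py also has `read_pdf_with_template`, which takes the template file path directly, though I have not checked whether it changes the result here.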
The output is shown below; on some rows the columns appear to be mixed together:


