How to draw a vertical line on a PDF in Python?


I am working on a project that requires parsing a PDF hosted online for relevant data. The PDF can be found here. I started by using tabula-py to parse the table.

However, the table has a flat-file structure in which repeated values are left as blank cells for readability instead of being restated in every row. This causes problems during parsing, such as data leaking into adjacent columns.

I decided to download the PDF and post-process it locally, rather than parsing it directly from the URL, so that I could add vertical lines to it. The lines would be placed so that they form a lattice structure for the table, making it easier to parse and read.

I have tried to use the pypdf library to do this, as you can see below:

from pypdf import PdfWriter, PdfReader
from pypdf.annotations import Line

def addVerticalLines(inputPath, outputPath, xCoordinates):
    reader = PdfReader(inputPath)
    writer = PdfWriter()
    
    
    # Iterate through each page in the input PDF
    for i, page in enumerate(reader.pages):
        writer.add_page(page)

        for x in xCoordinates:
            #Create the line
            annotation = Line(
                rect=(x, 0, x + 1, page.mediabox[3]),
                p1=(x, 0),
                p2=(x, page.mediabox[3])
            )
            writer.add_annotation(page_number=i, annotation=annotation)
            
    # Write the modified PDF to the output file
    with open(outputPath, 'wb') as outputFile:
        writer.write(outputFile)

The output PDF looked exactly the same as the downloaded one. In addition, the pypdf library printed warnings to the console:

Incorrect first char in NameObject:(None)

The number of warnings depends on the number of pages in the PDF and the number of annotations. They appear when the new PDF is saved to disk.

Is this a known issue? Is there a different library I should use instead? I would appreciate any assistance in drawing vertical lines on a local PDF. Thank you!


Answer by Jorj McKie (best answer)

Here is a PyMuPDF solution.

import fitz  # PyMuPDF

doc = fitz.open("input.pdf")
pno = 0  # page number (0-based int)
page = doc[pno]
p1 = fitz.Point(100, 100)  # starting point
p2 = fitz.Point(100, 300)  # ending point
page.draw_line(p1, p2, color=fitz.pdfcolor["red"], width=2)
doc.save("output.pdf")

This draws a red vertical line from (100, 100) to (100, 300) on the chosen page. Not all parameters are shown: others control the dashing pattern, whether the line is placed in the background or foreground, and Optional Content visibility.

Note: I am a maintainer and the original creator of PyMuPDF.
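
To apply the same call to every page at several x positions, as the question asks, something like the following might work. It is only a sketch and has not been run against the PDF in question. The names vertical_segments and add_vertical_lines are hypothetical, and it assumes unrotated pages whose visible area starts at (0, 0).

```python
def vertical_segments(x_coordinates, page_height):
    """Return (start, end) point pairs for full-height vertical lines."""
    return [((x, 0.0), (x, float(page_height))) for x in x_coordinates]


def add_vertical_lines(input_path, output_path, x_coordinates, width=0.5):
    """Draw a black vertical line at each x position on every page."""
    # Imported here so the helper above can be used without PyMuPDF installed.
    import fitz  # PyMuPDF

    doc = fitz.open(input_path)
    for page in doc:
        # page.rect is the visible page area in PyMuPDF coordinates.
        for start, end in vertical_segments(x_coordinates, page.rect.height):
            page.draw_line(
                fitz.Point(*start),
                fitz.Point(*end),
                color=(0, 0, 0),  # RGB components in the range 0..1
                width=width,
            )
    doc.save(output_path)
    doc.close()
```

Calling add_vertical_lines("input.pdf", "output.pdf", [100, 200, 300]) would then write a copy with three lines per page. Because draw_line writes real vector graphics into the page content rather than annotations, the lines should be visible to table extractors that look for ruling lines.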

Answer by Br_76

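Try drawing the lines on an overlay page and merging it onto each page. PyPDF2 has no method for drawing a line directly (there is no PageObject.createLine), so this version uses reportlab to draw the lines and pypdf, the successor to PyPDF2, to merge them:

from io import BytesIO

from pypdf import PdfReader, PdfWriter
from reportlab.pdfgen import canvas

def addVerticalLines(inputPath, outputPath, xCoordinates):
    reader = PdfReader(inputPath)
    writer = PdfWriter()

    # Iterate through each page in the input PDF
    for page in reader.pages:
        width = float(page.mediabox.width)
        height = float(page.mediabox.height)

        # Draw the vertical lines on a blank in-memory page of the same size
        buffer = BytesIO()
        overlay = canvas.Canvas(buffer, pagesize=(width, height))
        for x in xCoordinates:
            overlay.line(x, 0, x, height)
        overlay.save()
        buffer.seek(0)

        # Stamp the overlay onto the original page, then add it to the writer
        page.merge_page(PdfReader(buffer).pages[0])
        writer.add_page(page)

    # Write the modified PDF to the output file
    with open(outputPath, 'wb') as outputFile:
        writer.write(outputFile)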

# Example usage
inputPath = "path/to/your/input.pdf"
outputPath = "path/to/your/output.pdf"
xCoordinates = [100, 200, 300]  # Add the desired x-coordinates for vertical lines
addVerticalLines(inputPath, outputPath, xCoordinates)