We are posting scanned and OCRed documents on a website and need to add a link to each page so that people who find the pages via a search engine easily get to the parent index of related documents.
I've been trying to work out a way to do this in Python using pypdf and have, so far, not had any luck. My strategy is for each document to create a page containing just the hyperlink (it needs to be different for each document) and then to merge that created page into each page in the document. I've messed around with code I've found in SE (mostly for PyPdf2, which I understand is deprecated, and which needs to be modified significantly for pypdf) without luck.
For example, from the pypdf doc pages, I tried:
from pypdf import PdfWriter, PdfReader
stamp = PdfReader("bg.pdf").pages[0]
writer = PdfWriter(clone_from="source.pdf")
for page in writer.pages:
page.merge_page(stamp, over=False) # here set to False for watermarking
writer.write("out.pdf")
and generated corrupt PDF files.
This works like a charm for me:
prerequesites:
code:
It should print: