I need to extract images and their coordinates from a PDF document, and then paste each image at new coordinates.
In borb, you can extract images as follows:
I: ImageExtraction = ImageExtraction()

# load
doc: typing.Optional[Document] = None
with open("test_image.pdf", "rb") as in_file_handle:
    doc = PDF.loads(in_file_handle, [I])

# check whether we have read a Document
assert doc is not None

print(I.get_images())
At the output we get a dictionary with images:
{0: [<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=660x660 at 0x1A03AC5FEE0>]}
However, I do not know how to get the coordinates and dimensions of this image.
In addition, in borb you can insert an image as follows:
# set a PageLayout
layout: PageLayout = SingleColumnLayout(page)

# add an Image
layout.add(
    Image(
        I.get_images()[0][0],
        width=Decimal(237.72),
        height=Decimal(237.72),
    )
)
However, in this construction, you cannot select coordinates to insert an image. Perhaps you can choose some kind of PageLayout in which you can adjust its coordinates, but I do not know which one.
Disclaimer: I am the author of borb.

ImageExtraction is meant to extract images, not their meta-information (such as their coordinates).

You would need to write your own implementation of EventListener in order to get the job done. You can look at the source of ImageExtraction to get a better idea of how this works. Essentially, it listens to ImageRenderEvent objects being processed by the PDF parser logic.

ImageRenderEvent has a few methods that might be useful to you: get_x(), get_y(), get_width() and get_height().

So, by creating your own EventListener and processing ImageRenderEvent, you will be able to extract the images and their coordinates.
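
The listener pattern described above can be sketched as follows. To keep the example runnable without borb installed, ImageRenderEvent and BeginPageEvent here are small stand-in classes, not borb's real ones. Only get_x(), get_y(), get_width() and get_height() come from the answer; the get_image() accessor, the _event_occurred hook name and the PlacedImage record are assumptions made for illustration. In real code you would subclass borb's EventListener, check your installed version for the exact hook and import paths, and pass the listener to PDF.loads in the same way as ImageExtraction.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Dict, List


class BeginPageEvent:
    """Stand-in (NOT borb's class): signals that a new page starts."""


@dataclass
class ImageRenderEvent:
    """Stand-in (NOT borb's class): mimics the getters named in the answer."""
    image: Any
    x: Decimal
    y: Decimal
    width: Decimal
    height: Decimal

    def get_image(self) -> Any:  # assumed accessor name
        return self.image

    def get_x(self) -> Decimal:
        return self.x

    def get_y(self) -> Decimal:
        return self.y

    def get_width(self) -> Decimal:
        return self.width

    def get_height(self) -> Decimal:
        return self.height


@dataclass
class PlacedImage:
    """Hypothetical record: an image plus where it was drawn on the page."""
    image: Any
    x: Decimal
    y: Decimal
    width: Decimal
    height: Decimal


class ImageWithCoordinatesExtraction:
    """Collects images and their bounding boxes, grouped by page number."""

    def __init__(self) -> None:
        self._current_page: int = -1
        self._images_per_page: Dict[int, List[PlacedImage]] = {}

    def _event_occurred(self, event: Any) -> None:  # assumed hook name
        if isinstance(event, BeginPageEvent):
            # Pages are numbered from 0, in the order they are parsed.
            self._current_page += 1
        elif isinstance(event, ImageRenderEvent):
            placed = PlacedImage(
                event.get_image(),
                event.get_x(),
                event.get_y(),
                event.get_width(),
                event.get_height(),
            )
            self._images_per_page.setdefault(self._current_page, []).append(placed)

    def get_images(self) -> Dict[int, List[PlacedImage]]:
        return self._images_per_page


# Simulate the events a parser would emit for a one-page document.
listener = ImageWithCoordinatesExtraction()
listener._event_occurred(BeginPageEvent())
listener._event_occurred(
    ImageRenderEvent("fake-image", Decimal(59), Decimal(500), Decimal(237), Decimal(237))
)

first = listener.get_images()[0][0]
print(first.x, first.y, first.width, first.height)
```

The result has the same shape as the dictionary returned by ImageExtraction.get_images() (page number mapped to a list), except that each entry also carries the position and size reported by the event.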