How can I use borb to extract an image from PDF and its coordinates on a page, and then paste it into the desired coordinates?

Asked by Vladislav on 24 December 2023 at 14:55 (44 views)

I need to extract images from a PDF document, along with their coordinates, and then paste each image at new coordinates.

In borb, you can extract images as follows:

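```python
import typing

from borb.pdf import PDF, Document
from borb.toolkit import ImageExtraction

# listener that collects every image it sees, grouped by page
I: ImageExtraction = ImageExtraction()

# load
doc: typing.Optional[Document] = None
with open("test_image.pdf", "rb") as in_file_handle:
    doc = PDF.loads(in_file_handle, [I])

# check whether we have read a Document
assert doc is not None

print(I.get_images())
```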

The output is a dictionary that maps each page index to a list of images:

{0: [<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=660x660 at 0x1A03AC5FEE0>]}

However, I do not know how to get this image's coordinates and dimensions.

Inserting an image in borb works as follows:

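```python
from decimal import Decimal

from borb.pdf import Document, Image, Page, PageLayout, SingleColumnLayout

# a new Document with one empty Page to receive the image
out_doc: Document = Document()
page: Page = Page()
out_doc.add_page(page)

# set a PageLayout
layout: PageLayout = SingleColumnLayout(page)

# add an Image (the first image found on page 0 by ImageExtraction)
layout.add(
    Image(
        I.get_images()[0][0],
        width=Decimal(237.72),
        height=Decimal(237.72),
    )
)
```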

However, this approach does not let me choose where the image is placed. Perhaps there is a PageLayout that lets me set the coordinates, but I do not know which one.


Answer by Joris Schellekens on 01 January 2024 at 19:43 (best answer)

Disclaimer: I am the author of borb

ImageExtraction is meant to extract images, not their meta-information (such as their coordinates).

You would need to write your own implementation of EventListener to do this.

Looking at the source of ImageExtraction will give you a better idea of how it works.

Essentially, it listens for the ImageRenderEvent objects that the PDF parser emits while it processes each page.

ImageRenderEvent has a few methods that might be useful to you:

get_x()
get_y()
get_width()
get_height()

So, by creating your own EventListener that processes ImageRenderEvent, you will be able to extract the images and their coordinates.