PDF manipulation - tagging an image or figure


I have a source PDF (untagged.pdf) from which I want to create a tagged version (tagged.pdf).

I have the HTML tag information for all of the content in the source PDF.

Now I have a figure on page 3. When I parse the page programmatically, it is not detected as an image, because it is made up of a rectangle with some text and another rectangle, as shown below.

    _____________________         ____________________
   |    Some text inside | ----> |   Some other text  |
   |                     | ----> |            Inside  |
   |_____________________| ----> |____________________|

             Fig 1.x Rectangle 1 to Rectangle 2

Using some other techniques, I have detected that this is a figure, along with its bounding coordinates. Let's say the bounding box runs from (10, 30) to (100, 60). I want to tag the whole region as a figure, as shown below.

   _____________________________________________________________(100, 60)
  |                                                             |
  |      _____________________         ____________________     |
  |     |    Some text inside | ----> |   Some other text  |    |
  |     |                     | ----> |            Inside  |    |
  |     |_____________________| ----> |____________________|    |
  |                                                             |
  |           Fig 1.x Rectangle 1 to Rectangle 2                |
  |_____________________________________________________________|
  (10, 30)
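
To make the goal concrete, here is a minimal sketch of the selection step, assuming the parser already reports a bounding box for every text run and path on the page. The class and method names (`FigureRegionSelector`, `Box`, `Item`, `itemsInside`) are hypothetical and not part of itextpdf or pdfbox. It only decides which page items fall inside the detected figure region; it does not write any tags.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks the page items that lie inside a detected figure region. */
final class FigureRegionSelector {

    /** Axis-aligned box in PDF user space: (llx, lly) lower-left, (urx, ury) upper-right. */
    static final class Box {
        final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
        }

        /** True when 'other' lies entirely inside this box (edges included). */
        boolean contains(Box other) {
            return other.llx >= llx && other.lly >= lly
                && other.urx <= urx && other.ury <= ury;
        }
    }

    /** A parsed page item (text run, path, ...) with its own bounding box. */
    static final class Item {
        final String label;
        final Box box;

        Item(String label, Box box) { this.label = label; this.box = box; }
    }

    /** Returns, in page order, the items fully enclosed by the figure region. */
    static List<Item> itemsInside(Box figure, List<Item> pageItems) {
        List<Item> inside = new ArrayList<>();
        for (Item item : pageItems) {
            if (figure.contains(item.box)) {
                inside.add(item);
            }
        }
        return inside;
    }
}
```

With the figure box `new Box(10, 30, 100, 60)`, an item at (15, 40)-(45, 55) would be selected, while a paragraph at (10, 70)-(100, 90) would not. Items that only partly overlap the region are excluded here; whether that is the right rule depends on the document.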

I want to tag this entire region as an image. I have checked libraries such as itextpdf and pdfbox. They don't have APIs to tag a figure using coordinates.

In other words, is there any way to programmatically tag an element (a group of images) as a figure?
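
For reference, at the file-format level a figure tag in a tagged PDF (ISO 32000-1) is not a coordinate annotation. It is a marked-content sequence (`BDC` ... `EMC`) around the drawing operators in the page content stream, plus a structure element with `/S /Figure` that points to that sequence by its `MCID`. The rectangle can be recorded in the optional `/BBox` layout attribute. The sketch below only builds those two pieces as text, to show what a library would ultimately have to write. `FigureTagSyntax` and its methods are hypothetical names, not itextpdf or pdfbox APIs, and the output is a fragment rather than a complete, valid structure tree.

```java
/** Hypothetical helper that emits raw PDF syntax for a Figure tag; not a library API. */
final class FigureTagSyntax {

    /** Wraps already-existing content-stream operators in a marked-content sequence. */
    static String wrapInMarkedContent(String operators, int mcid) {
        return "/Figure <</MCID " + mcid + ">> BDC\n" + operators + "\nEMC";
    }

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escapeLiteral(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /**
     * Builds a partial structure-element dictionary. The required /P (parent)
     * entry and the usual /Pg (page) entry are indirect references that depend
     * on the target file, so they are left out of this fragment.
     */
    static String structElemFragment(int mcid, double llx, double lly,
                                     double urx, double ury, String altText) {
        return "<< /Type /StructElem /S /Figure"
             + " /K " + mcid
             + " /Alt (" + escapeLiteral(altText) + ")"
             + " /A << /O /Layout /BBox [" + llx + " " + lly + " " + urx + " " + ury + "] >>"
             + " >>";
    }
}
```

For the example above, `structElemFragment(0, 10, 30, 100, 60, "Fig 1.x Rectangle 1 to Rectangle 2")` yields a dictionary containing `/BBox [10.0 30.0 100.0 60.0]`. In a real file the operators that draw the figure must sit together in the content stream to share one `MCID`, the `BDC`/`EMC` pair must be both inside or both outside any text object (`BT` ... `ET`), and the page's parent tree has to be updated as well.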
