How can I use Google Document AI OCR to find the non-text images in a text document?

I'm using Google Document AI Enterprise OCR to OCR images (scans of old books), and it works well. The books have figures at various points on the page. I'd like to use the API to find those figures and distinguish them from both text and whitespace. Can I do that?

I tried the visual_elements attribute, but it comes back empty. From the docs, it seems to detect only checkboxes and form fields, not other visual elements.
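For reference, a check like the one described above could be sketched as follows. This is a minimal sketch, not the exact code used: the helper name `list_visual_elements` is hypothetical, and it assumes the `Document` returned by the Document AI Python client exposes `pages`, `visual_elements`, `type_`, and `layout.bounding_poly.normalized_vertices`. A stand-in object is used so the snippet runs without credentials.

```python
from types import SimpleNamespace


def list_visual_elements(document):
    """Collect (page_number, type, vertices) for every visual element.

    `document` is assumed to look like a Document AI `Document`; any object
    with the same attribute layout works. Vertices are normalized (0..1).
    """
    found = []
    for page in document.pages:
        for element in page.visual_elements:
            vertices = [
                (v.x, v.y)
                for v in element.layout.bounding_poly.normalized_vertices
            ]
            found.append((page.page_number, element.type_, vertices))
    return found


# Stand-in for a processed document (assumption: real responses have the
# same shape). With a real response, pass `result.document` instead.
fake_document = SimpleNamespace(
    pages=[
        SimpleNamespace(
            page_number=1,
            visual_elements=[
                SimpleNamespace(
                    type_="filled_checkbox",
                    layout=SimpleNamespace(
                        bounding_poly=SimpleNamespace(
                            normalized_vertices=[
                                SimpleNamespace(x=0.25, y=0.5),
                                SimpleNamespace(x=0.5, y=0.75),
                            ]
                        )
                    ),
                )
            ],
        ),
        # A page of plain text and figures: nothing is reported here.
        SimpleNamespace(page_number=2, visual_elements=[]),
    ]
)

for page_number, kind, vertices in list_visual_elements(fake_document):
    print(page_number, kind, vertices)
```

On my book scans the equivalent loop prints nothing, because the list is empty on every page.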

My goal is to digitize these old books to HTML, converting the text via OCR but copying the images in as-is. The scans are a single PNG per page of the book.

There is 1 solution below

Nestor On

I didn't find any documentation or processor that can do this at the moment. However, the Vision API has a feature to detect multiple objects in an image file. I wonder if you could convert your files to images and use that feature as a workaround.
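A rough sketch of that Vision API workaround is below, assuming the `google-cloud-vision` Python package and working credentials. The function names `localize_objects` and `normalized_box_to_pixels` are hypothetical helpers, not part of the library. I have not verified how well object localization picks up figures in old book scans, so treat this as something to experiment with rather than a confirmed solution.

```python
import math


def normalized_box_to_pixels(vertices, width, height):
    """Turn normalized (0..1) polygon vertices into a pixel bounding box.

    Returns (left, top, right, bottom), clamped to the image bounds, which
    is the box format most imaging libraries expect for cropping.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    left = max(0, math.floor(min(xs) * width))
    top = max(0, math.floor(min(ys) * height))
    right = min(width, math.ceil(max(xs) * width))
    bottom = min(height, math.ceil(max(ys) * height))
    return left, top, right, bottom


def localize_objects(png_path):
    """Send one page scan to the Vision API and return detected objects.

    Each result is (name, score, normalized_vertices). Requires the
    google-cloud-vision package and credentials, so the import is kept
    inside the function.
    """
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(png_path, "rb") as handle:
        image = vision.Image(content=handle.read())

    response = client.object_localization(image=image)
    results = []
    for obj in response.localized_object_annotations:
        vertices = [(v.x, v.y) for v in obj.bounding_poly.normalized_vertices]
        results.append((obj.name, obj.score, vertices))
    return results
```

For each result, `normalized_box_to_pixels(vertices, page_width, page_height)` gives a pixel box that could be used to crop the figure out of the page PNG and copy it into the HTML as-is.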

Otherwise, I would recommend submitting a feature request here:

https://cloud.google.com/contact