How can I use Google Document AI OCR to find the non-text images in a text document?

49 Views Asked by SRobertJames At 20 February 2024 at 23:24

I'm using Google Document AI Enterprise OCR to OCR images (scans of old books), and it works well. The books have figures at various points on the page. I'd like to use the API to find those figures and distinguish them from both text and whitespace. Can I do that?

I tried the visual_elements attribute, but it comes back empty. From the docs, it seems to only find checkboxes and form fields, not other visual elements.

My goal is to digitize these old books to HTML, converting the text via OCR but copying the images in as-is. The scans are a single PNG per page of the book.


There is 1 best solution below

Nestor On 22 February 2024 at 16:06

I didn't see any documentation or processor that can do this at the moment. However, the Vision API has a feature to detect multiple objects in an image file. I wonder if you can convert your files to images instead and use this feature as a workaround.

Otherwise, I would recommend submitting a feature request here:

https://cloud.google.com/contact
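For the Vision API workaround, here is a minimal sketch of what that could look like in Python, assuming the `google-cloud-vision` client library is installed and credentials are configured. The function names `to_pixel_box` and `localize_figures` are hypothetical helpers made up for this example. Object localization is trained on general objects, so whether it picks up figures in old book scans is an assumption you would need to check on your own pages.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def to_pixel_box(
    normalized_vertices: Iterable[Tuple[float, float]], width: int, height: int
) -> Box:
    """Convert normalized (0..1) polygon vertices into a pixel bounding box."""
    xs, ys = zip(*normalized_vertices)

    def clamp(v: float) -> float:
        # Keep coordinates inside the page in case the API returns slight overshoot.
        return min(max(v, 0.0), 1.0)

    left = round(clamp(min(xs)) * width)
    top = round(clamp(min(ys)) * height)
    right = round(clamp(max(xs)) * width)
    bottom = round(clamp(max(ys)) * height)
    return (left, top, right, bottom)


def localize_figures(png_path: str) -> List[Tuple[str, float, List[Tuple[float, float]]]]:
    """Hypothetical helper: run Vision API object localization on one page scan.

    Returns (label, score, normalized vertices) for each detected object.
    """
    # Imported here so the pure helper above works without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(png_path, "rb") as f:
        image = vision.Image(content=f.read())

    response = client.object_localization(image=image)
    return [
        (
            obj.name,
            obj.score,
            [(v.x, v.y) for v in obj.bounding_poly.normalized_vertices],
        )
        for obj in response.localized_object_annotations
    ]
```

Each box returned by `to_pixel_box` uses the (left, top, right, bottom) order, so it could be passed to an image library's crop function to cut the figure out of the page PNG and drop it into the HTML next to the OCR text.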
