I have a setup like this:
- PDF files get uploaded to a storage bucket
- bucket/filename is published to a pub/sub topic
- messages forwarded at a fixed rate to another pub/sub topic
ImageAnnotatorClient.async_batch_annotate_files()with document_text_detection feature, and input and output GCS URIs.
I've found that in the resulting json files, the boundingBox attribute is missing for the symbol-level. Checking the documentation, symbols have a boundingBox attribute.
How can I potentially enable this for the output, or work around this?
Thanks
I tested with python in the Cloud Function and from command line. Both result in "...output-n-to-m.json" files where n and m are pagenumbers. In both outputs, at the symbol-level, boundingBox is absent. Symbols only have attributes "text" and "confidence".
The gcloud command was:
gcloud ml vision detect-text-pdf gs://[mybucket]/[pdffile] gs://[mybucket]/[output]