I’m having an issue with the Google Cloud Document AI API in a Firebase Cloud Function that processes documents uploaded to Google Cloud Storage. The function triggers correctly on PDF uploads, but every request to my custom Document AI processor fails with the error "400 No valid schema provided for processing". The code is based on the Document AI processing-request documentation and its sample requests. Community solutions are sparse, and so far the only results I can find online are from people with the same problem.
I’ve verified the processor ID, checked the service account (for testing, it has Owner permissions for Document AI, Cloud Storage and Firebase), and tried simpler PDFs, all with no luck.
I’m sure the issue is the request structure, but I’m not sure how to fix it. Any help is appreciated!
MY CODE:
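```
import os

from google.api_core.client_options import ClientOptions
from google.cloud import documentai_v1beta3 as documentai
from google.cloud import storage


def process_pdf(event, context):
    # Background function triggered by a Cloud Storage upload;
    # the event payload carries the bucket and object name.
    location = 'us'
    # The endpoint must match the region the processor was created in.
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    documentai_client = documentai.DocumentProcessorServiceClient(client_options=opts)

    project_id = os.environ.get('PROJECT_ID')
    processor_id = "{{MY_PROCESSOR_ID}}"
    name = documentai_client.processor_path(project_id, location, processor_id)

    # Download the uploaded PDF referenced by the trigger event.
    blob = storage.Client().bucket(event['bucket']).blob(event['name'])
    content = blob.download_as_bytes()

    raw_document = documentai.RawDocument(content=content, mime_type="application/pdf")
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)

    try:
        result = documentai_client.process_document(request=request)
    except Exception as e:
        print(f"Error processing the document: {type(e).__name__} - {str(e)}")
        return
```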
This specific processor requires a `schema`, either supplied in the request or configured explicitly ahead of time in the UI.

- [Request] Set the `schema_override` parameter inside `process_options` (i.e. `process_options.schema_override`); there is a similar code snippet on their GitHub.
- [Explicitly] Update the schema during the 'build' phase of the UI: docs
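A minimal sketch of the [Request] option, assuming you call the REST `:process` method directly instead of the Python client. It builds a JSON body whose `processOptions.schemaOverride` carries an inline `DocumentSchema`. The helper names (`build_process_request_body`, `process_url`), the example field names, the `"string"` value type and the top-level entity type name `custom_extraction_document_type` are assumptions for illustration; copy the real values from your processor's schema.

```python
import base64


def build_process_request_body(pdf_bytes, fields, display_name="Override schema"):
    """Build a JSON body for the Document AI REST `:process` method that
    carries an inline schema in processOptions.schemaOverride.

    `fields` maps a (hypothetical) field name to its value type, e.g. "string".
    """
    properties = [
        {
            "name": field_name,
            "valueType": value_type,
            # OPTIONAL_MULTIPLE allows zero or more values for this field.
            "occurrenceType": "OPTIONAL_MULTIPLE",
        }
        for field_name, value_type in fields.items()
    ]
    schema = {
        "displayName": display_name,
        "entityTypes": [
            {
                # Assumed top-level entity type name for a custom extractor;
                # check the name used in your processor's own schema.
                "name": "custom_extraction_document_type",
                "baseTypes": ["document"],
                "properties": properties,
            }
        ],
    }
    return {
        # Inline bytes must be base64-encoded in the JSON representation.
        "rawDocument": {
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        },
        "processOptions": {"schemaOverride": schema},
    }


def process_url(project_id, location, processor_id):
    """Return the v1beta3 REST endpoint for a synchronous process call."""
    return (
        f"https://{location}-documentai.googleapis.com/v1beta3/"
        f"projects/{project_id}/locations/{location}/processors/{processor_id}:process"
    )
```

In the Python client used in the question, the same structure would map to `documentai.ProcessOptions(schema_override=documentai.DocumentSchema(...))`, passed as `process_options=` to `documentai.ProcessRequest`. To send the JSON body instead, one option is a POST to `process_url(...)` using `google.auth.transport.requests.AuthorizedSession`.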