Performing prediction on a PDF in a GCP bucket using Node.js: invalid argument


On Google Cloud Platform (GCP) I have an AutoML Natural Language entity extraction model that was trained on PDFs, so it needs to run its predictions on PDFs as well.

I have a PDF in a GCP bucket that I want to run an entity extraction prediction on, so I want to make the request below in NodeJS and call predict using PredictionServiceClient.

The code sample is based on the Entity Extraction NodeJS Text example (entity extraction on plain text) and on the NodeJS AutoML NL docs, where I looked up what an IPredictRequest looks like (AutoML Google API docs, IPredictRequest):

// Imports the Google Cloud AutoML library
const {PredictionServiceClient} = require('@google-cloud/automl').v1;

const projectId = 'projectId';
const location = 'us-central1';
// Passed to client.modelPath() below, so this value is used as the model ID
const datasetId = 'TEN4565454654564855555'; // randomised for this example
const srcFilename = 'file_name.pdf';

// Instantiates a client
const client = new PredictionServiceClient();

// await is only valid inside an async function in a CommonJS module
async function main() {
  // Point the prediction payload at the PDF stored in the bucket
  const request = {
    name: client.modelPath(projectId, location, datasetId),
    payload: {
      document: {
        inputConfig: {
          gcsSource: {
            inputUris: ['gs://bucket_name/' + srcFilename],
          },
        },
      },
    },
  };

  const [response] = await client.predict(request);
}

main();

Then I get this error:

Error: 3 INVALID_ARGUMENT: Request contains an invalid argument.
    at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:179:52)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:145:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11)
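
The stack frames in the error above all point into @grpc/grpc-js rather than into the calling code, so the useful information is the status itself. A minimal sketch of pulling that out next to the request that was sent, assuming the thrown error carries the numeric `code` and the `details` string that grpc-js attaches to its status errors. The helper name `describeGrpcError` is hypothetical and not part of @google-cloud/automl, and the error object here is a stand-in built by hand:

```javascript
// Hypothetical helper (not part of @google-cloud/automl): collects the
// structured fields of a failed call next to the request that caused it.
function describeGrpcError(err, request) {
  return {
    code: err.code, // numeric gRPC status; 3 is INVALID_ARGUMENT
    details: err.details, // server-provided message, without the stack frames
    request: JSON.stringify(request, null, 2), // the exact keys that were sent
  };
}

// Stand-in error shaped like the one in the trace (assumption: real errors
// from client.predict() expose the same `code` and `details` properties).
const fakeError = Object.assign(
  new Error('3 INVALID_ARGUMENT: Request contains an invalid argument.'),
  {code: 3, details: 'Request contains an invalid argument.'}
);

const report = describeGrpcError(fakeError, {
  name: 'projects/projectId/locations/us-central1/models/modelId',
  payload: {},
});
console.log(report.code, report.details);
console.log(report.request);
```

In real code this would be called from a `catch` block around `await client.predict(request)`. Printing the serialised request makes it easier to see which keys were actually sent.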

I have also tried using underscores, that is input_config instead of inputConfig, and likewise gcs_source and input_uris. I tried this because that is what the example request.json looks like on the NL model Test & Use page (see image).

[Image: Example request.json]

Then I get:

Error: 3 INVALID_ARGUMENT: List of found errors: 1.Field: payload.document.document_text.content; Message: Required field not set.
    at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:179:52)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:145:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11)

The NodeJS NL docs say that documentText is optional, so I don't understand this error. When I tried to fix it by supplying the document_text field, I got the first error in this post again. I also don't want to have to supply the text of the PDF manually, since it is a photocopy.

How do I fix this and, more importantly, how do I parse and understand the documentation and the error messages? Why is camelCase used in some places and underscores in others?
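
On the naming question, one possible reading (an assumption, not something confirmed in this post) is that the request.json on the Test & Use page shows the REST-style field names in snake_case, while the NodeJS typings such as IPredictRequest spell the same fields in camelCase. Under that assumption, the two spellings describe the same structure and one can be derived from the other mechanically. The helper names `snakeToCamel` and `camelizeKeys` below are hypothetical, and this only illustrates the naming correspondence; it is not claimed to resolve the INVALID_ARGUMENT error:

```javascript
// Hypothetical helper: turns one snake_case key into camelCase,
// e.g. "input_config" becomes "inputConfig".
function snakeToCamel(key) {
  return key.replace(/_([a-z0-9])/g, (match, ch) => ch.toUpperCase());
}

// Hypothetical helper: recursively renames the keys of plain objects,
// walking into arrays and leaving primitive values untouched.
function camelizeKeys(value) {
  if (Array.isArray(value)) {
    return value.map(camelizeKeys);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [snakeToCamel(k), camelizeKeys(v)])
    );
  }
  return value;
}

// Payload written the way the example request.json spells its fields.
const restStylePayload = {
  document: {
    input_config: {
      gcs_source: {input_uris: ['gs://bucket_name/file_name.pdf']},
    },
  },
};

// Same payload with the key spelling used in the NodeJS request above.
const clientStylePayload = camelizeKeys(restStylePayload);
console.log(JSON.stringify(clientStylePayload, null, 2));
```

The converted object has the same shape as the `payload` in the first request of this post, which suggests that the two attempts differ only in key spelling and not in structure.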
