Google Cloud Speech-to-Text returns empty transcription for OGG OPUS Base64 audio


I am trying to transcribe a Base64-encoded OGG Opus audio string with the Google Cloud Speech-to-Text API in Node.js. The audio has a sample rate of 48000 Hz. When I run my code, the API sometimes returns an empty transcription; other times it transcribes the audio just fine. When I come back to the project later, the error reappears, seemingly at random. When I convert the Base64 string to a Buffer and save it to a file, the audio plays fine in VLC, and ffprobe reports the expected information for the resulting file.

I have already checked the audio quality, encoding, sample rate, etc., but none of these checks turned up a problem. Here's my code:

import { SpeechClient } from "@google-cloud/speech";

// `base64Audio` looks like this:
//   "data:audio/ogg; codecs=opus;base64,T2dnUwACAAAAAAAA..."
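export async function transcribeB64(base64Audio: string): Promise<string> {
  const client = new SpeechClient();
  // Strip the data-URL header and keep only the Base64 payload.
  const content = base64Audio.split(",")[1];
  // No `new Promise(async ...)` wrapper: with it, a rejected recognize()
  // call would never reject the returned promise.
  const x = await client.recognize({
    config: {
      encoding: "OGG_OPUS",
      sampleRateHertz: 48000,
      languageCode: "en-US",
    },
    audio: {
      content,
    },
  });
  // `x` is the full response array, serialized for inspection.
  return JSON.stringify(x, null, 2);
}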

The API response looks like this:

[
  {
    "results": [],
    "totalBilledTime": {
      "seconds": "0",
      "nanos": 0
    },
    "speechAdaptationInfo": null,
    "requestId": "000000"
  },
  null,
  null
]
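
For reference, an empty result like the one above can be detected in code instead of by reading the serialized JSON. The following is only a sketch: the interfaces are minimal stand-ins for the fields used here (they are not the library's own typings), and `extractTranscript` is a hypothetical helper name.

```typescript
// Minimal structural types covering only the fields read below.
// These are assumptions for illustration, not the library's typings.
interface Alternative {
  transcript?: string | null;
}
interface Result {
  alternatives?: Alternative[] | null;
}
interface RecognizeResponse {
  results?: Result[] | null;
}

// Hypothetical helper: returns the joined transcript of the top
// alternative of each result, or null when nothing was transcribed.
function extractTranscript(response: RecognizeResponse): string | null {
  const parts = (response.results ?? [])
    .map((r) => r.alternatives?.[0]?.transcript ?? "")
    .filter((t) => t.length > 0);
  return parts.length > 0 ? parts.join("\n") : null;
}
```

The first element of the array returned by `client.recognize(...)` would be passed in; a `null` return corresponds to the `"results": []` case shown above and could be logged together with the audio for later inspection.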

And this is the ffprobe output:

Input #0, ogg, from 'input.ogg':
  Duration: 00:00:05.74, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
    Metadata:
      ENCODER         : Mozilla111.0.1

Why is my audio not being transcribed?


There is 1 best solution below

Rick On

I was not able to isolate a root cause, but changing the encoding from "OGG_OPUS" to "WEBM_OPUS" appears to have fixed the problem so far. I would love to hear possible explanations of why this happens, but I have none at the moment.
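
This is not an explanation, but one thing that may be worth ruling out is a mismatch between the declared encoding and the actual container, for example if recordings can come from more than one browser. The sketch below inspects the first decoded bytes: Ogg streams begin with the ASCII bytes "OggS", and WebM files begin with the EBML signature 0x1A 0x45 0xDF 0xA3. `detectOpusContainer` is a hypothetical helper, not part of `@google-cloud/speech`, and it assumes the audio codec inside the container is Opus.

```typescript
type OpusEncoding = "OGG_OPUS" | "WEBM_OPUS";

// Hypothetical helper: guesses the container from the file signature.
// Accepts either a data URL or a bare Base64 string.
function detectOpusContainer(dataUrlOrBase64: string): OpusEncoding | undefined {
  // Drop an optional "data:...;base64," header.
  const comma = dataUrlOrBase64.indexOf(",");
  const payload = comma >= 0 ? dataUrlOrBase64.slice(comma + 1) : dataUrlOrBase64;

  // 8 Base64 characters decode to 6 bytes, enough for both signatures.
  const head = Buffer.from(payload.slice(0, 8), "base64");
  if (head.length < 4) return undefined;

  // Ogg pages start with the capture pattern "OggS".
  if (head.subarray(0, 4).toString("latin1") === "OggS") return "OGG_OPUS";

  // WebM (Matroska) files start with the EBML header ID.
  if (head[0] === 0x1a && head[1] === 0x45 && head[2] === 0xdf && head[3] === 0xa3) {
    return "WEBM_OPUS";
  }
  return undefined;
}
```

With the data URL from the question (payload starting with `T2dnUw`), this returns "OGG_OPUS", so a container mismatch would not account for that particular input. The result could still be used to pick the `encoding` value per request rather than hard-coding it.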