I have a Java app that sends call recordings to my Node server, where I transcribe them using Google's Speech-to-Text API. Here's the relevant part of the server code:
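const speech = require('@google-cloud/speech');

const speechClient = new speech.SpeechClient();
// Uploaded recording as a Buffer, sent inline as base64
const file = req.files[0].buffer;
const audioBytes = file.toString('base64');
const audio = {
    content: audioBytes
};
const config = {
    encoding: 'AMR_WB',
    sampleRateHertz: 16000,
    languageCode: 'bn-BD',
};
// recognize() resolves to an array; the response is its first element
const data = await speechClient.recognize({audio, config})
// Join the top alternative of each result into one transcript
const transcription = data[0].results.map(r => r.alternatives[0].transcript).join("\n");
console.log(transcription);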
According to the official docs here, a lossy format like this should work, but I just get an empty string. Any other sampleRateHertz value throws a bad sample rate error.
I also tried this configuration, which returns an empty string as well:
const config = {
    encoding: 'LINEAR16', // Audio encoding (change if needed). FLAC/LINEAR16/AMR_WB
    sampleRateHertz: 44000, // Audio sample rate in Hertz (change if needed).
    languageCode: 'bn-BD', // Language code for the audio (change if needed).
};
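In case it helps narrow things down, here is a minimal sketch of a check that could confirm what the phone is actually uploading, since I am only guessing at the encoding. It assumes the upload is the unmodified recording file, and `sniffAudioContainer` is a hypothetical helper of my own, not part of the Google client library. It only looks at the well-known leading bytes of a few containers and says nothing about the sample rate or the codec inside an MP4/3GP file.

```javascript
// Hypothetical helper (not part of @google-cloud/speech): guesses the audio
// container from the leading "magic" bytes of an uploaded Buffer.
function sniffAudioContainer(buffer) {
  // True if the ASCII marker `text` appears at byte position `offset`.
  const startsWith = (text, offset = 0) =>
    buffer.length >= offset + text.length &&
    buffer.toString('latin1', offset, offset + text.length) === text;

  if (startsWith('#!AMR-WB\n')) return 'AMR_WB';              // AMR wideband file
  if (startsWith('#!AMR\n')) return 'AMR';                    // AMR narrowband file
  if (startsWith('RIFF') && startsWith('WAVE', 8)) return 'WAV';
  if (startsWith('fLaC')) return 'FLAC';
  if (startsWith('OggS')) return 'OGG';
  if (startsWith('ftyp', 4)) return 'MP4_OR_3GP';             // ISO base media container
  return 'UNKNOWN';
}

// Example use inside the upload handler, before choosing `encoding`:
// console.log(sniffAudioContainer(req.files[0].buffer));
```

If this reports something other than AMR_WB or WAV for my recordings, then neither of the two configs above describes the audio, which might explain the empty result.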
Please help. I'm recording calls on an old Samsung phone running Android 6 and transcribing them for sentiment analysis, as part of a school project.
Thanks