What WAV audio data format does the Vosk Node.js library require for speech recognition?

Vosk is a speech recognition framework. The provided samples use audio recorded directly from the local (native) microphone, and that works.

My requirement is to get the audio from a stream (a socket) instead of the local microphone, but Vosk does not accept the buffer as valid WAV data.

rec.acceptWaveform returns true when the buffer comes from the microphone:

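const vosk = require("vosk");
const mic = require("mic");

const rec = new vosk.Recognizer({model: model, sampleRate: sampleRate});
const micInstance = mic({...});
const micInputStream = micInstance.getAudioStream();

micInputStream.on('data', (buffer) => {
  // Feed each microphone chunk to the recognizer
  if (rec.acceptWaveform(buffer)) {
    console.log("rec.result():", rec.result());
  }
});
micInstance.start();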

rec.acceptWaveform returns false when the buffer comes from a socket client:

const io = require('socket.io')(server, { maxHttpBufferSize: 1e7 });
const rec = new vosk.Recognizer({model: model, sampleRate: sampleRate});

io.on('connection', function (socket) {
  socket.on('send-audio', function (data) {
    console.log("received:", data);
    if (rec.acceptWaveform(data)) {
      console.log("rec.acceptWaveform:", true);
      console.log("rec.result():", rec.result());
    } else {
      console.log("rec.acceptWaveform:", false);
    }
  });
});
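
One thing worth checking in the handler above: as far as I understand the Vosk API, a false return from acceptWaveform means the current utterance has not ended yet, not necessarily that the data was rejected. A minimal sketch of logging both cases, assuming the recognizer exposes acceptWaveform(), result() and partialResult(); describeChunk is a hypothetical helper name, not part of vosk:

```javascript
// Sketch only: report what the recognizer holds after one chunk.
// `rec` is any object with acceptWaveform(), result() and partialResult();
// `describeChunk` is a hypothetical name introduced for illustration.
function describeChunk(rec, chunk) {
  if (rec.acceptWaveform(chunk)) {
    // The recognizer signalled the end of an utterance
    return { final: true, value: rec.result() };
  }
  // Decoding continues; look at the in-progress hypothesis instead
  return { final: false, value: rec.partialResult() };
}
```

If the partial result stays empty for every chunk, the problem is more likely the byte layout of the data than the timing of the utterance.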

Attempts and Research

  • According to the Node.js vosk library, the buffer should contain audio data in PCM 16-bit mono format.

  • I'm using a Node.js library called wavefile to inspect the wave buffer.

  • The buffer received from the socket can be read as a WAV file with the wavefile library. The wave details confirm that it is a valid WAV file, but Vosk does not treat it as wave data. I can also save it directly to a file, and Audacity is able to read it.

  • The buffer received directly from the microphone cannot be read with the wavefile library, but Vosk accepts it as valid wave data. If I save that buffer to a file, Audacity does not recognize it as a valid WAV file.

  • I also tried sending only the bytes of the data section, without success.

  • I raised two issues.

  • I will try with Python, just to test whether it is a bug with wave data from socket clients.
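
To make the difference between the two buffers concrete: a WAV file is a RIFF container (header chunks followed by a data chunk), while "PCM 16-bit mono" describes only the raw samples inside that data chunk. The sketch below shows how the samples and format fields could be pulled out of a complete WAV buffer using only Node's Buffer API. extractPcm is a hypothetical helper, not part of vosk or wavefile, and it assumes each socket message carries a whole WAV file.

```javascript
// Hypothetical helper: split a complete WAV buffer into format info and raw PCM.
function extractPcm(wav) {
  if (wav.toString('ascii', 0, 4) !== 'RIFF' || wav.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE buffer');
  }
  let offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  let format = null;
  while (offset + 8 <= wav.length) {
    const id = wav.toString('ascii', offset, offset + 4);
    const size = wav.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ') {
      format = {
        audioFormat: wav.readUInt16LE(body),        // 1 means uncompressed PCM
        channels: wav.readUInt16LE(body + 2),
        sampleRate: wav.readUInt32LE(body + 4),
        bitsPerSample: wav.readUInt16LE(body + 14),
      };
    } else if (id === 'data') {
      const end = Math.min(body + size, wav.length);
      return { format, pcm: wav.subarray(body, end) };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('no data chunk found');
}
```

With this, the socket handler could pass extractPcm(data).pcm to rec.acceptWaveform and compare the format fields (1 channel, 16 bits, and a sample rate equal to the one given to the Recognizer) against what the model expects. I have not confirmed that this resolves the problem.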

Question

What WAV audio data format does the Vosk Node.js library require for speech recognition?

Reproducible sample

I created a reproducible sample:

https://github.com/jrichardsz/nodejs-wav-vosk-transcription
