I am trying to create a Ruby on Rails application that uses a microphone and WebSockets to generate real-time speech-to-text transcription with the Google Cloud Speech service.
I have successfully sent an audio stream through a WebSocket to the application, and I can obtain a transcript from that stream using the streaming_recognize method on a client created with Google::Cloud::Speech.speech. The problem is that I receive the transcript only after the stream has ended.
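For reference, this is roughly the flow that works for me but only delivers results at the end; a minimal sketch assuming the same file and recognition config as in the full code below:

require 'google/cloud/speech'
require 'gapic-common'

speech = Google::Cloud::Speech.speech
input_stream = Gapic::StreamInput.new
output_stream = speech.streaming_recognize input_stream

input_stream.push streaming_config: {
  config: { encoding: :WEBM_OPUS, sample_rate_hertz: 48_000, language_code: "pl-PL" }
}
input_stream.push audio_content: File.binread('./output.webm')
input_stream.close

# Iterating the responses blocks until Google finishes processing, so the
# transcript only appears after the whole stream has ended.
output_stream.each do |response|
  response.results.each do |result|
    puts result.alternatives.first.transcript
  end
end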
I tried to use the async gem to send audio to the input_stream and read from the output_stream in separate Async tasks. The problem is that the task which reads from the output_stream blocks the entire application, including the other task which sends audio chunks to the input_stream.
Here is my code (blocking):
require 'google/cloud/speech'
require 'gapic-common'
require 'async'

audio_file_path = './output.webm'

speech = Google::Cloud::Speech.speech

audio_content = File.binread audio_file_path
bytes_total = audio_content.size
bytes_sent = 0
chunk_size = 32_000

input_stream = Gapic::StreamInput.new
output_stream = speech.streaming_recognize input_stream

config = {
  config: {
    encoding: :WEBM_OPUS,
    sample_rate_hertz: 48_000,
    language_code: "pl-PL",
    enable_word_time_offsets: true
  }
}

# The first message on the stream must carry the recognition config.
input_stream.push streaming_config: config

Sync do |task|
  # Simulated streaming from a microphone: push the file in 32 KB chunks,
  # one per second, then close the input stream.
  task.async do
    while bytes_sent < bytes_total
      puts "Sending audio chunk"
      input_stream.push audio_content: audio_content[bytes_sent, chunk_size]
      puts "Sent audio chunk"
      bytes_sent += chunk_size
      sleep 1
    end
    puts "Stopped passing"
    input_stream.close
  end

  # Poll the output stream for responses.
  task.async do
    loop do
      puts "Checking output stream if any to read?"
      puts output_stream.any? # If you comment out this line everything runs smoothly again
      puts "Output stream checked"
      sleep 1
    end
  end
end
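My expectation was that the two tasks would interleave, with audio chunks still being sent while responses are read. Instead, as soon as output_stream.any? is called, the sending task stops running as well, presumably because the underlying gRPC read blocks the whole thread rather than yielding to the reactor. How can I read responses from output_stream while continuing to push audio chunks to input_stream?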