Android's SpeechRecognizer with EXTRA_AUDIO_SOURCE still listens to the mic instead of reading from the file


I'm trying to make an app that lets the user record a sentence and convert it into text. Running MediaRecorder and SpeechRecognizer at the same time doesn't work, so I decided to record first and then pass the recorded file to SpeechRecognizer using the EXTRA_AUDIO_SOURCE extra of the RecognizerIntent.

However, SpeechRecognizer doesn't behave as intended: instead of reading from the file, it opens the mic and listens to it. I found this out by continuing to speak quickly right after clicking the stop-recording button; SpeechRecognizer caught those last few words and returned the correct text. Here's a snippet of my code, with the unimportant UI parts cut out.

private var recorder: MediaRecorder? = null
private var recognizer: SpeechRecognizer? = null

private val mediaFormat = MediaRecorder.OutputFormat.MPEG_4
private val audioEncoding = MediaRecorder.AudioEncoder.DEFAULT

private var currentRecordingFile: String = "recording_0.3gp"
private var recordingParcel: ParcelFileDescriptor? = null

// Each entry: {"text": "speech to text result", "file": "path to clip recording", "time": "datetime"}
private var translations = mutableStateListOf<Map<String, String>>()

private fun startTalking () {
    startRecording()
}
private fun stopTalking () {
    stopRecording()
    startRecognizing()
}

private fun startRecording () {
    val num = translations.count()
    currentRecordingFile = "$externalCacheDir/recording_$num.3gp"

    recorder = MediaRecorder(this).apply {
        setAudioSource(MediaRecorder.AudioSource.MIC)
        setOutputFormat(mediaFormat)
        setAudioEncoder(audioEncoding)
        setAudioChannels(1)
        setAudioSamplingRate(16000)
        setAudioEncodingBitRate(64000)
        setOutputFile(currentRecordingFile)

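        try {
            prepare()
            // Only start if prepare() succeeded; start() after a failed prepare() throws IllegalStateException
            start()
        } catch (e: IOException) {
            Log.e("startRecording", e.toString())
        }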
    }
}
private fun stopRecording () {
    recorder?.apply {
        stop()
        release()
    }
    recorder = null
}

private fun startRecognizing () {
    val file = File(currentRecordingFile)
    recordingParcel = ParcelFileDescriptor.open(file, ParcelFileDescriptor.MODE_READ_ONLY)

    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "in-ID")
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, "in-ID")
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE, recordingParcel)
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_ENCODING, audioEncoding)
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_CHANNEL_COUNT, 1)
    intent.putExtra(RecognizerIntent.EXTRA_AUDIO_SOURCE_SAMPLING_RATE, 16000)
    try {
        recognizer = SpeechRecognizer.createSpeechRecognizer(this)
        recognizer?.setRecognitionListener(this)
        recognizer?.startListening(intent)
    } catch (e: Exception) {
        Log.e("SpeechRecognizer", e.message.toString())
    }
}
private fun stopRecognizing () {
    recordingParcel?.close()

    recognizer?.stopListening()
    recognizer?.destroy()
    recognizer = null
}
override fun onError(error: Int) {
    Log.e("Speech onError", error.toString())
    stopRecognizing()
}

override fun onResults(results: Bundle){
    val words: ArrayList<String>? = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
    if (words != null) {
        val sentence = words.joinToString(separator = " ")
        val translation = mapOf("text" to sentence, "file" to currentRecordingFile)
        translations.add(translation)
        Log.e("CURR RESULT", sentence)
    }
    stopRecognizing()
}
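
For a sense of scale: startRecording() configures 16 kHz mono with setAudioEncodingBitRate(64000), i.e. compressed audio. As I read the RecognizerIntent docs, EXTRA_AUDIO_SOURCE_ENCODING takes an android.media.AudioFormat encoding (defaulting to ENCODING_PCM_16BIT) rather than a MediaRecorder.AudioEncoder constant, and raw 16-bit PCM at the same rate and channel count would be 256 kbit/s. The pure-Kotlin sketch below just does that arithmetic; pcmBitRate and rawPcmBytes are hypothetical helpers, not Android APIs.

```kotlin
// Hypothetical helper: bit rate of headerless PCM audio.
fun pcmBitRate(sampleRateHz: Int, channels: Int, bitsPerSample: Int): Int =
    sampleRateHz * channels * bitsPerSample

// Hypothetical helper: size in bytes a clip of the given length would have as raw PCM.
fun rawPcmBytes(durationMs: Long, sampleRateHz: Int, channels: Int, bitsPerSample: Int): Long =
    durationMs * sampleRateHz * channels * (bitsPerSample / 8) / 1000

fun main() {
    val pcm = pcmBitRate(16000, 1, 16) // what 16 kHz mono ENCODING_PCM_16BIT implies
    val encoded = 64000                // setAudioEncodingBitRate(64000) in startRecording()
    println("raw PCM: $pcm bit/s, encoder target: $encoded bit/s, ratio: ${pcm / encoded}")
    println("5 s of raw PCM: ${rawPcmBytes(5000, 16000, 1, 16)} bytes")
}
```

Comparing getStatSize() against rawPcmBytes(...) for the clip's length is a quick, rough way to tell a compressed file from raw samples.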

I suspect I'm using ParcelFileDescriptor incorrectly. Note that the descriptor's getStatSize() returns a non-zero file size, and I can play the recording with MediaPlayer, so the recording itself seems fine.
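
Being playable by MediaPlayer shows the file is a valid media file, but with OutputFormat.MPEG_4 it should be an ISO base media (MP4/3GP) container rather than bare samples. Such files normally begin with a 4-byte box size followed by the ASCII tag `ftyp`. The sketch below checks for that; looksLikeIsoBmff and readHeader are hypothetical helpers, and the "ftyp at offset 4" rule is a heuristic, not a full parser.

```kotlin
import java.io.File

// Heuristic: ISO BMFF files (MP4, 3GP) usually start with a box size followed by "ftyp".
fun looksLikeIsoBmff(header: ByteArray): Boolean =
    header.size >= 8 &&
        header.copyOfRange(4, 8).contentEquals("ftyp".toByteArray(Charsets.US_ASCII))

// Reads at most `count` bytes from the start of the file.
fun readHeader(file: File, count: Int = 8): ByteArray =
    file.inputStream().use { input ->
        val buf = ByteArray(count)
        var read = 0
        while (read < count) {
            val n = input.read(buf, read, count - read)
            if (n < 0) break // end of file reached early
            read += n
        }
        buf.copyOf(read)
    }

fun main() {
    val mp4Like = byteArrayOf(0, 0, 0, 0x18) + "ftypmp42".toByteArray(Charsets.US_ASCII)
    val pcmLike = ByteArray(8) // all zeros, like near-silent raw PCM
    println(looksLikeIsoBmff(mp4Like))
    println(looksLikeIsoBmff(pcmLike))
}
```

On a device, looksLikeIsoBmff(readHeader(File(currentRecordingFile))) would be expected to return true for a MediaRecorder MPEG_4 recording.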

Another possible cause is this line from the documentation: "If this extra is not set or the recognizer does not support this feature, the recognizer will open the mic for audio and close it when the recognition is finished." I don't know how to check whether a recognizer supports this feature, but I'm testing on Android 13.
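
Before concluding the extra is unsupported, it may be worth ruling out the input format by giving the recognizer a file that matches what the intent declares: headerless 16-bit PCM, mono, 16 kHz, passed together with AudioFormat.ENCODING_PCM_16BIT as EXTRA_AUDIO_SOURCE_ENCODING. On a device the samples would come from AudioRecord rather than MediaRecorder; this plain-Kotlin sketch only shows the file-writing side. writePcm16 is a hypothetical helper, and little-endian byte order is an assumption (it is the native order on Android devices in practice).

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder
import kotlin.math.PI
import kotlin.math.sin

// Writes samples as headerless 16-bit little-endian PCM (no WAV/MP4 header), 2 bytes per sample.
fun writePcm16(file: File, samples: ShortArray) {
    val buffer = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    buffer.asShortBuffer().put(samples) // view inherits LITTLE_ENDIAN order
    file.writeBytes(buffer.array())
}

fun main() {
    // 0.1 s of a 440 Hz tone at 16 kHz, standing in for what AudioRecord would capture.
    val sampleRate = 16000
    val samples = ShortArray(sampleRate / 10) { i ->
        (sin(2 * PI * 440 * i / sampleRate) * 8000).toInt().toShort()
    }
    val out = File.createTempFile("recording_", ".pcm")
    writePcm16(out, samples)
    println("${out.length()} bytes")
    out.delete()
}
```

The resulting file would then be opened with ParcelFileDescriptor.open(file, ParcelFileDescriptor.MODE_READ_ONLY) exactly as in startRecognizing().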

Thank you for reading.
