How to record audio from microphone and speaker in ReactJS with TypeScript and send it to the server for transcription?


I'm working on a ReactJS project with TypeScript where I need to implement an audio recording feature that captures audio from both the microphone and speaker simultaneously. The recorded audio should then be sent to the server for transcription.

I have already set up the basic audio recording using the MediaRecorder API to capture audio from the microphone. However, I'm unsure about how to capture audio from the speaker simultaneously. I also need guidance on how to send the recorded audio to the server for transcription.

I'm using Socket.io to communicate with the server, and the server is set up to handle audio transcription.

My questions are:

1. How can I modify the TranscriptComponent to record audio from both the microphone and speaker simultaneously?
2. How can I send the recorded audio to the server for transcription using Socket.io?

Any guidance, code examples, or resources would be greatly appreciated. Thank you!

Here's what I have so far in my TranscriptComponent:


// TranscriptComponent.tsx

import React, { useState } from 'react';

// ... (other imports and interfaces)

export const TranscriptComponent = (props: TranscriptComponentProps) => {
  // ... (other state variables and logic)

  const startRecording = (id: string) => {
    socket.connect();
    console.log("recording started");
    navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
      const mimeTypes = ["audio/mp4", "audio/webm"].filter((type) =>
        MediaRecorder.isTypeSupported(type)
      );

      if (mimeTypes.length === 0) return alert("Browser not supported");
      setIsRecording(true);
      setStream(stream);
      setTimerInterval(
        setInterval(() => {
          setTranscriptLength((t) => t + 1);
        }, 1000)
      );
      const recorder = new MediaRecorder(stream, { mimeType: mimeTypes[0] });
      recorder.addEventListener("dataavailable", (event) => {
        console.log("checking if data is available to send");
        if (event.data.size > 0 && socket.connected) {
          console.log("sending audio");
          socket.emit("audio", { roomId: props.roomId, data: event.data });
        } else {
          console.log("no data available");
        }
      });
      recorder.start(1000);
    });
  };

  const stopRecording = () => {
    stream!.getTracks().forEach((track) => track.stop());
    setIsRecording(false);
    clearInterval(timerInterval);
    socket.emit("stop-transcript", { roomId: props.roomId });
    console.log("recording stopped");
    // socket.close();
  };


  // ... (return and rendering logic)
};


There are 2 solutions below

Saad Ali:

To record audio from both the microphone and the speaker simultaneously, you'll need the Web Audio API. A MediaRecorder instance records a single MediaStream, so it can't capture the microphone and speaker as separate inputs at the same time. The Web Audio API, on the other hand, lets you create audio sources from both streams and mix them into a single stream for recording. (Note that browsers don't expose the speaker output directly; the closest substitute is tab or system audio captured with getDisplayMedia.)
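The mixing step can be sketched in isolation. This is a browser-only sketch, separate from the component below; the caller is responsible for obtaining the input streams first:

```typescript
// Sketch: mix several MediaStreams (e.g. mic + display audio) into one
// recordable stream using the Web Audio API. Browser-only code; the input
// streams come from getUserMedia/getDisplayMedia.
function mixStreams(streams: MediaStream[]): MediaStream {
  const audioContext = new AudioContext();
  // One destination node collects the mixed output as a MediaStream.
  const destination = audioContext.createMediaStreamDestination();
  for (const stream of streams) {
    // Each input stream becomes a source node feeding the shared destination.
    audioContext.createMediaStreamSource(stream).connect(destination);
  }
  return destination.stream;
}
```

The returned stream can then be handed to a MediaRecorder just like a plain microphone stream.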

Here's how you can modify the TranscriptComponent to achieve simultaneous audio recording:

import React, { useState, useEffect, useRef } from 'react';
import { io, Socket } from 'socket.io-client';
// ... (other imports and interfaces)

export const TranscriptComponent = (props: TranscriptComponentProps) => {
  const [isRecording, setIsRecording] = useState(false);
  const [stream, setStream] = useState<MediaStream | null>(null);
  const [timerInterval, setTimerInterval] = useState<NodeJS.Timeout | null>(null);
  const [transcriptLength, setTranscriptLength] = useState(0);

  const socket = useRef<Socket | null>(null);

  useEffect(() => {
    socket.current = io(); // pass the server URL if it differs from the page origin
    return () => {
      socket.current?.disconnect();
    };
  }, []);

  const startRecording = async (id: string) => {
    console.log("Recording started");
    try {
      const audioStream = await navigator.mediaDevices.getUserMedia({ audio: true });
      // Captures tab/system audio; some browsers reject audio-only requests
      // and require video to be requested as well.
      const speakerStream = await navigator.mediaDevices.getDisplayMedia({
        audio: true,
        video: false,
      });

      const audioContext = new AudioContext();
      const micSource = audioContext.createMediaStreamSource(audioStream);
      const speakerSource = audioContext.createMediaStreamSource(speakerStream);

      const destination = audioContext.createMediaStreamDestination();
      micSource.connect(destination);
      speakerSource.connect(destination);

      setIsRecording(true);
      setStream(destination.stream);

      const mimeTypes = ["audio/mp4", "audio/webm"].filter((type) =>
        MediaRecorder.isTypeSupported(type)
      );

      if (mimeTypes.length === 0) {
        return alert("Browser not supported");
      }

      setTimerInterval(
        setInterval(() => {
          setTranscriptLength((t) => t + 1);
        }, 1000)
      );

      const recorder = new MediaRecorder(destination.stream, { mimeType: mimeTypes[0] });

      recorder.addEventListener("dataavailable", async (event) => {
        if (event.data.size > 0 && socket.current?.connected) {
          socket.current?.emit("audio", { roomId: props.roomId, data: event.data });
        }
      });

      recorder.start(1000);
    } catch (error) {
      console.error("Error accessing media devices:", error);
    }
  };

  const stopRecording = () => {
    if (stream) {
      stream.getTracks().forEach((track) => track.stop());
    }
    setIsRecording(false);
    clearInterval(timerInterval!);
    socket.current?.emit("stop-transcript", { roomId: props.roomId });
    console.log("Recording stopped");
  };

  // ... (return and rendering logic)
};

We use the navigator.mediaDevices.getDisplayMedia method to capture the playback ("speaker") audio, which in practice is the audio of a shared tab or screen, alongside getUserMedia to capture audio from the microphone.

We create an AudioContext and connect both the microphone and speaker sources to a single destination using createMediaStreamSource and createMediaStreamDestination.

We then create a MediaRecorder instance using the combined stream from the destination, which includes both microphone and speaker audio.

Now, to send the recorded audio to the server for transcription using Socket.io, you need a matching handler on the server side: create a Socket.io event that receives the audio data and performs the transcription using an appropriate speech-recognition library or API.
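The server side isn't shown in the question, so here is a minimal sketch of that handler. The RoomAudioBuffer class is plain Node code; the socket.io wiring in the trailing comment assumes an existing Server instance named `io` and a hypothetical `transcribe` helper, with event names matching the client above:

```typescript
// Sketch of server-side handling for the "audio" / "stop-transcript" events.
// The buffering logic is kept separate from socket.io so it is easy to test.
class RoomAudioBuffer {
  private chunks = new Map<string, Buffer[]>();

  // Append one audio chunk (one MediaRecorder blob) for a room.
  push(roomId: string, chunk: Buffer): void {
    const list = this.chunks.get(roomId) ?? [];
    list.push(chunk);
    this.chunks.set(roomId, list);
  }

  // Return the room's accumulated audio as one Buffer and clear its state.
  flush(roomId: string): Buffer {
    const list = this.chunks.get(roomId) ?? [];
    this.chunks.delete(roomId);
    return Buffer.concat(list);
  }
}

// Socket.io wiring (sketch; assumes an existing socket.io Server named `io`
// and a hypothetical `transcribe(audio: Buffer): Promise<string>` helper):
//
// const buffers = new RoomAudioBuffer();
// io.on("connection", (socket) => {
//   socket.on("audio", ({ roomId, data }) => {
//     buffers.push(roomId, Buffer.from(data)); // blobs arrive as binary data
//   });
//   socket.on("stop-transcript", async ({ roomId }) => {
//     const text = await transcribe(buffers.flush(roomId));
//     io.to(roomId).emit("transcript", { roomId, text });
//   });
// });
```

Buffering per room keeps concurrent sessions separate; a streaming transcription API could instead consume each chunk as it arrives rather than waiting for stop-transcript.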

Raoof Naushad:

I have a solution based on @Saad's answer. It uses the same approach, but in JSX. One issue I ran into was that I had to click a button and share the audio from another tab, so I've added a Start button here.

import React, { useEffect, useRef } from "react";

const TransComp = () => {
  const statusRef = useRef(null);
  const transcriptRef = useRef(null);
  const socketRef = useRef(null);
  const audioCtx = useRef(null);

  const startRecording = () => {
    if (audioCtx.current.state === "suspended") {
      audioCtx.current.resume();
    }

    let mediaRecorder;

    const dest = audioCtx.current.createMediaStreamDestination();

    Promise.all([
      navigator.mediaDevices.getUserMedia({ audio: true }),
      navigator.mediaDevices.getDisplayMedia({
        video: { cursor: "always" },
        audio: true,
      }),
    ])
      .then(([micStream, displayStream]) => {
        if (!MediaRecorder.isTypeSupported("audio/webm")) {
          alert("Browser not supported");
          return;
        }

        [micStream, displayStream].forEach((str) => {
          const src = audioCtx.current.createMediaStreamSource(str);
          src.connect(dest);
        });

        mediaRecorder = new MediaRecorder(dest.stream, {
          mimeType: "audio/webm",
        });

        if (!socketRef.current) {
          socketRef.current = new WebSocket("ws://localhost:5555/listen");
        }

        socketRef.current.onopen = () => {
          if (statusRef.current) statusRef.current.textContent = "Connected";
          mediaRecorder.addEventListener("dataavailable", (event) => {
            if (event.data.size > 0 && socketRef.current.readyState === WebSocket.OPEN) {
              socketRef.current.send(event.data);
            }
          });
          mediaRecorder.start(250); // send a blob of audio every 250 ms
        };

        socketRef.current.onmessage = (message) => {
          const received = message.data;
          console.log(received);
          if (received && transcriptRef.current) {
            transcriptRef.current.textContent += " " + received;
          }
        };
      })
      .catch((error) => {
        console.error("Error:", error);
      });
  };

  useEffect(() => {
    audioCtx.current = new AudioContext();
  }, []);

  useEffect(() => {
    return () => {
      if (socketRef.current) {
        socketRef.current.close();
      }
    };
  }, []);

  return (
    <div>
      <button onClick={startRecording}>Start</button>
      <div id="status" ref={statusRef} />
      <div id="transcript" ref={transcriptRef} />
    </div>
  );
};

export default TransComp;