aiortc: Combining multiple mp3 files to be returned as a single MediaStreamTrack

I will be using an LLM (such as GPT) to generate an answer, which is then converted to speech and sent to the browser using aiortc. Because an LLM takes time to produce its complete output, I don't want to wait for it to finish. Instead, I read the partial answer as soon as it appears, generate an mp3 file every few words (say every 4-5 words), and stream those files. So not all of the mp3 files are available up front; I need to keep adding them to the stream as they appear.
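The chunking step described above can be sketched as follows. This is a minimal illustration, not code from the original setup: `chunk_words` and `words_per_chunk` are hypothetical names, and the input is assumed to be an iterable of text fragments as a streaming LLM client might yield them. Fragments can split a word in the middle, so the last unfinished word is held back until whitespace or the end of the stream confirms it is complete.

```python
from typing import Iterable, Iterator, List


def chunk_words(fragments: Iterable[str], words_per_chunk: int = 5) -> Iterator[str]:
    """Group streamed text fragments into chunks of `words_per_chunk` whole words."""
    words: List[str] = []
    partial = ""  # text whose last word may still be growing
    for fragment in fragments:
        partial += fragment
        pieces = partial.split()
        if partial and not partial[-1].isspace():
            # The last piece may continue in the next fragment; hold it back.
            partial = pieces.pop() if pieces else ""
        else:
            partial = ""
        words.extend(pieces)
        while len(words) >= words_per_chunk:
            yield " ".join(words[:words_per_chunk])
            del words[:words_per_chunk]
    # Flush whatever is left once the stream ends.
    words.extend(partial.split())
    if words:
        yield " ".join(words)


stream = ["The qu", "ick brown ", "fox jumps over", " the lazy dog"]
chunks = list(chunk_words(stream, words_per_chunk=4))
print(chunks)  # ['The quick brown fox', 'jumps over the lazy', 'dog']
```

Each yielded chunk would then go to the text-to-speech step, and the resulting mp3 file would be queued for streaming.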

I wrote a custom MediaStreamTrack to achieve this. So far I have tried it with 2 files, a.mp3 and b.mp3.

I ran across 2 issues:

1. The last few hundred milliseconds of a.mp3 sound stretched.
2. The first 2 seconds (or so) of b.mp3 are silent, and only after that does it play.

Clearly, the frames need to be stitched together better for this to work. I am definitely missing something here; it would be great if someone could point me in the right direction.

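import asyncio
import os
from typing import Optional

from aiortc import MediaStreamTrack
from aiortc.contrib.media import MediaPlayer
from aiortc.mediastreams import MediaStreamError
from av.frame import Frame

# ROOT (defined elsewhere) is the directory that contains the mp3 files.


class CombinedAudioTrack(MediaStreamTrack):
    """
    An audio track which reads from multiple mp3 files, one after another
    """
    kind = "audio"

    def __init__(self) -> None:
        super().__init__()
        # Per-instance state: a class-level Queue would be shared by every track.
        self.currentMediaPlayer: Optional[MediaStreamTrack] = None
        self.queue: asyncio.Queue = asyncio.Queue()
        self._stop: bool = False  # True once the last file has been queued

    def addNewMP3File(self, mp3File: str, last: bool = False) -> None:
        self.queue.put_nowait(mp3File)
        if last:
            self._stop = True

    async def getNextMediaStreamTrack(self) -> None:
        # Waits until the next mp3 file has been queued
        mp3File = await self.queue.get()
        self.currentMediaPlayer = MediaPlayer(os.path.join(ROOT, mp3File)).audio

    async def recv(self) -> Frame:
        try:
            # Should only happen the first time
            if not self.currentMediaPlayer:
                await self.getNextMediaStreamTrack()
            return await self.currentMediaPlayer.recv()
        except MediaStreamError:
            # The current file is exhausted. End the track only when the last
            # file has been queued and nothing is left to play.
            if self._stop and self.queue.empty():
                # self.stop()
                raise MediaStreamError()
            await self.getNextMediaStreamTrack()
            return await self.currentMediaPlayer.recv()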
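
Both symptoms would be consistent with a timestamp discontinuity at the file boundary, although that is an assumption and has not been verified here: each new MediaPlayer track appears to number and pace its frames from its own start, so the first frames of b.mp3 may carry a pts that restarts near zero. One way to test that hypothesis is to renumber the frames so that pts keeps counting across files. The sketch below shows only the arithmetic. `ContinuousClock` is a hypothetical helper, and the 48000 Hz rate and 960-sample (20 ms) frames are assumed values, not something confirmed by the code above.

```python
from fractions import Fraction
from typing import Tuple

SAMPLE_RATE = 48000  # assumed sample rate of the decoded audio frames


class ContinuousClock:
    """Hands out pts values that keep increasing across file boundaries."""

    def __init__(self, sample_rate: int = SAMPLE_RATE) -> None:
        self.time_base = Fraction(1, sample_rate)
        self._next_pts = 0

    def stamp(self, samples: int) -> Tuple[int, Fraction]:
        """Return (pts, time_base) for a frame holding `samples` samples."""
        pts = self._next_pts
        self._next_pts += samples
        return pts, self.time_base


clock = ContinuousClock()
# Two 20 ms frames from a.mp3 followed by two from b.mp3 (960 samples each).
stamps = [clock.stamp(960)[0] for _ in range(4)]
print(stamps)  # [0, 960, 1920, 2880]
```

Inside recv(), the idea would be to overwrite `frame.pts` and `frame.time_base` with the values returned by `clock.stamp(frame.samples)` before returning the frame. This only changes the numbering; it does not change how each player paces its own frames, so it may not be sufficient on its own.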