Piping TTS to WHIP


I'm building a real-time audio app using WebRTC and AWS, and on the server side I want to pipe an OpenAI text-to-speech (TTS) response to a WebRTC-HTTP ingestion protocol (WHIP) endpoint.

Is there a simple way to do this?

In the browser, sending user media (mic and camera) to a WHIP endpoint is fairly easy using the Media Capture and Streams API. But because the browser's WebRTC and Audio APIs are unavailable in Node.js, I'm struggling to take an HTTP response, or an audio file, and construct a MediaStreamTrack that my server code can send to the WebRTC server.
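For context, the missing piece on the server is often just byte arithmetic: OpenAI's TTS endpoint can return raw PCM (with `response_format="pcm"` it streams 24 kHz, 16-bit signed little-endian, mono samples), and a WebRTC audio track wants fixed-duration frames, typically 20 ms. A minimal sketch of that framing step, assuming those defaults (the function name is illustrative, not from any library):

```python
def chunk_pcm_frames(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2,
                     frame_ms: int = 20):
    """Split a raw PCM byte stream into fixed-duration frames.

    Defaults assume OpenAI TTS with response_format="pcm":
    24 kHz, 16-bit signed little-endian, mono. Each yielded
    frame is frame_ms of audio, ready to hand to a WebRTC
    library as one audio frame.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * channels * sample_width
    # Yield only complete frames; a trailing partial frame is dropped
    # (in a streaming setup you would buffer it until more bytes arrive).
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Each frame is then timestamped and resampled/encoded by whatever WebRTC library you choose; this is only the slicing step every such solution needs.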

I've asked ChatGPT, tried GStreamer, and experimented with the werift and node-webrtc Node.js packages and the aiortc and aws-streamer Python packages, but I keep running into roadblocks.

Any suggestions regarding a simple way to do this or an alternative approach would be much appreciated.

I'm more familiar with JavaScript and TypeScript than Python, but I'm happy to use whatever will be the easiest solution.

Keeping latency low would be ideal, but I'd also be happy to get something working now and optimise later.

1 Answer
Answer by DMakeev:
  • You can use the Janus media server (its Streaming plugin supports file/RTP/RTSP inputs and WHIP), and there is a JS/TS client-side library; this is probably the easiest route for a JS/TS developer.
  • GStreamer is a good way to do everything properly. But, as far as I know, there is no good way to use webrtcbin from TS, so you'll need to work in Python.