As a beginner with these kinds of real-time streaming services, I've spent hours trying to work out how this is possible, but I can't work out precisely how I'd go about it.
I'm prototyping a personal basic web app that does the following:
In a web browser, the web application has a button that says 'Stream Microphone'. When pressed, it streams the audio from the user's microphone (the user obviously has to grant permission first) to the server, which I presume would be running node.js (no specific reason at this point, it's just how I thought I'd go about it).
The server receives the audio close enough to real-time somehow (not sure how I'd do this).
I can then run ffmpeg on the command line, take the incoming real-time audio, and add it as the sound to a video file that I want to play (let's just say testmovie.mp4).
I've looked at various solutions - such as WebRTC, RTP/RTSP, piping audio into ffmpeg, GStreamer, Kurento, Flashphoner and/or Wowza - but they all look overly complicated and usually seem to focus on video along with audio. I just need to work with audio.
As you've found, there are numerous options for receiving audio from a WebRTC-enabled browser. From easiest to most difficult, they are probably:
Use a WebRTC-enabled server such as Janus, Kurento or Jitsi (not sure about Wowza). These servers tend to have plugin systems, and one of them may already have the audio mixing capability you need.
If you're comfortable with Node, you could use the werift library to receive the WebRTC audio stream and then forward it to FFmpeg.
If you want to take full control over the WebRTC pipeline and potentially do the audio mixing as well, you could use GStreamer. From what you've described, it should be capable of doing the complete task without having to involve a separate FFmpeg process.
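To make the second option a bit more concrete, here is a rough sketch of the FFmpeg side only, assuming your Node process forwards the received RTP packets over UDP to a local port. Everything named here is an assumption for illustration: the helper names buildOpusSdp and buildFfmpegArgs are hypothetical, port 5004 and the file names are placeholders, and payload type 111 is only what browsers commonly negotiate for Opus, so check it against your actual session. The werift receiving code is deliberately left out.

```typescript
// Hypothetical helpers: they only build the text and arguments FFmpeg needs.
// Host, port, payload type and file names are assumptions, not fixed values.

interface RtpAudioTarget {
  host: string;        // address where FFmpeg listens for the forwarded RTP
  port: number;        // UDP port the Node process forwards packets to
  payloadType: number; // must match the payload type negotiated for Opus
}

// Describe the incoming RTP stream so FFmpeg knows it carries Opus audio.
function buildOpusSdp(target: RtpAudioTarget): string {
  return [
    "v=0",
    `o=- 0 0 IN IP4 ${target.host}`,
    "s=browser-microphone",
    `c=IN IP4 ${target.host}`,
    "t=0 0",
    `m=audio ${target.port} RTP/AVP ${target.payloadType}`,
    `a=rtpmap:${target.payloadType} opus/48000/2`,
    "",
  ].join("\n");
}

// Arguments that take video from the movie file and audio from the RTP input.
function buildFfmpegArgs(
  sdpPath: string,
  videoPath: string,
  outputPath: string,
): string[] {
  return [
    "-protocol_whitelist", "file,udp,rtp", // allow reading RTP via an SDP file
    "-i", sdpPath,                         // input 0: live microphone audio
    "-re", "-i", videoPath,                // input 1: movie, read at native rate
    "-map", "1:v:0",                       // keep the movie's video track
    "-map", "0:a:0",                       // use the live audio as the sound
    "-c:v", "copy",                        // do not re-encode the video
    "-c:a", "aac",                         // encode the Opus audio as AAC
    "-shortest",                           // stop when the shorter input ends
    outputPath,
  ];
}

const sdp = buildOpusSdp({ host: "127.0.0.1", port: 5004, payloadType: 111 });
const args = buildFfmpegArgs("audio.sdp", "testmovie.mp4", "output.mp4");
console.log(sdp);
console.log(["ffmpeg", ...args].join(" "));
```

You would write the SDP text to audio.sdp and pass the argument array to child_process.spawn("ffmpeg", args) in Node. Treat the output file as a starting point: if you want to watch the result live rather than save it, the output part of the command would need to change.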