Is there any way to merge overlapping audio segments into one with very low latency?


I have a problem with overlapping audio segments: audio segment t overlaps with audio segment t+1 in the last couple of words (e.g. t = "I like to move" and t+1 = "to move in circles all day"). Is there any way to merge the two audio segments so that they sound smooth? I've tried something like the following:

curr_audio = curr_audio[len(prev_audio):] #Where both items are PyDub AudioSegments

but that does not solve it: it only gives me the final part of the audio segment, with non-overlapping parts missing. Am I doing something wrong here? I've also tried working with the raw bytes using a similar approach, but the byte streams of the generated audio clips seem to be completely different, which makes this problem very hard.

Is there a nice way to do this in near real-time?

Edit: here is a working example:

Here's sample 1: https://vocaroo.com/14ygGStF6788
Here's sample 2: https://voca.ro/18S4ZjSwgjey


There is 1 solution below

Be dogmatic forever

Merging two audio clips so that they sound smooth takes some audio editing. You can do it by hand in professional audio editing software such as Audacity or Adobe Audition, which have features for handling overlaps and transitions between clips.

However, if you want to solve this problem programmatically, you can use a Python library to process the audio. For example, the pydub library can load, process, and save audio files.

Here are the basic steps you can follow:

1. Determine the overlapping part of the two audio clips. You can do this by listening to them, or, if you have more precise information such as the waveform or metadata of the audio, you can use that data to locate the overlap.

2. Load your audio with pydub's AudioSegment class. The from_file method creates an AudioSegment instance from an audio file.

3. Remove the overlapping part from the beginning of the second clip. AudioSegment slices are indexed in milliseconds, so slicing the second clip from the overlap length onward keeps only the part that is not already in the first clip. Note that slicing by len(prev_audio), as in the question, drops the full duration of the first clip from the second one instead of just the overlap, which is why non-overlapping audio goes missing.

4. Join the two clips. pydub concatenates AudioSegment instances with the + operator, or with the append method, whose crossfade argument (in milliseconds) blends the two clips at the join so the transition sounds smoother. Either way you get a new AudioSegment instance.

5. Save the result with the export method, which writes the AudioSegment instance to an audio file.

Please note that audio processing is fiddly, so these steps will probably need some adjustment to suit your specific situation. The pydub documentation is worth consulting to better understand how to use the library.
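The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function names, the file names, and the 30 ms default crossfade are all made up for illustration. It assumes you know the text of each segment, so the repeated words can be found by comparing the transcripts, and that you have some per-word timing for the second segment (the `curr_word_ends_ms` list is hypothetical metadata; pydub does not provide it). Only `AudioSegment.from_file`, slicing in milliseconds, `append(..., crossfade=...)` and `export` are actual pydub calls.

```python
def shared_word_count(prev_text, curr_text):
    """Count the words at the end of prev_text that are repeated at the start of curr_text."""
    prev_words = prev_text.lower().split()
    curr_words = curr_text.lower().split()
    # Try the longest possible overlap first, then shrink.
    for n in range(min(len(prev_words), len(curr_words)), 0, -1):
        if prev_words[-n:] == curr_words[:n]:
            return n
    return 0


def merge_known_overlap(prev_audio, curr_audio, overlap_ms, crossfade_ms=30):
    """Drop the repeated start of curr_audio and crossfade the rest onto prev_audio.

    Both clips are expected to be pydub AudioSegments, whose len() and
    slice indices are in milliseconds.
    """
    overlap_ms = max(0, min(int(overlap_ms), len(curr_audio)))
    # Keep the fade inside the repeated region so no new speech is blended away.
    fade = min(crossfade_ms, overlap_ms, len(prev_audio))
    tail = curr_audio[overlap_ms - fade:]
    return prev_audio.append(tail, crossfade=fade)


def demo():
    """Example usage; file names and word timings are placeholders."""
    from pydub import AudioSegment  # third-party: pip install pydub

    prev_audio = AudioSegment.from_file("segment_t.wav")
    curr_audio = AudioSegment.from_file("segment_t_plus_1.wav")

    n = shared_word_count("I like to move", "to move in circles all day")

    # Hypothetical metadata: end time in ms of each word in curr_audio.
    curr_word_ends_ms = [180, 520, 700, 1250, 1500, 1900]
    overlap_ms = curr_word_ends_ms[n - 1] if n else 0

    merged = merge_known_overlap(prev_audio, curr_audio, overlap_ms)
    merged.export("merged.wav", format="wav")
```

Slicing and appending work on audio that is already in memory, so the merge itself should be cheap; finding the overlap is likely to be the harder and slower part. If the transcripts or word timings are not available, the overlap length has to come from somewhere else, for example from listening or from analysing the waveform.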