Why am I getting "index 0 is out of bounds for axis 0 with size 0" when using the pyAudioAnalysis library?


This question is about speaker diarization. I'm trying to write a script that splits an MP4 file into segments by speaker. (The input MP4 file contains dialogue between 4 different speakers.)

The code is as follows:

import os

from pyAudioAnalysis import audioSegmentation
import moviepy.editor as mp

def separate_speakers(input_file, num_speakers):
    # Extract the audio track to a temporary WAV file for diarization
    video = mp.VideoFileClip(input_file)
    audio_file = "temp_audio.wav"
    video.audio.write_audiofile(audio_file)

    segments = audioSegmentation.speaker_diarization(audio_file, num_speakers)

    if len(segments) == 0:
        raise Exception("Speaker diarization failed to detect any speaker segments.")

    # Write one video file per entry in `segments`
    output_files = []
    for i, segment in enumerate(segments):
        start_time = segment[0]
        end_time = segment[1]
        output_file = f"speaker_{i+1}.mp4"
        video.subclip(start_time, end_time).write_videofile(output_file)
        output_files.append(output_file)

    # Remove the temporary audio file
    os.remove(audio_file)

    return output_files

input_file = "ielts1.mp4"
num_speakers = 4

try:
    output_files = separate_speakers(input_file, num_speakers)
    print("Speaker separation completed. Output files:", output_files)
except Exception as e:
    print("Error:", str(e))

The program successfully generates the first segment in my output folder, but runs into an error right after: Error: index 0 is out of bounds for axis 0 with size 0

Any idea what went wrong?

Edit: I added raise to the except block, and this is the result:


MoviePy - Writing audio in temp_audio.wav
MoviePy - Done.
Moviepy - Building video speaker_1.mp4.
MoviePy - Writing audio in speaker_1TEMP_MPY_wvf_snd.mp3
chunk:   0%|          | 0/1 [00:00<?, ?it/s, now=None]Error: index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
  File "c:\Users\User\Documents\diarization\3.py", line 37, in <module>
    output_files = separate_speakers(input_file, num_speakers)
  File "c:\Users\User\Documents\diarization\3.py", line 22, in separate_speakers
    video.subclip(start_time, end_time).write_videofile(output_file)
  File "<decorator-gen-55>", line 2, in write_videofile
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 54, in requires_duration
    return f(clip, *a, **k)
  File "<decorator-gen-54>", line 2, in write_videofile
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 135, in use_clip_fps_by_default
    return f(clip, *new_a, **new_kw)
  File "<decorator-gen-53>", line 2, in write_videofile
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 22, in convert_masks_to_RGB
    return f(clip, *a, **k)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\video\VideoClip.py", line 293, in write_videofile
    self.audio.write_audiofile(audiofile, audio_fps,
  File "<decorator-gen-45>", line 2, in write_audiofile
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 54, in requires_duration
    return f(clip, *a, **k)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\audio\AudioClip.py", line 206, in write_audiofile
    return ffmpeg_audiowrite(self, filename, fps, nbytes, buffersize,
  File "<decorator-gen-9>", line 2, in ffmpeg_audiowrite
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 54, in requires_duration
    return f(clip, *a, **k)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\audio\io\ffmpeg_audiowriter.py", line 166, in ffmpeg_audiowrite
    for chunk in clip.iter_chunks(chunksize=buffersize,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\audio\AudioClip.py", line 85, in iter_chunks
    yield self.to_soundarray(tt, nbytes=nbytes, quantize=quantize,
  File "<decorator-gen-44>", line 2, in to_soundarray
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 54, in requires_duration
    return f(clip, *a, **k)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\audio\AudioClip.py", line 127, in to_soundarray
    snd_array = self.get_frame(tt)
  File "<decorator-gen-11>", line 2, in get_frame
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 89, in wrapper
    return f(*new_a, **new_kw)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\Clip.py", line 93, in get_frame
    return self.make_frame(t)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\Clip.py", line 136, in <lambda>
    newclip = self.set_make_frame(lambda t: fun(self.get_frame, t))
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\Clip.py", line 187, in <lambda>
    return self.fl(lambda gf, t: gf(t_func(t)), apply_to,
  File "<decorator-gen-11>", line 2, in get_frame
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\decorators.py", line 89, in wrapper
    return f(*new_a, **new_kw)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\Clip.py", line 93, in get_frame
    return self.make_frame(t)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\audio\io\AudioFileClip.py", line 77, in <lambda>
    self.make_frame = lambda t: self.reader.get_frame(t)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\Lib\site-packages\pyAudioAnalysis\..\moviepy\audio\io\readers.py", line 171, in get_frame
    "Accessing time t=%.02f-%.02f seconds, "%(tt[0], tt[-1])+
IndexError: index 0 is out of bounds for axis 0 with size 0
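The last frames of the traceback index tt[0], the first timestamp requested from the audio reader. The sketch below is a simplified, hypothetical illustration (not moviepy's actual code; chunk_times is an invented name) of why a zero-length subclip yields an empty timestamp array, and it shows that NumPy raises exactly this message when such an array is indexed.

```python
import numpy as np


def chunk_times(duration, fps=44100):
    """Hypothetical, simplified stand-in for how per-sample timestamps of an
    audio chunk might be built; this is not moviepy's actual code."""
    n_samples = int(fps * duration)
    # For duration == 0 this is np.arange(0, 0) / fps, an empty array.
    return np.arange(0, n_samples) / fps


tt = chunk_times(0.0)  # timestamps for a zero-length subclip such as (0, 0)
print(tt.shape)  # (0,)

try:
    tt[0]  # same access as in the readers.py frame of the traceback
except IndexError as err:
    print(err)  # index 0 is out of bounds for axis 0 with size 0
```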

Another edit: thanks to comments by @shaik moeed, here is some more information.

I checked by adding clip = video.subclip(start_time, end_time) and trying clip.ipython_display(width=480). It does not produce a valid video clip.
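One way to catch an invalid span before moviepy fails is to check it first. A minimal sketch; is_playable_span is a hypothetical helper, and clip_duration is assumed to be video.duration of the loaded VideoFileClip.

```python
def is_playable_span(start_time, end_time, clip_duration):
    """Hypothetical pre-check for video.subclip(start_time, end_time).

    Returns True only for a span of positive length that lies inside the
    source clip; the (0, 0) span produced by the original loop fails.
    """
    return 0 <= start_time < end_time <= clip_duration


print(is_playable_span(0, 0, 60.0))      # False
print(is_playable_span(1.5, 2.5, 60.0))  # True
```

In the loop, spans failing this check would be skipped or logged instead of being passed to write_videofile.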

So I used print(segments), and it gives me:
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int64), -1, -1)
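Judging by the printed value (and, as far as I can tell, the return statement of speaker_diarization in recent pyAudioAnalysis releases), the result is a 3-tuple: one speaker label per analysis window, followed by two purity scores that appear to be -1 when no ground-truth annotation is available. The sketch below uses a short stand-in list instead of the real array to show what the original loop actually reads as start and end times.

```python
# Stand-in for the printed value: (labels, purity_cluster, purity_speaker).
result = ([0, 0, 0, 1, 1, 0], -1, -1)

# The loop in the question iterates over the 3-tuple itself, so on its first
# pass `segment` is the whole label sequence and both "times" are labels.
segment = result[0]
start_time, end_time = segment[0], segment[1]
print(start_time, end_time)  # 0 0 -> video.subclip(0, 0) is zero-length

# On the second pass `segment` would be the int -1; indexing it raises TypeError.

# Unpacking the tuple instead yields one speaker label per analysis window.
labels, purity_cluster, purity_speaker = result
print(labels)  # [0, 0, 0, 1, 1, 0]
```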

The comments suggest I need to calculate the start and end times of each clip myself. I don't think the segments are supposed to be 0 values. Does anyone know how to fix this?
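Assuming each label covers one mid-term step of the diarization, consecutive equal labels can be merged into (start, end, speaker) time ranges that video.subclip can use. A minimal sketch: labels_to_time_segments is a hypothetical helper, and the step value is an assumption that should match the mid-term step used in the speaker_diarization call (mid_step in recent versions; its default may differ between releases, so passing it explicitly is safer).

```python
def labels_to_time_segments(labels, step):
    """Merge runs of identical per-window labels into (start, end, label) tuples.

    Assumes window i covers [i * step, (i + 1) * step) seconds, where `step`
    is the hop between analysis windows used by the diarization.
    """
    segments = []
    if len(labels) == 0:  # len() also works for NumPy arrays
        return segments
    run_start = 0  # index of the first window in the current run
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or on a label change.
        if i == len(labels) or labels[i] != labels[run_start]:
            segments.append((run_start * step, i * step, int(labels[run_start])))
            run_start = i
    return segments


labels = [0, 0, 0, 1, 1, 0]
print(labels_to_time_segments(labels, step=0.5))
# [(0.0, 1.5, 0), (1.5, 2.5, 1), (2.5, 3.0, 0)]
```

With real output, labels would be the first element of the returned tuple, and each segment's label could be used to name the output file per speaker rather than per loop index.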
