I'm plotting the spectrogram of an audio signal. The plot shows the frequency content for different time intervals. The regions of interest are marked with red circles, because at these times the microphone recorded a sound with an unknown frequency. The frequency is not important (at the moment). I'm trying to find a way to detect the start and end point, in seconds, of every recorded sound.

Example of the first marked sound in the spectrogram:

Think of it as a normal line plot: at the beginning the values are zero, and after ~4 seconds the values increase. The first value > 0 could be the start point, and the last value > 0 (the one whose following value is 0 again) could be the end point. That would give the first sound and its duration.
This is my code and what I've done so far:
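import matplotlib.pyplot as plt
from scipy import signal

sampleRate = 44100  # sampling rate of the recording in Hz
# audio_original: array of audio samples, loaded elsewhere (not shown)
f, t, Sxx = signal.spectrogram(audio_original, sampleRate)
plt.figure(1)
plt.pcolormesh(t, f, Sxx, shading='gouraud')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()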
The lengths of f, t, and Sxx are [129, 17072, 129] (len(Sxx) counts the first axis, the frequencies):
f # Array of sample frequencies
t # Array of segment times
Sxx # Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times
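
The "first value > 0 / last value before 0" idea described above could be sketched roughly like this. It is only a minimal sketch under some assumptions: the power is summed over all frequencies for each time segment (since the frequency is not important here), and instead of comparing against exactly 0 it uses a threshold relative to the loudest segment, because a real recording is rarely perfectly silent. The function name `detect_sound_intervals` and the parameter `rel_threshold` (with its default of 0.1) are made up for illustration and would need tuning.

```python
import numpy as np
from scipy import signal


def detect_sound_intervals(audio, sample_rate, rel_threshold=0.1):
    """Return a list of (start, end) times in seconds for each detected sound.

    rel_threshold is a hypothetical tuning parameter: a segment counts as
    "sound" if its total power exceeds rel_threshold * (maximum total power).
    """
    f, t, Sxx = signal.spectrogram(audio, sample_rate)

    # Sum over the frequency axis: one power value per time segment.
    energy = Sxx.sum(axis=0)

    # Boolean mask of "active" segments (assumed threshold rule).
    active = energy > rel_threshold * energy.max()

    # Pad with False so sounds touching either end still produce two edges.
    padded = np.concatenate(([False], active, [False]))
    edges = np.diff(padded.astype(int))

    starts = np.flatnonzero(edges == 1)      # index of first active segment
    ends = np.flatnonzero(edges == -1) - 1   # index of last active segment

    return [(float(t[s]), float(t[e])) for s, e in zip(starts, ends)]


# Synthetic check: 3 s of silence with a 440 Hz tone between 1 s and 2 s.
fs = 8000
n = np.arange(3 * fs)
tone = np.zeros(n.size)
mask = (n >= fs) & (n < 2 * fs)
tone[mask] = np.sin(2 * np.pi * 440 * n[mask] / fs)

for start, end in detect_sound_intervals(tone, fs):
    print(f"sound from {start:.2f} s to {end:.2f} s")
```

The returned times are the segment times from `t`, so the resolution is limited by the spectrogram's segment length and overlap. On the real recording, the call would presumably be `detect_sound_intervals(audio_original, sampleRate)`.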