I am able to convert a wav file to a spectrogram and then back again with an acceptable level of quality. I can plot and save that spectrogram as a jpg file, but I have not been able to import the jpg and convert it back to audio.
I can convert the audio to a db scaled spectrogram
import librosa
x, sr = librosa.load(librosa.ex('trumpet'))
X = librosa.stft(x)
Xdb = librosa.amplitude_to_db(abs(X))
And I am able to convert the db scaled spectrogram back to audio
X2 = librosa.db_to_amplitude(Xdb)
audio = librosa.griffinlim(X2)
import soundfile as sf
sf.write("test1.wav", audio, sr)
I can save the array as a 32bit Tiff, and recreate the audio from that tiff file.
from PIL import Image
import numpy as np
im = Image.fromarray(Xdb).convert('F')
im.save("test.tiff")
img = Image.open("test.tiff")
recspec = np.array(img)
X2 = librosa.db_to_amplitude(recspec)
audio = librosa.griffinlim(X2)
import soundfile as sf
sf.write("test1.wav", audio, sr)
I can plot the db scaled spectrogram and save it as a jpg
from matplotlib import pyplot as plt
import librosa.display
fig = plt.figure(figsize=(10, 10), dpi=1000, frameon=False)
ax = fig.add_axes([0, 0, 1, 1], frameon=False)
ax.axis('off')
librosa.display.specshow(Xdb, sr=sr, cmap='gray', x_axis='time', y_axis='hz')
plt.savefig("test.jpg", bbox_inches='tight', pad_inches=0)
But I have been completely unable to figure out how to reimport the jpg in such a way as to recreate the audio from it. I realise it is not as simple as just importing the jpg the same way as the tiff, and that saving in a lossy format like jpg will cause some significant loss of quality, but I would be OK with that if the resulting audio at least slightly resembled what went in. I have looked at code that does similar things, but those approaches have been much more complicated, such as using the colour channels to encode phase. I have been happy with the quality of the Griffin-Lim reconstruction, so I am happy to skip that. If someone could point me in the right direction that would be great.
As you have alluded to, recovering the waveform from a magnitude spectrogram with Griffin-Lim has some limits on fidelity. But if you are happy with the results in that case, then the issue is specific to the JPEG encoding (or decoding).
First, your way of saving the JPEG is wrong: you should not plot the values, but instead save the spectrogram array itself using PIL, the same way you do for the TIFF.
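A minimal sketch of saving the array directly with PIL (the `Xdb` here is a smooth stand-in array; in your code it would be the dB spectrogram from `librosa.amplitude_to_db(abs(X))`):

```python
import numpy as np
from PIL import Image

# Stand-in for the dB-scaled spectrogram (values roughly -80..0 dB).
Xdb = (-80.0 + 80.0 * np.outer(np.hanning(513), np.hanning(400))).astype(np.float32)

# Map the dB values linearly onto 0-255 and save as a single-channel JPEG.
lo, hi = Xdb.min(), Xdb.max()
scaled = ((Xdb - lo) / (hi - lo) * 255.0).astype(np.uint8)
Image.fromarray(scaled, mode='L').save("spec.jpg")
```

Note there is no matplotlib involved at all: the pixel grid is exactly the spectrogram array, one pixel per time/frequency bin.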
There are three key challenges when encoding a magnitude spectrogram into a JPEG:
1) mapping the floating-point spectrogram values into the 0-255 range
2) the lossy JPEG compression
3) the colour channels
Regarding 2): turn off all compression to start with. You can try to re-introduce it later, but get the simple case working first.
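With Pillow you cannot fully disable JPEG's lossy stage, but `quality=100` and `subsampling=0` get close to it. A sketch (the gradient array is a stand-in for the already-scaled spectrogram):

```python
import numpy as np
from PIL import Image

# Smooth stand-in for a spectrogram already scaled to 0-255.
arr = np.tile(np.arange(64, dtype=np.uint8) * 4, (64, 1))
im = Image.fromarray(arr, mode='L')

# quality=100 plus subsampling=0 minimise (but cannot fully eliminate)
# the loss introduced by JPEG compression.
im.save("nocomp.jpg", quality=100, subsampling=0)

# Reloading shows the values survive nearly unchanged for smooth data.
back = np.array(Image.open("nocomp.jpg"))
```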
Regarding 1): you must make sure your spectrogram values fit into the range 0-255. A good starting point is to decibel-scale the spectrogram (using for example librosa.power_to_db()), and then use a linear mapping between the resulting values and 0-255. The key to decoding the spectrogram later is knowing these scaling values, so you can reverse the process. This can be done with fixed/hard-coded scaling values, but it might be tricky to find values that work for all audio/spectrogram inputs. Alternatively, you can store the scaling factors as metadata in the JPEG, using a custom EXIF tag.
Regarding 3): make sure that you are not saving colour JPEGs. Instead, use a single greyscale/luminosity channel.
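The full round trip can be sketched with plain NumPy and Pillow. The smooth `Xdb` array stands in for the real dB spectrogram, and the inverse-dB formula stands in for `librosa.db_to_amplitude`; with librosa installed you would feed the result to `librosa.griffinlim` as in the question:

```python
import numpy as np
from PIL import Image

# Stand-in for Xdb; real code would use librosa.amplitude_to_db(abs(X)).
Xdb = (-80.0 + 80.0 * np.outer(np.hanning(257), np.hanning(200))).astype(np.float32)

# --- Encode: remember the scaling factors so decoding can reverse them ---
lo, hi = float(Xdb.min()), float(Xdb.max())
img = ((Xdb - lo) / (hi - lo) * 255.0).astype(np.uint8)
Image.fromarray(img, mode='L').save("roundtrip.jpg", quality=100, subsampling=0)

# --- Decode: load the JPEG and undo the 0-255 mapping ---
recovered = np.array(Image.open("roundtrip.jpg"), dtype=np.float32)
rec_db = recovered / 255.0 * (hi - lo) + lo

# db_to_amplitude is just the inverse dB formula; with librosa you would do
# X2 = librosa.db_to_amplitude(rec_db); audio = librosa.griffinlim(X2)
X2 = 10.0 ** (rec_db / 20.0)
```

Here `lo` and `hi` are kept in Python variables; in a real pipeline they would be hard-coded or stored as JPEG metadata as described above.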
PNG would be a bit less limiting if you do not get acceptable quality from JPEG, as it supports 16-bit values and uses lossless compression.
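A sketch of the 16-bit PNG variant, again with a stand-in array for the dB spectrogram; since PNG is lossless, the quantised values come back exactly:

```python
import numpy as np
from PIL import Image

# Smooth stand-in for the dB-scaled spectrogram.
Xdb = (-80.0 + 80.0 * np.outer(np.hanning(257), np.hanning(200))).astype(np.float32)

lo, hi = float(Xdb.min()), float(Xdb.max())
# 16 bits give 65536 quantisation levels instead of JPEG's 256.
img16 = ((Xdb - lo) / (hi - lo) * 65535.0).astype(np.uint16)
Image.fromarray(img16).save("spec.png")  # stored as 16-bit greyscale PNG

# Lossless: reloading returns the stored values unchanged.
back = np.asarray(Image.open("spec.png"))
```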
I have stored magnitude spectrograms successfully in JPEG files. However, we never converted them back to audio; instead, they were used as spectrogram inputs for machine learning and for computing acoustical parameters such as short-time sound levels.