I am trying to implement a smooth, zoomable audio waveform but am puzzled about the correct approach to implementing zoom. I searched the internet but there is very little information available.
So here is what I have done:
Read audio samples from the file and compute waveform points with samplesPerPixel = 10, 20, 40, 80, ..., 10240. Store the data points for each scale (11 in total here). The max and min are also stored along with the points for each samplesPerPixel.
When zooming, switch to the closest dataset. So if samplesPerPixel at the current width is 70, use the dataset corresponding to samplesPerPixel = 80. The correct dataset index is easily found from log2 of samplesPerPixel relative to the finest scale, rounded up.
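With the scales above (10 · 2^k, k = 0...10) the index works out like this; just a sketch, and the clamping is only there to stay inside the 11 precomputed datasets:

import Foundation

// the scales are 10 * 2^k, so the dataset index is the (rounded-up)
// log2 of samplesPerPixel relative to the finest scale (10)
func datasetIndex(forSamplesPerPixel spp: Double) -> Int {
    let k = Int(ceil(log2(spp / 10.0)))
    return min(10, max(0, k))                 // clamp to the 11 datasets
}
// datasetIndex(forSamplesPerPixel: 70) == 3, i.e. the samplesPerPixel = 80 dataset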
Use subsampling of the dataset to draw the waveform points. So if samplesPerPixel = 41 and we are using the dataset for zoom 80, then we use the scaling factor 80/41 to subsample:
let scaleFactor = 80.0 / 41.0
let x = waveformPointX[Int(Double(i) * scaleFactor)]
I have yet to find a better approach and am not sure whether the above subsampling approach is correct, but it certainly consumes a lot of memory and is also slow to load the data at the start. How do audio editors implement zooming into a waveform? Is there an efficient approach?
EDIT: Here is the code for computing the mipmaps.
import Foundation
import CoreGraphics

public class WaveformAudioSample {
    var samplesPerPixel: Int = 0
    var totalSamples: Int = 0
    var samples: [CGFloat] = []
    var sampleMax: CGFloat = 0
}
private func downSample(_ waveformSample: WaveformAudioSample, factor: Int) -> WaveformAudioSample {
    NSLog("Averaging samples")
    let downSampledAudioSamples = WaveformAudioSample()
    downSampledAudioSamples.samples = [CGFloat](repeating: 0, count: waveformSample.samples.count / factor)
    downSampledAudioSamples.samplesPerPixel = waveformSample.samplesPerPixel * factor
    downSampledAudioSamples.totalSamples = waveformSample.totalSamples
    for i in 0..<waveformSample.samples.count / factor {
        // average `factor` consecutive samples into one output sample
        var total: CGFloat = 0
        for j in 0..<factor {
            total = total + waveformSample.samples[i * factor + j]
        }
        let averagedSample = total / CGFloat(factor)
        downSampledAudioSamples.samples[i] = averagedSample
        // track the peak of the averaged samples for later normalisation
        downSampledAudioSamples.sampleMax = max(downSampledAudioSamples.sampleMax, averagedSample)
    }
    NSLog("Averaged samples")
    return downSampledAudioSamples
}
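I then chain this to get the 11 scales, roughly like this (baseSample here stands for the samplesPerPixel = 10 dataset computed from the raw audio; it is not shown above):

// each scale is the previous one downsampled by a factor of 2:
// 10, 20, 40, ..., 10240 samples per pixel
var mipmaps: [WaveformAudioSample] = [baseSample]
for _ in 1...10 {
    mipmaps.append(downSample(mipmaps.last!, factor: 2))
}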


You should use power-of-2 sizes for your data
This allows you to use just cheap bit shifts and simple resizing, without any costly floating point operations or integer multiplication and division.
You should build half-resolution mipmaps from the previous mipmap
This always creates one sample from 2 samples of the previous mipmap, so there are no nested for loops or costly index computations.
Do not mix floating-point and integer computations if you can avoid it
Even if you have an FPU, the conversion between int and float is usually very slow. Ideally, keep your audio data in integer format...
In my small C++/VCL implementation of these ideas, the important part is a single function mipmap_compute, which converts your input data into 2 mipmaps: one holding the min values and the other the max values. Ignore the window, VCL and rendering related stuff, and the dynamic allocations are not important either. The only important code chunk is the mipmap computation itself: for each mipmap there is only a single for loop without any expensive operations. If your platform is better with branchless code, you can compute the min and max using the built-in branchless min/max functions.
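In the Swift types from your question the pass would look something like this (just a sketch of the idea, not my VCL code; it assumes Int16 samples, a power-of-2 length and at least one level):

// build min/max mipmaps: level k has half the resolution of level k-1,
// level 0 being the raw Int16 samples (power-of-2 length assumed)
func computeMipmaps(_ samples: [Int16], levels: Int) -> (mins: [[Int16]], maxs: [[Int16]]) {
    var mins: [[Int16]] = [samples]
    var maxs: [[Int16]] = [samples]
    for k in 1...levels {                               // levels >= 1
        let count = mins[k - 1].count >> 1              // halve with a bit shift
        var mn = [Int16](repeating: 0, count: count)
        var mx = [Int16](repeating: 0, count: count)
        for i in 0..<count {                            // single loop per level
            mn[i] = min(mins[k - 1][i << 1], mins[k - 1][(i << 1) + 1])
            mx[i] = max(maxs[k - 1][i << 1], maxs[k - 1][(i << 1) + 1])
        }
        mins.append(mn)
        maxs.append(mx)
    }
    return (mins, maxs)
}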
This can be further optimized simply by using a pointer (or a local binding) to the currently selected mipmap levels, which gets rid of the [k] and [k-1] indexes, allowing one less memory access per element. Now all you need is to bilinearly interpolate between 2 mipmaps to achieve your target resolution.
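Again only a sketch with the Swift arrays from above; the parameter names and the exact weighting are my own choice (it assumes n >= 2 and that both levels have at least 2 samples):

// resample to n output points by interpolating inside each of the 2
// bracketing mipmap levels and then blending between them with t (0...1)
func resample(fine lo: [Int16], coarse hi: [Int16], blend t: Double, count n: Int) -> [Double] {
    // linear interpolation inside one level at fractional position p
    func lerp(_ level: [Int16], _ p: Double) -> Double {
        let i = min(level.count - 2, Int(p))
        let f = p - Double(i)
        return (1.0 - f) * Double(level[i]) + f * Double(level[i + 1])
    }
    var out = [Double](repeating: 0, count: n)
    for i in 0..<n {
        let u = Double(i) / Double(n - 1)              // 0...1 along the time axis
        let a = lerp(lo, u * Double(lo.count - 1))     // value in the finer level
        let b = lerp(hi, u * Double(hi.count - 1))     // value in the coarser level
        out[i] = (1.0 - t) * a + t * b                 // blend between the 2 mipmaps
    }
    return out
}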
Beware: the target size n must be less than or equal to the highest mipmap resolution... When I change the resolution manually with the mouse wheel, the scaling is fast and seamless; the choppiness in my animated GIF capture of it is only caused by the GIF grabber.