I am trying to implement a smooth, zoomable audio waveform but am puzzled about the correct approach to implementing zoom. I searched the internet but there is very little information available.
So here is what I have done:
Read audio samples from the file and compute waveform points with samplesPerPixel = 10, 20, 40, 80, ..., 10240. Store the data points for each scale (11 in total here). The max and min are also stored along with the points for each samplesPerPixel.
When zooming, switch to the closest dataset. So if samplesPerPixel at the current width is 70, use the dataset corresponding to samplesPerPixel = 80. The correct dataset index is easily found from log2 of samplesPerPixel relative to the finest scale, rounded up.
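With the scales above (10 · 2^k, k = 0...10) the index works out like this; just a sketch, and the clamping is only there to stay inside the 11 precomputed datasets:

import Foundation

// the scales are 10 * 2^k, so the dataset index is the (rounded-up)
// log2 of samplesPerPixel relative to the finest scale (10)
func datasetIndex(forSamplesPerPixel spp: Double) -> Int {
    let k = Int(ceil(log2(spp / 10.0)))
    return min(10, max(0, k))                 // clamp to the 11 datasets
}
// datasetIndex(forSamplesPerPixel: 70) == 3, i.e. the samplesPerPixel = 80 dataset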
Use subsampling of the dataset to draw the waveform points. So if samplesPerPixel = 41 and we are using the dataset for zoom 80, then we use the scaling factor 80/41 to subsample:
let scaleFactor = 80.0 / 41.0
let x = waveformPointX[Int(Double(i) * scaleFactor)]
I have yet to find a better approach and am not sure whether the above subsampling approach is correct, but it certainly consumes a lot of memory and is also slow to load the data at the start. How do audio editors implement zooming into a waveform? Is there an efficient approach?
EDIT: Here is the code for computing the mipmaps.
import Foundation
import CoreGraphics

public class WaveformAudioSample {
    var samplesPerPixel: Int = 0
    var totalSamples: Int = 0
    var samples: [CGFloat] = []
    var sampleMax: CGFloat = 0
}
private func downSample(_ waveformSample: WaveformAudioSample, factor: Int) -> WaveformAudioSample {
    NSLog("Averaging samples")
    let downSampledAudioSamples = WaveformAudioSample()
    downSampledAudioSamples.samples = [CGFloat](repeating: 0, count: waveformSample.samples.count / factor)
    downSampledAudioSamples.samplesPerPixel = waveformSample.samplesPerPixel * factor
    downSampledAudioSamples.totalSamples = waveformSample.totalSamples
    for i in 0..<waveformSample.samples.count / factor {
        // average `factor` consecutive samples into one output sample
        var total: CGFloat = 0
        for j in 0..<factor {
            total = total + waveformSample.samples[i * factor + j]
        }
        let averagedSample = total / CGFloat(factor)
        downSampledAudioSamples.samples[i] = averagedSample
        // track the peak of the averaged samples for later normalisation
        downSampledAudioSamples.sampleMax = max(downSampledAudioSamples.sampleMax, averagedSample)
    }
    NSLog("Averaged samples")
    return downSampledAudioSamples
}
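I then chain this to get the 11 scales, roughly like this (baseSample here stands for the samplesPerPixel = 10 dataset computed from the raw audio; it is not shown above):

// each scale is the previous one downsampled by a factor of 2:
// 10, 20, 40, ..., 10240 samples per pixel
var mipmaps: [WaveformAudioSample] = [baseSample]
for _ in 1...10 {
    mipmaps.append(downSample(mipmaps.last!, factor: 2))
}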


You should use power-of-2 sizes for your data
This allows you to use just cheap bit shifts and simple resizing, without any costly floating point operations or integer multiplication and division.
You should build half-resolution mipmaps from the previous mipmap
This always creates one sample from 2 samples of the previous mipmap, so there are no nested for loops or costly index computations.
Do not mix floating-point and integer computations if you can avoid it
Even if you have an FPU, the conversion between int and float is usually very slow. Ideally, keep your audio data in integer format...
In my small C++/VCL implementation of these ideas, the important part is a single function mipmap_compute, which converts your input data into 2 mipmaps: one holding the min values and the other the max values. Ignore the window, VCL and rendering related stuff, and the dynamic allocations are not important either. The only important code chunk is the mipmap computation itself: for each mipmap there is only a single for loop without any expensive operations. If your platform is better with branchless code, you can compute the min and max using the built-in branchless min/max functions.
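In the Swift types from your question the pass would look something like this (just a sketch of the idea, not my VCL code; it assumes Int16 samples, a power-of-2 length and at least one level):

// build min/max mipmaps: level k has half the resolution of level k-1,
// level 0 being the raw Int16 samples (power-of-2 length assumed)
func computeMipmaps(_ samples: [Int16], levels: Int) -> (mins: [[Int16]], maxs: [[Int16]]) {
    var mins: [[Int16]] = [samples]
    var maxs: [[Int16]] = [samples]
    for k in 1...levels {                               // levels >= 1
        let count = mins[k - 1].count >> 1              // halve with a bit shift
        var mn = [Int16](repeating: 0, count: count)
        var mx = [Int16](repeating: 0, count: count)
        for i in 0..<count {                            // single loop per level
            mn[i] = min(mins[k - 1][i << 1], mins[k - 1][(i << 1) + 1])
            mx[i] = max(maxs[k - 1][i << 1], maxs[k - 1][(i << 1) + 1])
        }
        mins.append(mn)
        maxs.append(mx)
    }
    return (mins, maxs)
}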
This can be further optimized simply by using a pointer (or a local binding) to the currently selected mipmap levels, which gets rid of the [k] and [k-1] indexes, allowing one less memory access per element. Now all you need is to bilinearly interpolate between 2 mipmaps to achieve your target resolution.
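Again only a sketch with the Swift arrays from above; the parameter names and the exact weighting are my own choice (it assumes n >= 2 and that both levels have at least 2 samples):

// resample to n output points by interpolating inside each of the 2
// bracketing mipmap levels and then blending between them with t (0...1)
func resample(fine lo: [Int16], coarse hi: [Int16], blend t: Double, count n: Int) -> [Double] {
    // linear interpolation inside one level at fractional position p
    func lerp(_ level: [Int16], _ p: Double) -> Double {
        let i = min(level.count - 2, Int(p))
        let f = p - Double(i)
        return (1.0 - f) * Double(level[i]) + f * Double(level[i + 1])
    }
    var out = [Double](repeating: 0, count: n)
    for i in 0..<n {
        let u = Double(i) / Double(n - 1)              // 0...1 along the time axis
        let a = lerp(lo, u * Double(lo.count - 1))     // value in the finer level
        let b = lerp(hi, u * Double(hi.count - 1))     // value in the coarser level
        out[i] = (1.0 - t) * a + t * b                 // blend between the 2 mipmaps
    }
    return out
}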
Beware: the target size n must be less than or equal to the highest mipmap resolution... When I change the resolution manually with the mouse wheel, the scaling is fast and seamless; the choppiness in my animated GIF capture of it is only caused by the GIF grabber.