Implementing TFLite quantized inference in Python


I've been dealing with a challenge for a couple of days, and I'd appreciate any help with it.

I've trained a 1D-CNN with TensorFlow for time series classification. I then quantized the network with TFLite in INT8 mode, and it works fine. For research purposes, I need to reimplement the exact quantized inference flow in Python.
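For reference, the affine quantization scheme I'm following (from the TFLite paper) maps a real value x to an int8 value q via a scale S and zero point Z. A minimal sketch of what I mean (function names and the example scale are mine, not from TFLite):

```python
import numpy as np

def quantize(x, scale, zero_point):
    # affine quantization: q = round(x / S) + Z, clipped to the int8 range
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    # approximate reconstruction of the real value: x ~ S * (q - Z)
    return scale * (np.asarray(q, dtype=np.float64) - zero_point)

x = np.array([-1.0, 0.0, 0.5, 1.0])
q = quantize(x, scale=1 / 128, zero_point=0)  # example scale, not from my model
# 1.0 / (1/128) = 128 saturates to 127; the rest map exactly
print(q)
print(dequantize(q, 1 / 128, 0))
```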

So I started reading the TFLite source code and the main paper, and watching MIT HAN Lab's EfficientML course, which teaches the quantized inference procedure in detail.

I started by extracting the scale and zero point for each layer, and I successfully quantized the weights and biases (they exactly match TFLite's quantized weights and biases). However, when I run my quantized inference code on a time series, the activations of my convolution layer differ slightly from TFLite's Conv layer activations (MAE = 1.21, MSE = 4.91). This error propagates through the layers, and the final result is far off!

Here is the code I implemented:

import numpy as np

# output quantization parameters (tensor 10 is the Conv output)
Sy = all_tensors_details[10]["quantization_parameters"]["scales"][0]  # output scale S_y
Zy = np.int8(all_tensors_details[10]["quantization_parameters"]["zero_points"][0])  # output zero point Z_y

Qy = []
mmin, mmax = getQuantizedRange(QUANTIZATION_BITWIDTH)  # int8: [-128, 127]
for i in range(FILTER_SIZE):
    kernel = np.int32(np.array(list(quantizedCnnKernels[i].values())))
    # fold the input zero point into the bias: b' = b - Z_x * sum(W)
    Qbias = quantizedCnnBias[i] - np.int32(kernel.sum()) * inputZeroPoint
    # integer convolution in int32 accumulators
    conv = np.convolve(np.int32(inputSequence), kernel, mode='same')
    summ = conv + Qbias

    # per-channel requantization multiplier: M_i = (S_w[i] * S_x) / S_y
    multiplier = (np.float32(cnnWeightScale[i]) * np.float32(inputScale)) / Sy
    term = multiplier * np.float32(summ)

    termAdded = addZy(term, Zy, mmin, mmax)  # add the output zero point
    Qy.append(np.int8(np.clip(termAdded, mmin, mmax)))

Qy is the convolution result.
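One thing I've started to suspect is rounding: casting a float to int8 with `np.int8()` truncates toward zero, while TFLite's requantization path rounds to nearest (via its fixed-point multiplier). A quick check of the difference (the exact tie-breaking rule differs too: `np.round` rounds halves to even, while TFLite rounds halves away from zero, but truncation vs. rounding is the bigger gap):

```python
import numpy as np

term = np.float32([2.7, -2.7, 0.49, -0.49])

truncated = np.int8(term)          # numpy cast truncates toward zero
rounded = np.int8(np.round(term))  # round-to-nearest, closer to what
                                   # TFLite requantization does

print(truncated)
print(rounded)
```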

I'd deeply appreciate it if anyone could point out what I'm missing.
