I trained an XGBoost tree model and used m2cgen to convert it to C code. The generated code has more than 200,000 lines and consists entirely of nested if/else statements. It is not cache-friendly, and I want to speed it up.
double score(double *input, double *output)
{
    double var0;
    if (input[22] >= 1.229579)
    {
        if (input[22] >= 2.4437861)
        {
            if (input[22] >= 4.480254)
            {
                if (input[16] >= 0.31260872)
                {
                    if (input[19] >= 0.08543753)
                    {
                        if (input[20] >= 2.8695097)
                        {
                            var0 = 0.0;
                        }
                        else
                        {
                            var0 = -0.03854123;
                        }
                    }
                    else
                    {
                        if (input[3] >= 1.1886833)
                        {
                            var0 = -0.003812895;
                        }
                        else
                        {
                            var0 = -0.02584283;
                        }
                    }
                }
                else ............
    double var1;
    if (input[1] >= 0.11052115)
    {
        if (input[12] >= -0.407885)
        {
            if (input[7] >= -0.022361094)
            {
                if (input[0] >= 0.52131283)
                {
                    if (input[5] >= 2.6582391)
                    {
                        if (input[12] >= 3.761185)
                        {
                            var1 = -0.032110736;
                        }
                        else ............
    return var0 + var1 + var2 + ......
}
I compile the code with the options "gcc -Ofast -mavx2 -mfma -shared -fPIC". A single prediction takes about 5 microseconds. I tried adding branch prediction hints using likely/unlikely, but it made no improvement.
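For context, this is roughly what the likely/unlikely hints look like. It is only a minimal sketch: the macros are the common GCC idiom built on __builtin_expect, and the function name score_hinted, the shortened two-level tree, and the choice of which branch is marked likely are all assumptions made for illustration, not the generated code.

```c
/* Common GCC/Clang idiom: tell the compiler which outcome is expected. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical, shortened tree. The thresholds and leaf values are
   borrowed from the snippet above purely for illustration, and which
   branch is "likely" is an assumption. */
double score_hinted(const double *input)
{
    double var0;
    if (likely(input[22] >= 1.229579)) {
        if (unlikely(input[20] >= 2.8695097)) {
            var0 = 0.0;
        } else {
            var0 = -0.03854123;
        }
    } else {
        var0 = -0.02584283;
    }
    return var0;
}
```

These hints mainly affect how the compiler lays out the branches, so they can only help when one side of a comparison really is taken much more often than the other.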