I'm currently optimizing a TensorFlow Lite model for an iOS app and aiming to leverage XNNPACK for improved performance. Despite configuring my build to enable XNNPACK, I'm uncertain whether it's actually being utilized, as my model's performance remains slower than expected.
My TensorFlow version: 2.15.0
Here's the Bazel build command I used:
bazelisk build \
  --config=ios_fat \
  --ios_multi_cpus=armv7,arm64 \
  -c opt \
  --define tflite_with_xnnpack=true \
  --define xnnpack_force_float_precision=fp16 \
  --define tflite_with_xnnpack_qs8=true \
  --define tflite_with_xnnpack_qu8=true \
  --define tflite_with_xnnpack_transient_indirection_buffer=true \
  //tensorflow/lite/ios:TensorFlowLiteC_framework
Questions:
- XNNPACK Verification: Is there a definitive method or specific log statement that can confirm XNNPACK's involvement in processing my TensorFlow Lite model?
- Understanding --define tflite_with_xnnpack=true: What exactly does the --define tflite_with_xnnpack=true flag enable within the TensorFlow Lite source code? I've noticed conditional compilation flags related to XNNPACK (e.g., TFLITE_BUILD_WITH_XNNPACK_DELEGATE), but I'm seeking clarity on how this flag influences the build and execution.
Any insights or suggestions on how to accurately determine the use of XNNPACK and understand the impact of build flags on TensorFlow Lite's operation would be greatly appreciated.
Efforts to Diagnose:
In the TensorFlow Lite source code
To determine if XNNPACK is in use, I added a log statement in the tensorflow/lite/kernels/conv.cc file:
template <KernelType kernel_type>
TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
  TFLITE_LOG_PROD(TFLITE_LOG_INFO, "Conv eval");  // Log added here
  ...
}
Upon running my app, this log statement prints continuously, suggesting that this convolution kernel is being executed. But does that conclusively indicate whether XNNPACK is the backend performing these operations?
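One more direct check I'm considering (not yet tried) is to dump the interpreter's execution plan and print each node's registration. My understanding, which may be wrong, is that when XNNPACK takes over parts of the graph, those nodes are replaced by a delegate kernel with its own custom_name, so a dump like the sketch below should reveal whether anything was actually delegated. This uses the C++ tflite::Interpreter API rather than the C API my app currently uses, and the idea that the custom_name identifies the XNNPACK delegate is my assumption:

#include <cstdio>

#include "tensorflow/lite/interpreter.h"

// Sketch: print every node in the execution plan together with its
// registration. Assumption (not verified by me): nodes claimed by XNNPACK
// show up as delegate kernels with a custom_name, while undelegated ops
// keep their plain builtin codes.
void DumpExecutionPlan(const tflite::Interpreter& interpreter) {
  for (int node_index : interpreter.execution_plan()) {
    const auto* node_and_reg = interpreter.node_and_registration(node_index);
    if (node_and_reg == nullptr) continue;
    const TfLiteRegistration& reg = node_and_reg->second;
    std::printf("node %d: builtin_code=%d custom_name=%s\n", node_index,
                reg.builtin_code,
                reg.custom_name ? reg.custom_name : "(none)");
  }
}

If every node still reports a plain builtin code after tensor allocation, I would read that as XNNPACK not being applied, but I'd appreciate confirmation that this reasoning is sound.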
In my app
impl->model = TfLiteModelCreateFromFile(p_file);           // load the .tflite model from disk
impl->options = TfLiteInterpreterOptionsCreate();          // default options; no delegate is added explicitly
TfLiteInterpreterOptionsSetNumThreads(impl->options, 4);   // run CPU kernels on 4 threads
impl->interpreter = TfLiteInterpreterCreate(impl->model, impl->options);
TfLiteInterpreterAllocateTensors(impl->interpreter);       // allocate input/output tensors
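To remove the ambiguity, the change I'm considering is to create and attach the XNNPACK delegate explicitly instead of relying on the build define. Below is a minimal sketch, assuming the headers tensorflow/lite/c/c_api.h and tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h are exposed by the framework I build; whether this is redundant when tflite_with_xnnpack=true is set is exactly part of my question:

#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

// Sketch: register the XNNPACK delegate explicitly before the interpreter
// is created, so its involvement no longer depends on build-time defaults.
TfLiteXNNPackDelegateOptions xnnpack_options = TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = 4;  // match the interpreter thread count

TfLiteDelegate* xnnpack_delegate = TfLiteXNNPackDelegateCreate(&xnnpack_options);

impl->model = TfLiteModelCreateFromFile(p_file);
impl->options = TfLiteInterpreterOptionsCreate();
TfLiteInterpreterOptionsSetNumThreads(impl->options, 4);
TfLiteInterpreterOptionsAddDelegate(impl->options, xnnpack_delegate);
impl->interpreter = TfLiteInterpreterCreate(impl->model, impl->options);
TfLiteInterpreterAllocateTensors(impl->interpreter);

// The delegate must stay alive for the interpreter's lifetime; after the
// interpreter has been deleted, release it with:
// TfLiteXNNPackDelegateDelete(xnnpack_delegate);

If this explicit path is the recommended way to guarantee XNNPACK on iOS, or if the tflite_with_xnnpack=true define already does the equivalent automatically, that is what I'd like to confirm.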