I wanted to convert my trained model with TF-TRT for better inference performance. I used the NVIDIA TensorFlow Docker image and had no problem running the test code.
The test code is from here: https://github.com/jhson989/tf-to-trt
The exact Docker image tag is nvcr.io/nvidia/tensorflow:23.12-tf2-py3.
But when I tried to convert my own trained model, it didn't work.
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.compiler.tensorrt import trt_convert as trt
# The trained model is in .h5 format
h5_model_path = 'model/path/h5/model_name'
h5_model = keras.models.load_model(h5_model_path, compile=False)
# The .h5 model must be exported to the saved_model format before using TF-TRT
saved_model_path = 'model/path/saved_model/model_name'
tf.saved_model.save(h5_model, saved_model_path)
# Make a Converter
conversion_param = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(input_saved_model_dir=saved_model_path, conversion_params=conversion_param)
# The error occurs here
converter.convert()
And this error occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 92, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /model/path/saved_model/model_name/variables/variables
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 1031, in load_partial
loader = Loader(object_graph_proto, saved_model_proto, export_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 226, in __init__
self._restore_checkpoint()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 561, in _restore_checkpoint
load_status = saver.restore(variables_path, self._checkpoint_options)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/checkpoint.py", line 1415, in restore
reader = py_checkpoint_reader.NewCheckpointReader(save_path)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 96, in NewCheckpointReader
error_translator(e)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /model/path/saved_model/model_name/variables/variables
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/model/code/convert_model.py", line 106, in eval
converter.convert()
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1453, in convert
self._saved_model = load.load(self._input_saved_model_dir,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 900, in load
result = load_partial(export_dir, None, tags, options)["root"]
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 1034, in load_partial
raise FileNotFoundError(
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /model/path/saved_model/model_name/variables/variables
You may be trying to load on a different device from the computational device. Consider setting the `experimental_io_device` option in `tf.saved_model.LoadOptions` to the io_device such as '/job:localhost'.
I already confirmed that the saved_model version of my model has the same directory structure as the test code's model. Specifically, the '/model/path/saved_model/model_name/variables' directory exists and contains variables.data-00000-of-00001 and variables.index.
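For reference, this is roughly how I checked that the files were really there (the paths are placeholders for my actual directories):

import os

saved_model_path = 'model/path/saved_model/model_name'  # placeholder path
variables_dir = os.path.join(saved_model_path, 'variables')

# Both variables.data-00000-of-00001 and variables.index show up here
print(os.listdir(variables_dir))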
I solved this problem by changing the model's name.
I had been using a name that embedded the model's metrics (e.g. val_acc, val_loss, ...) as the name of the saved_model directory.
I don't know exactly what happened, but when I changed the name to something simple like f'save_{idx}', it worked.
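As a minimal sketch of what ended up working for me (the paths, the idx variable, and the final output directory are just placeholders for my own setup):

import tensorflow as tf
from tensorflow import keras
from tensorflow.python.compiler.tensorrt import trt_convert as trt

idx = 0  # simple placeholder index instead of a metric-based name

h5_model = keras.models.load_model('model/path/h5/model_name', compile=False)

# Export under a plain directory name with no metric values in it
saved_model_path = f'model/path/saved_model/save_{idx}'
tf.saved_model.save(h5_model, saved_model_path)

conversion_param = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(input_saved_model_dir=saved_model_path,
                                    conversion_params=conversion_param)
converter.convert()

# Write out the converted model to another placeholder directory
converter.save(f'model/path/trt/save_{idx}')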