I have multiple calls to:
predictions = self.inference_function(
    **{self.input_tensor_name: tf.constant(input_image, dtype=tf.float32)}
)[self.output_tensor_name].numpy()
With each call, my RAM usage (not GPU memory) increases by a small amount. After some time this crashes my script.
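To confirm the growth is happening on the Python heap (and to see which allocation site is responsible), I diff two snapshots with the standard-library tracemalloc. This is a generic sketch: the `run_inference_batch` function below is a placeholder standing in for my real inference call, not the actual model code:

```python
import tracemalloc

def run_inference_batch():
    # Placeholder for the real inference call; it just allocates a
    # list so the snapshot diff below has something to report.
    return [0.0] * 100_000

tracemalloc.start()
before = tracemalloc.take_snapshot()

results = [run_inference_batch() for _ in range(5)]  # keep refs alive

after = tracemalloc.take_snapshot()
top = after.compare_to(before, "lineno")
# The top entries show which lines accumulated the most memory.
for stat in top[:5]:
    print(stat)
tracemalloc.stop()
```

Note that tracemalloc only sees Python-level allocations, so memory held inside TensorFlow's C++ runtime will not show up here.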
This is how I load my saved model:
trt_saved_model = tf.saved_model.load(model_path)
inference_function = trt_saved_model.signatures["serving_default"]
input_tensor_name = list(inference_function.structured_input_signature[1].keys())[0]
output_tensor_name = list(inference_function.structured_outputs.keys())[0]
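To measure the leak between calls, I log the process's peak resident set size. A minimal sketch using the standard-library resource module (Unix only; the `bytearray` allocation is a stand-in for the real inference call):

```python
import resource

def peak_rss():
    # ru_maxrss is the peak resident set size of this process:
    # kilobytes on Linux, bytes on macOS, so treat it as approximate.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
data = bytearray(10_000_000)  # stand-in allocation for the real call
after = peak_rss()
print(f"peak RSS grew by ~{after - before} units")
```

Logging this once per inference call makes it easy to see whether the growth is steady (a leak) or plateaus (normal allocator caching).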
Am I doing something wrong?