There is no error, but I'm not sure whether what I'm doing is correct. I have a model.tflite for the fill-mask task with BERT, and I expect to get the output as a list of probabilities with a matching list of strings. The model.tflite was generated following the Hugging Face docs with:
optimum-cli export tflite --model cahya/bert-base-indonesian-1.5G --sequence_length 128 cahya_bert_1G_tflite/
Loading model and tokenizer:
from transformers import BertTokenizer
import tensorflow as tf
import numpy as np
tokenizer = BertTokenizer.from_pretrained('./cahya_bert_1G_tflite')
interpreter = tf.lite.Interpreter(model_path="./cahya_bert_1G_tflite/model.tflite")
interpreter.allocate_tensors()
Encoding string input:
input_details = interpreter.get_input_details()
display(input_details)
input_string = "My Mom go to the [MASK]."
input_tokens = np.zeros(input_details[0]['shape'], dtype=np.int64)
encoded_input = tokenizer.encode(input_string, add_special_tokens=True)
input_tokens[:, :len(encoded_input)] = encoded_input
debug_np_arr(input_tokens)  # custom helper: prints the array, its shape and its dtype
'''
[{'name': 'model_attention_mask:0',
'index': 0,
'shape': array([ 1, 128]),
'shape_signature': array([ 1, 128]),
'dtype': numpy.int64,
'quantization': (0.0, 0),
'quantization_parameters': {'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}},
{'name': 'model_input_ids:0',
'index': 1,
'shape': array([ 1, 128]),
'shape_signature': array([ 1, 128]),
'dtype': numpy.int64,
'quantization': (0.0, 0),
'quantization_parameters': {'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}},
{'name': 'model_token_type_ids:0',
'index': 2,
'shape': array([ 1, 128]),
'shape_signature': array([ 1, 128]),
'dtype': numpy.int64,
'quantization': (0.0, 0),
'quantization_parameters': {'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}}]
[[ 3 3519 7816 3180 2801 1873 4 17 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0]]
(1, 128)
int64
'''
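The listing above shows that the model takes three int64 tensors of shape (1, 128): `model_attention_mask:0`, `model_input_ids:0` and `model_token_type_ids:0`. For a standard BERT encoder, `attention_mask` is normally 1 on real tokens and 0 on padding, and `token_type_ids` is all 0 for a single sentence, so the padded ID array alone does not cover every input. The sketch below builds all three arrays with NumPy. `build_bert_inputs` is a hypothetical helper, and `pad_id=0` is an assumption (passing `tokenizer.pad_token_id` is safer).

```python
import numpy as np

def build_bert_inputs(token_ids, seq_len=128, pad_id=0):
    """Build the three (1, seq_len) int64 arrays a BERT encoder expects.

    Hypothetical helper: pad_id=0 is an assumption; pass tokenizer.pad_token_id.
    Sequences longer than seq_len are simply cut off (this would drop [SEP]).
    """
    ids = list(token_ids)[:seq_len]
    n = len(ids)

    input_ids = np.full((1, seq_len), pad_id, dtype=np.int64)
    input_ids[0, :n] = ids

    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = np.zeros((1, seq_len), dtype=np.int64)
    attention_mask[0, :n] = 1

    # A single sentence uses segment id 0 everywhere.
    token_type_ids = np.zeros((1, seq_len), dtype=np.int64)

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }
```

With the tokenizer from the question, `build_bert_inputs(encoded_input, pad_id=tokenizer.pad_token_id)` gives the three arrays. Calling the tokenizer directly, e.g. `tokenizer(input_string, padding="max_length", max_length=128, truncation=True, return_tensors="np")`, should produce equivalent arrays under the same key names.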
Inferring:
interpreter.set_tensor(input_details[0]['index'], input_tokens)
interpreter.invoke()
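According to the listing, `input_details[0]` is `model_attention_mask:0`, not `model_input_ids:0`, so `set_tensor(input_details[0]['index'], ...)` writes the token IDs into the attention mask, and the other two inputs are never set before `invoke()`. Since the position of each input in a converted model is not something to rely on, a safer pattern is to match inputs by name. A minimal sketch, assuming the names follow the `model_<name>:0` pattern shown above and `feeds` is a dict keyed by `input_ids`, `attention_mask` and `token_type_ids` (`set_inputs_by_name` is a hypothetical helper):

```python
import numpy as np

def set_inputs_by_name(interpreter, feeds):
    """Set every input tensor of a TFLite interpreter by name instead of position.

    Assumes tensor names look like 'model_input_ids:0', as in the listing above.
    """
    for detail in interpreter.get_input_details():
        key = detail["name"].split(":")[0]      # 'model_input_ids:0' -> 'model_input_ids'
        if key.startswith("model_"):
            key = key[len("model_"):]           # -> 'input_ids'
        if key not in feeds:
            raise KeyError(f"no array provided for input {detail['name']!r}")
        # Cast to the dtype the model declares (int64 here).
        value = np.asarray(feeds[key], dtype=detail["dtype"])
        expected = tuple(detail["shape"])
        if value.shape != expected:
            raise ValueError(f"{detail['name']}: expected {expected}, got {value.shape}")
        interpreter.set_tensor(detail["index"], value)
```

For example, `enc = tokenizer(input_string, padding="max_length", max_length=128, truncation=True, return_tensors="np")` followed by `set_inputs_by_name(interpreter, enc)` and `interpreter.invoke()` would feed all three inputs before running the model.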
Decoding Output:
output_details = interpreter.get_output_details()
display(output_details)
output_data = interpreter.get_tensor(output_details[0]['index'])
debug_np_arr(output_data)  # custom helper: prints the array, its shape and its dtype
'''
[{'name': 'StatefulPartitionedCall:0',
'index': 2152,
'shape': array([ 1, 128, 32000]),
'shape_signature': array([ 1, 128, 32000]),
'dtype': numpy.float32,
'quantization': (0.0, 0),
'quantization_parameters': {'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32),
'quantized_dimension': 0},
'sparsity_parameters': {}}]
[[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]]
(1, 128, 32000)
float32
'''
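The output holds one row of 32000 scores per input position, so for fill-mask only the row(s) at the `[MASK]` position(s) matter. A minimal sketch that pulls those rows out and rejects all-zero rows, which carry no ranking information (every value printed in the run above is 0, so this check is worth running once all three inputs are set). `mask_logits` is a hypothetical helper; `mask_token_id` would come from `tokenizer.mask_token_id`, and `input_ids` must be the same array that was fed to the model.

```python
import numpy as np

def mask_logits(logits, input_ids, mask_token_id):
    """Return {position: scores} for every [MASK] token in the first batch item.

    logits:    (1, seq_len, vocab_size) array from interpreter.get_tensor(...)
    input_ids: (1, seq_len) array that was fed to the model
    """
    if logits.shape[:2] != input_ids.shape:
        raise ValueError(f"shape mismatch: {logits.shape} vs {input_ids.shape}")
    rows = {}
    for pos in np.flatnonzero(input_ids[0] == mask_token_id):
        row = logits[0, pos]
        if not np.any(row):
            # A row of zeros ranks every vocabulary entry equally.
            raise ValueError(f"scores at position {pos} are all zero")
        rows[int(pos)] = row
    return rows
```

`int(np.argmax(row))` on a returned row gives the highest-scoring vocabulary ID for that position.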
Here is GitHub Copilot's explanation of the output_data array's dimensions:
The output_data array you have with shape (1, 128, 32000) is a three-dimensional array. Here's what each dimension could represent, based on common practices in machine learning and natural language processing:
The first dimension with size 1 typically represents the batch size. In this case, it seems like you're processing one item at a time (batch size is 1).
The second dimension with size 128 could represent the sequence length. In the context of natural language processing, this could be the maximum number of tokens (like words or subwords) that the model processes in a single input sequence.
The third dimension with size 32000 likely represents the size of the output vocabulary. This could mean that for each token in the input sequence, the model is outputting a probability distribution over 32000 possible output tokens.
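One caveat to that explanation: for Hugging Face masked-language-model heads, the per-position values are usually raw logits rather than probabilities, so they need not sum to 1 and can be negative. Applying a softmax over one position's scores turns them into a probability distribution, and sorting gives a ranked list of candidates. A minimal sketch (`top_k_probs` is a hypothetical name):

```python
import numpy as np

def top_k_probs(scores, k=5):
    """Softmax one position's logits and return the k best (token_id, probability) pairs."""
    scores = np.asarray(scores, dtype=np.float64)
    exp = np.exp(scores - scores.max())   # subtract the max for numerical stability
    probs = exp / exp.sum()
    best = np.argsort(probs)[::-1][:k]    # indices sorted by descending probability
    return [(int(i), float(probs[i])) for i in best]
```

Pairing each returned ID with `tokenizer.convert_ids_to_tokens(token_id)` gives the list of strings with their probabilities.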
So I expect to get an output_string like this:
output_string = 'My Mom go to the Market.'
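To get a sentence back, one option is to replace each `[MASK]` ID with the highest-scoring ID at that position and decode the result. A minimal sketch, assuming `logits` is the (1, 128, 32000) output and `input_ids` the (1, 128) array actually fed to the model (`fill_mask_ids` is a hypothetical helper):

```python
import numpy as np

def fill_mask_ids(input_ids, logits, mask_token_id):
    """Return a copy of input_ids[0] with every [MASK] replaced by its argmax prediction."""
    ids = np.array(input_ids[0], copy=True)   # leave the caller's array untouched
    for pos in np.flatnonzero(ids == mask_token_id):
        ids[pos] = int(np.argmax(logits[0, pos]))
    return ids
```

`tokenizer.decode(fill_mask_ids(input_ids, output_data, tokenizer.mask_token_id).tolist(), skip_special_tokens=True)` should then drop `[CLS]`, `[SEP]` and the padding. The model name suggests it was trained on Indonesian text, so the prediction for an English sentence may well not be "Market".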