Inferring BERT fill-mask with tflite

Asked by Muhammad Ikhwan Perwira on 20 March 2024 at 02:24

There is no error, but I'm not sure whether what I'm doing is correct. I have a model.tflite for the fill-mask task with BERT, and I expect to get a list of output probabilities mapped to a list of strings. The model.tflite was generated following the Hugging Face docs:

optimum-cli export tflite --model cahya/bert-base-indonesian-1.5G --sequence_length 128 cahya_bert_1G_tflite/

Loading model and tokenizer:

from transformers import BertTokenizer
import tensorflow as tf
import numpy as np

tokenizer = BertTokenizer.from_pretrained('./cahya_bert_1G_tflite')
interpreter = tf.lite.Interpreter(model_path="./cahya_bert_1G_tflite/model.tflite")
interpreter.allocate_tensors()

Encoding string input:

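input_details = interpreter.get_input_details()
display(input_details)  # display() exists in Jupyter/IPython; use print() in a plain script

input_string = "My Mom go to the [MASK]."
input_tokens = np.zeros(input_details[0]['shape'], dtype=np.int64)
encoded_input = tokenizer.encode(input_string, add_special_tokens=True)
input_tokens[:, :len(encoded_input)] = encoded_input
# Print the array, its shape and its dtype on separate lines
print(input_tokens, input_tokens.shape, input_tokens.dtype, sep="\n")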

'''
[{'name': 'model_attention_mask:0',
  'index': 0,
  'shape': array([  1, 128]),
  'shape_signature': array([  1, 128]),
  'dtype': numpy.int64,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'model_input_ids:0',
  'index': 1,
  'shape': array([  1, 128]),
  'shape_signature': array([  1, 128]),
  'dtype': numpy.int64,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}},
 {'name': 'model_token_type_ids:0',
  'index': 2,
  'shape': array([  1, 128]),
  'shape_signature': array([  1, 128]),
  'dtype': numpy.int64,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]

[[   3 3519 7816 3180 2801 1873    4   17    1    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0]]
(1, 128)
int64
'''
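The input listing above shows that the exported model expects three int64 tensors of shape (1, 128), and that input_details[0] is model_attention_mask:0 rather than model_input_ids:0. Below is a minimal sketch of building all three arrays from the ids returned by tokenizer.encode. It assumes the padding id is 0 (as in the zero-padded array above) and a single sentence, so all token_type_ids are 0. The helper name build_bert_inputs is hypothetical.

```python
import numpy as np

def build_bert_inputs(token_ids, seq_len=128, pad_id=0):
    """Pad token_ids to seq_len and build the three BERT input arrays.

    Assumes pad_id=0 and a single segment (every token_type_id is 0).
    """
    if len(token_ids) > seq_len:
        raise ValueError(f"{len(token_ids)} tokens exceed seq_len={seq_len}")
    n = len(token_ids)
    input_ids = np.full((1, seq_len), pad_id, dtype=np.int64)
    input_ids[0, :n] = token_ids
    # 1 marks real tokens, 0 marks padding the model should ignore
    attention_mask = np.zeros((1, seq_len), dtype=np.int64)
    attention_mask[0, :n] = 1
    # Single-sentence input: every position belongs to segment 0
    token_type_ids = np.zeros((1, seq_len), dtype=np.int64)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }

# Ids taken from the encoded array printed above
inputs = build_bert_inputs([3, 3519, 7816, 3180, 2801, 1873, 4, 17, 1])
print(inputs["attention_mask"][0, :10])  # [1 1 1 1 1 1 1 1 1 0]
```

The Hugging Face tokenizer can likely produce the same three arrays directly with tokenizer(input_string, padding="max_length", max_length=128, return_tensors="np"). Casting each one with .astype(np.int64) keeps them aligned with the int64 dtype listed above.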

Inferring:

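# input_details[0] is 'model_attention_mask:0' (see the listing above), so this
# fills only the attention mask; input_ids and token_type_ids are never set
interpreter.set_tensor(input_details[0]['index'], input_tokens)
interpreter.invoke()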
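Because the three inputs are not ordered as input_ids, attention_mask, token_type_ids, matching them by name is safer than relying on positions. Below is a minimal sketch. It assumes each interpreter input name contains exactly one of the dictionary keys, which holds for the three names listed above. run_by_name is a hypothetical helper that only calls tf.lite.Interpreter methods already used in this post: get_input_details, set_tensor, invoke, get_output_details and get_tensor.

```python
import numpy as np

def run_by_name(interpreter, inputs):
    """Set every interpreter input from `inputs` by name, invoke, return output 0.

    `inputs` maps a name fragment such as "input_ids" to an array; the
    interpreter input whose name contains that fragment receives the array.
    """
    for detail in interpreter.get_input_details():
        keys = [key for key in inputs if key in detail["name"]]
        if len(keys) != 1:
            raise KeyError(f"expected one array for input {detail['name']!r}, got {keys}")
        # Cast to the dtype listed in input_details (int64 for this model)
        array = np.asarray(inputs[keys[0]]).astype(detail["dtype"])
        interpreter.set_tensor(detail["index"], array)
    interpreter.invoke()
    output_index = interpreter.get_output_details()[0]["index"]
    return interpreter.get_tensor(output_index)
```

Usage would look like output_data = run_by_name(interpreter, inputs), where inputs holds the three (1, 128) arrays. The dtype cast is there because set_tensor raises an error when the array's dtype differs from the one the input declares.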

Decoding Output:

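output_details = interpreter.get_output_details()
display(output_details)  # Jupyter/IPython only; use print() in a plain script

output_data = interpreter.get_tensor(output_details[0]['index'])

# Print the array, its shape and its dtype on separate lines
print(output_data, output_data.shape, output_data.dtype, sep="\n")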

'''
[{'name': 'StatefulPartitionedCall:0',
  'index': 2152,
  'shape': array([    1,   128, 32000]),
  'shape_signature': array([    1,   128, 32000]),
  'dtype': numpy.float32,
  'quantization': (0.0, 0),
  'quantization_parameters': {'scales': array([], dtype=float32),
   'zero_points': array([], dtype=int32),
   'quantized_dimension': 0},
  'sparsity_parameters': {}}]

[[[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]]
(1, 128, 32000)
float32
'''

Here is GitHub Copilot's explanation of the output_data array dimensions:

The output_data array, with shape (1, 128, 32000), is three-dimensional. Here is what each dimension represents, following common practice in machine learning and natural language processing:

The first dimension (size 1) is the batch size: one input sequence is processed at a time.

The second dimension (size 128) is the sequence length, i.e. the maximum number of tokens (words or subwords) the model processes in a single input sequence. It matches the --sequence_length 128 used in the export command.

The third dimension (size 32000) is the vocabulary size. For each of the 128 token positions, the model outputs a raw score (logit) for every one of the 32000 vocabulary tokens; applying a softmax over this last axis turns those scores into a probability distribution.
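Since the last axis scores every vocabulary token at every position, a fill-mask prediction only needs the row at the [MASK] position. Below is a minimal sketch. It assumes the output holds logits, so a softmax is applied here, and that the mask id is known, for example from tokenizer.mask_token_id. top_k_at_mask is a hypothetical helper.

```python
import numpy as np

def top_k_at_mask(logits, input_ids, mask_id, k=5):
    """Return the k most likely (token_id, probability) pairs for each mask.

    logits: float array of shape (1, seq_len, vocab_size)
    input_ids: int array of shape (1, seq_len)
    """
    results = []
    for pos in np.flatnonzero(input_ids[0] == mask_id):
        scores = logits[0, pos].astype(np.float64)
        # Softmax over the vocabulary; subtracting the max keeps exp() from overflowing
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # Indices of the k largest probabilities, highest first
        best = np.argsort(probs)[::-1][:k]
        results.append([(int(t), float(probs[t])) for t in best])
    return results
```

The returned ids can be turned back into strings with tokenizer.convert_ids_to_tokens. If the logits are all 0, the softmax is uniform and every token gets probability 1/32000, so checking np.count_nonzero(output_data) before decoding is a cheap way to tell whether the model produced meaningful scores.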

So I expect to get an output_string like this:

output_string = 'My Mom go to the Market.'
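To get the list of strings with probabilities described at the top of the question, each candidate token can be substituted back into the input text. Below is a minimal sketch. fill_candidates is a hypothetical helper, the candidate tokens would come from the tokenizer (e.g. tokenizer.convert_ids_to_tokens), and the probabilities in the example call are made-up values for illustration only.

```python
def fill_candidates(text, candidates, mask_token="[MASK]"):
    """Return (filled_text, probability) for each (token, probability) pair.

    Only the first occurrence of mask_token is replaced.
    """
    filled = []
    for token, prob in candidates:
        # WordPiece marks word-continuation pieces with "##"; strip it for display
        word = token[2:] if token.startswith("##") else token
        filled.append((text.replace(mask_token, word, 1), prob))
    return filled

# Made-up candidates, for illustration only
print(fill_candidates("My Mom go to the [MASK].", [("market", 0.6), ("mall", 0.2)]))
# [('My Mom go to the market.', 0.6), ('My Mom go to the mall.', 0.2)]
```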
