TensorFlow BERT-base
Overview
In this tutorial, we demonstrate how to compile and run inference with the HuggingFace BERT-base model for masked language modeling using the RBLN Python API.
Setup & Installation
Before you begin, ensure that your system environment is properly configured and that all required packages are installed. This includes:
- System Requirements:
- Package Requirements:
- Installation Command:
| pip install tensorflow transformers numpy
pip install --extra-index-url https://pypi.rbln.ai/simple/ "rebel-compiler>=0.8.2"
|
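Before compiling, it can help to confirm that the packages import cleanly. The version attributes below are standard for TensorFlow and Transformers; whether the rebel module exposes a version string the same way is an assumption, so the lookup is guarded:
| import tensorflow as tf
import transformers
import rebel

# Confirm the core packages are importable and report their versions
print('tensorflow:', tf.__version__)
print('transformers:', transformers.__version__)

# Assumption: rebel may not expose __version__, so fall back gracefully
print('rebel:', getattr(rebel, '__version__', 'version attribute not exposed'))
|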
Using RBLN Python API
Model Compilation
Import TFBertForMaskedLM from the transformers library, instantiate the TensorFlow BERT-base model, wrap its forward pass in a tf.function, compile the function with the RBLN compiler, and save the compiled model to disk.
| from transformers import TFBertForMaskedLM
import tensorflow as tf
import rebel  # RBLN Compiler

# Instantiate the TensorFlow BERT-base model
model = TFBertForMaskedLM.from_pretrained('bert-base-uncased')

# Wrap the forward pass in a tf.function so the compiler can trace it
func = tf.function(
    lambda input_ids, attention_mask, token_type_ids: model(
        input_ids, attention_mask, token_type_ids
    )
)

# Compile the model for the target NPU with fixed input shapes
MAX_SEQ_LEN = 128
input_info = [
    ('input_ids', [1, MAX_SEQ_LEN], tf.int64),
    ('attention_mask', [1, MAX_SEQ_LEN], tf.int64),
    ('token_type_ids', [1, MAX_SEQ_LEN], tf.int64),
]
compiled_model = rebel.compile_from_tf_function(
    func,
    input_info,
    npu='RBLN-CA12',
)

# Save the compiled model to disk
compiled_model.save('bert_base.rbln')
|
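Before moving on to real inputs, you can sanity-check the compiled artifact by loading it back and running a zero-filled dummy batch through it. This is a minimal sketch that reuses rebel.Runtime and module.run exactly as in the inference section below; the keyword-argument call style mirrors that example:
| import numpy as np
import rebel

# Load the compiled artifact and run an all-zeros dummy batch through it
module = rebel.Runtime('bert_base.rbln')
dummy = np.zeros((1, 128), dtype=np.int64)
out = module.run(input_ids=dummy, attention_mask=dummy, token_type_ids=dummy)

# The logits should cover the vocabulary (30522 entries for bert-base-uncased)
print(out[0].shape)
|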
Model Inference
Tokenize the input text using BertTokenizer, load the compiled model via RBLN Runtime, run inference,
and display the predicted masked word.
| from transformers import BertTokenizer
import tensorflow as tf
import rebel  # RBLN Runtime

# Prepare the input
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = 'The capital of Korea is [MASK].'
MAX_SEQ_LEN = 128
# Pad to the fixed sequence length the model was compiled with
inputs = tokenizer(text, return_tensors='np', padding='max_length', max_length=MAX_SEQ_LEN)

# Load the compiled model
module = rebel.Runtime('bert_base.rbln')

# Run inference
out = module.run(**inputs)

# Locate the [MASK] position and decode the highest-scoring token
mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0])
selected_logits = tf.gather_nd(out[0], indices=mask_token_index)
predicted_token_id = tf.math.argmax(selected_logits, axis=-1)
print('Masked word is [', tokenizer.decode(predicted_token_id), '].')
|
Example Output:
| Masked word is [ Seoul ].
|
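If you want to inspect more than the single best candidate, a small extension using tf.math.top_k works on the same logits. This sketch continues from the variables in the inference script above; the choice of k=5 is arbitrary:
| # Show the top-5 candidate tokens for the masked position
top_k = tf.math.top_k(selected_logits, k=5)
for token_id, score in zip(top_k.indices[0].numpy(), top_k.values[0].numpy()):
    print(f'{tokenizer.decode([token_id])}: {score:.2f}')
|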