TensorFlow BERT-base
Overview
In this tutorial, we demonstrate how to compile and run inference with the Hugging Face BERT-base model
for masked language modeling using the RBLN Python API.
Setup & Installation
Before you begin, ensure that your system environment is properly configured and that all required packages are installed. This includes:
- System Requirements:
- Package Requirements:
- Installation Command:
| pip install tensorflow transformers numpy
pip install --extra-index-url https://pypi.rbln.ai/simple/ "rebel-compiler>=0.7.4"
|
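To confirm that the packages above are visible to the interpreter you plan to use, a short check like the following can help. It is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper written for this tutorial, and the distribution names are taken from the pip commands above. A package installed under a different distribution name (for example, a platform-specific TensorFlow build) would be reported as missing.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
REQUIRED = ["tensorflow", "transformers", "numpy", "rebel-compiler"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

The reported `rebel-compiler` version can then be compared by eye against the `>=0.7.4` requirement.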
Note
RBLN SDK is distributed as a .whl package. Note that the RBLN compiler and runtime
require an RBLN Portal account.
Using RBLN Python API
Model Compilation
Import TFBertForMaskedLM from the transformers library, instantiate the TensorFlow BERT-base model,
convert it into a tf.function, compile the model using the RBLN compiler, and save the compiled model to disk.
| from transformers import TFBertForMaskedLM
import tensorflow as tf
import rebel # RBLN Compiler
# Instantiate the TensorFlow BERT-base model
model = TFBertForMaskedLM.from_pretrained('bert-base-uncased')
func = tf.function(
    lambda input_ids, attention_mask, token_type_ids: model(
        input_ids, attention_mask, token_type_ids
    )
)
# Compile the model
MAX_SEQ_LEN = 128
input_info = [
    ('input_ids', [1, MAX_SEQ_LEN], tf.int64),
    ('attention_mask', [1, MAX_SEQ_LEN], tf.int64),
    ('token_type_ids', [1, MAX_SEQ_LEN], tf.int64),
]
compiled_model = rebel.compile_from_tf_function(
    func,
    input_info,
    npu='RBLN-CA12'
)
# Save the compiled model to disk
compiled_model.save('bert_base.rbln')
|
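Because `input_info` fixes every input to the shape `[1, MAX_SEQ_LEN]`, text shorter than `MAX_SEQ_LEN` tokens presumably has to be padded to that length before it reaches the compiled model. The tokenizer handles this when called with `padding='max_length'`; the sketch below only illustrates the idea in plain Python. `pad_to_length` is a hypothetical helper, not part of any library, and the token ids are illustrative values from the `bert-base-uncased` vocabulary.

```python
PAD_TOKEN_ID = 0  # [PAD] id in the bert-base-uncased vocabulary


def pad_to_length(token_ids, max_len, pad_id=PAD_TOKEN_ID):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    if len(token_ids) > max_len:
        raise ValueError(f"sequence of {len(token_ids)} tokens exceeds {max_len}")
    n_pad = max_len - len(token_ids)
    input_ids = list(token_ids) + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [1] * len(token_ids) + [0] * n_pad
    return input_ids, attention_mask


# Illustrative ids: [CLS]=101, [MASK]=103, [SEP]=102; 7592 is an arbitrary token id.
ids, mask = pad_to_length([101, 7592, 103, 102], max_len=8)
print(ids)   # [101, 7592, 103, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The attention mask is what lets the model distinguish real tokens from padding, which is why it is compiled as a separate input alongside `input_ids`.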
Model Inference
Tokenize the input text using BertTokenizer, load the compiled model via RBLN Runtime, run inference,
and display the predicted masked word.
| from transformers import BertTokenizer
import tensorflow as tf
import rebel # RBLN Runtime
# Prepare the input
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = 'The capital of Korea is [MASK].'
MAX_SEQ_LEN = 128
inputs = tokenizer(text, return_tensors='np', padding='max_length', max_length=MAX_SEQ_LEN)
# Load the compiled model
module = rebel.Runtime('bert_base.rbln')
# Run inference
out = module.run(**inputs)
# Check results
mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0])
selected_logits = tf.gather_nd(out[0], indices=mask_token_index)
predicted_token_id = tf.math.argmax(selected_logits, axis=-1)
print('Masked word is [', tokenizer.decode(predicted_token_id), '].')
|
The results will look like this:
| Masked word is [ seoul ].
|
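The "Check results" step uses TensorFlow ops, but since the tokenizer was asked for NumPy arrays (`return_tensors='np'`), the same lookup can be sketched in NumPy alone. The example below is a hedged illustration on synthetic data: `predict_masked_ids` is a hypothetical helper, `logits` stands in for the runtime output `out` and is assumed to have shape `(1, seq_len, vocab_size)`, and the vocabulary size of 10 is made up so the snippet runs without a model.

```python
import numpy as np


def predict_masked_ids(input_ids, logits, mask_token_id):
    """Return the highest-scoring vocabulary id at each masked position.

    input_ids: integer array of shape (1, seq_len)
    logits:    float array of shape (1, seq_len, vocab_size)
    """
    # Positions in the single batch row where the [MASK] token appears.
    mask_positions = np.flatnonzero(input_ids[0] == mask_token_id)
    # Pick the logits rows at those positions and take the best vocabulary id.
    return logits[0, mask_positions].argmax(axis=-1)


MASK_ID = 103  # [MASK] id in bert-base-uncased
input_ids = np.array([[101, 7, 103, 102, 0, 0]])  # synthetic, padded to length 6
logits = np.zeros((1, 6, 10), dtype=np.float32)   # synthetic scores
logits[0, 2, 4] = 5.0  # make vocabulary id 4 the top score at the masked position

print(predict_masked_ids(input_ids, logits, MASK_ID))  # [4]
```

With real outputs, the returned ids would be passed to `tokenizer.decode` exactly as in the tutorial code.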
Summary and References
This tutorial demonstrated how to compile and run inference with the Hugging Face BERT-base model using TensorFlow
and the RBLN Python API. The compiled model can then be used for masked language modeling inference on an RBLN NPU.
References: