TensorFlow BERT-base

Overview

In this tutorial, we demonstrate how to compile and run inference with the HuggingFace BERT-base model for masked language modeling using the RBLN Python API.

Setup & Installation

Before you begin, ensure that your system environment is properly configured and that all required packages are installed. This includes:

System Requirements:
- Python: 3.9–3.12
- RBLN Driver
Package Requirements:
- tensorflow
- transformers
- numpy
- rebel-compiler (0.8.2 or later)

Installation Command:

pip install tensorflow transformers numpy  
pip install --extra-index-url https://pypi.rbln.ai/simple/ "rebel-compiler>=0.8.2"

Note

Please note that rebel-compiler requires an RBLN Portal account.
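Before compiling, it can help to confirm that the interpreter and packages match the requirements listed above. The following is a minimal sketch using only the Python standard library; the helper names `python_supported` and `installed_versions` are hypothetical and not part of any RBLN tooling, and the script does not check the RBLN driver.

```python
import sys
from importlib import metadata

# Supported interpreter range, taken from the System Requirements list.
MIN_PYTHON = (3, 9)
MAX_PYTHON = (3, 12)


def python_supported(version_info=sys.version_info):
    """Return True if the major.minor version falls in the supported range."""
    return MIN_PYTHON <= tuple(version_info[:2]) <= MAX_PYTHON


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python supported:", python_supported())
for name, version in installed_versions(
    ["tensorflow", "transformers", "numpy", "rebel-compiler"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

A package reported as not installed indicates that the corresponding pip command should be rerun.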

Using `RBLN Python API`

Model Compilation

Import TFBertForMaskedLM from the transformers library, instantiate the TensorFlow BERT-base model, convert it into a tf.function, compile the model using the RBLN compiler, and save the compiled model to disk.

from transformers import TFBertForMaskedLM  
import tensorflow as tf  
import rebel  # RBLN Compiler  

# Instantiate the TensorFlow BERT-base model  
model = TFBertForMaskedLM.from_pretrained('bert-base-uncased')  
func = tf.function(  
    lambda input_ids, attention_mask, token_type_ids: model(  
        input_ids, attention_mask, token_type_ids  
    )  
)  

# Compile the model  
MAX_SEQ_LEN = 128  
input_info = [  
    ('input_ids', [1, MAX_SEQ_LEN], tf.int64),  
    ('attention_mask', [1, MAX_SEQ_LEN], tf.int64),  
    ('token_type_ids', [1, MAX_SEQ_LEN], tf.int64),  
]  
compiled_model = rebel.compile_from_tf_function(  
    func,  
    input_info,  
    npu='RBLN-CA12'  
)  

# Save the compiled model to disk  
compiled_model.save('bert_base.rbln')  
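Because `input_info` fixes every input to the shape `[1, MAX_SEQ_LEN]`, each sequence passed to the compiled model presumably has to be padded (or truncated) to exactly that length. The sketch below illustrates the idea in plain Python; `to_fixed_length` is a hypothetical helper, not an RBLN or transformers function, and in practice the tokenizer's own padding and truncation options would normally do this work.

```python
MAX_SEQ_LEN = 128  # must match the value used in input_info
PAD_TOKEN_ID = 0   # [PAD] id in the bert-base-uncased vocabulary


def to_fixed_length(token_ids, max_len=MAX_SEQ_LEN, pad_id=PAD_TOKEN_ID):
    """Truncate or right-pad one sequence and build the matching masks.

    Returns a dict of 1 x max_len nested lists keyed like the compiled inputs.
    Note: this naive truncation may cut off the trailing [SEP] token.
    """
    ids = list(token_ids)[:max_len]
    n_real = len(ids)
    n_pad = max_len - n_real
    return {
        "input_ids": [ids + [pad_id] * n_pad],
        "attention_mask": [[1] * n_real + [0] * n_pad],  # 1 = real token
        "token_type_ids": [[0] * max_len],               # single segment
    }


# 101 and 102 are [CLS] and [SEP]; the middle ids are arbitrary placeholders.
example = to_fixed_length([101, 2000, 2001, 102])
print(len(example["input_ids"][0]), sum(example["attention_mask"][0]))
```

For use with the compiled model, these lists would still need converting to int64 arrays so that the dtype matches `input_info`.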

Model Loading and Inference

Tokenize the input text using BertTokenizer, load the compiled model via RBLN Runtime, run inference, and display the predicted masked word.

from transformers import BertTokenizer  
import tensorflow as tf  
import rebel  # RBLN Runtime  

# Prepare the input  
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  
text = 'The capital of Korea is [MASK].'  
MAX_SEQ_LEN = 128  
inputs = tokenizer(text, return_tensors='np', padding='max_length', max_length=MAX_SEQ_LEN)  

# Load the compiled model  
module = rebel.Runtime('bert_base.rbln')  
# Run inference  
out = module.run(**inputs)  

# Check results  
mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0])  
selected_logits = tf.gather_nd(out[0], indices=mask_token_index)  
predicted_token_id = tf.math.argmax(selected_logits, axis=-1)  
print('Masked word is [', tokenizer.decode(predicted_token_id), '].')  

Example Output:

Masked word is [ Seoul ].
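The result check above takes the logits at the `[MASK]` position and picks the highest-scoring vocabulary entry. To make that step concrete, here is a small self-contained sketch that also reports the top few candidates with softmax probabilities. It works on plain Python lists with a toy five-token vocabulary; `top_k_at_mask` and the toy ids are hypothetical and only illustrate the logic, not the RBLN Runtime output format.

```python
import math


def top_k_at_mask(input_ids, logits, mask_token_id, k=3):
    """Return the k best (token_id, probability) pairs at the first mask position.

    input_ids: list of token ids for one sequence.
    logits: one row of vocabulary scores per sequence position.
    """
    position = input_ids.index(mask_token_id)
    row = logits[position]
    # Numerically stable softmax over the vocabulary scores.
    peak = max(row)
    exps = [math.exp(score - peak) for score in row]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Rank vocabulary ids from highest to lowest score.
    ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    return [(token_id, probs[token_id]) for token_id in ranked[:k]]


# Toy data: vocabulary of 5 tokens, id 4 plays the role of [MASK].
toy_ids = [1, 4, 2]
toy_logits = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 2.0, 0.5, 3.0, -1.0],  # scores at the masked position
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
for token_id, prob in top_k_at_mask(toy_ids, toy_logits, mask_token_id=4, k=2):
    print(token_id, round(prob, 3))
```

The first entry corresponds to the argmax used in the tutorial; the remaining entries show which alternatives the model considered plausible.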