TensorFlow BERT-base

Overview

In this tutorial, we demonstrate how to compile and run inference with the Hugging Face BERT-base model for masked language modeling using the RBLN Python API.

Setup & Installation

Before you begin, ensure that your system environment is properly configured and that all required packages are installed.

Note

The RBLN SDK is distributed as a .whl package. Note that the RBLN compiler and runtime require an RBLN Portal account.

Using RBLN Python API

Model Compilation

Import TFBertForMaskedLM from the transformers library and instantiate the TensorFlow BERT-base model. Wrap the model in a tf.function, compile it with the RBLN compiler using fixed input shapes of [1, MAX_SEQ_LEN], and save the compiled model to disk.

from transformers import TFBertForMaskedLM  
import tensorflow as tf  
import rebel  # RBLN Compiler  

# Instantiate the TensorFlow BERT-base model  
model = TFBertForMaskedLM.from_pretrained('bert-base-uncased')  
func = tf.function(  
    lambda input_ids, attention_mask, token_type_ids: model(  
        input_ids, attention_mask, token_type_ids  
    )  
)  

# Compile the model  
MAX_SEQ_LEN = 128  
input_info = [  
    ('input_ids', [1, MAX_SEQ_LEN], tf.int64),  
    ('attention_mask', [1, MAX_SEQ_LEN], tf.int64),  
    ('token_type_ids', [1, MAX_SEQ_LEN], tf.int64),  
]  
compiled_model = rebel.compile_from_tf_function(  
    func,  
    input_info,  
    npu='RBLN-CA12'  
)  

# Save the compiled model to disk  
compiled_model.save('bert_base.rbln')  
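
Because input_info fixes every input to the shape [1, MAX_SEQ_LEN], inputs passed to the compiled model must be padded to exactly 128 tokens. The following is a minimal pure-Python sketch of what such padding produces for one sentence. The helper name pad_to_max_length and the toy token IDs are hypothetical and only illustrate the layout of the three inputs; in practice the Hugging Face tokenizer does this work.

```python
MAX_SEQ_LEN = 128
PAD_TOKEN_ID = 0  # assumed [PAD] id, as in the bert-base-uncased vocabulary


def pad_to_max_length(token_ids, max_len=MAX_SEQ_LEN, pad_id=PAD_TOKEN_ID):
    """Hypothetical helper: pad one sequence to the static length used at compile time."""
    if len(token_ids) > max_len:
        raise ValueError(f"sequence of length {len(token_ids)} exceeds {max_len}")
    n_pad = max_len - len(token_ids)
    return {
        # Real tokens followed by padding tokens
        "input_ids": [token_ids + [pad_id] * n_pad],
        # 1 marks a real token, 0 marks padding
        "attention_mask": [[1] * len(token_ids) + [0] * n_pad],
        # A single-sentence input uses segment 0 everywhere
        "token_type_ids": [[0] * max_len],
    }


# Toy IDs standing in for a short tokenized sentence; not real tokenizer output.
toy_ids = [101, 1996, 3007, 103, 1012, 102]
batch = pad_to_max_length(toy_ids)
shapes = {name: (len(rows), len(rows[0])) for name, rows in batch.items()}
print(shapes)
```

Each of the three entries ends up with shape (1, 128), matching the entries declared in input_info.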

Model Inference

Tokenize the input text using BertTokenizer, load the compiled model via RBLN Runtime, run inference, and display the predicted masked word.

from transformers import BertTokenizer
import tensorflow as tf  
import rebel  # RBLN Runtime  

# Prepare the input  
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  
text = 'The capital of Korea is [MASK].'  
MAX_SEQ_LEN = 128  
inputs = tokenizer(text, return_tensors='np', padding='max_length', max_length=MAX_SEQ_LEN)  

# Load the compiled model  
module = rebel.Runtime('bert_base.rbln')  
# Run inference  
out = module.run(**inputs)  # out[0] holds the logits for each token position

# Check results  
mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0])  
selected_logits = tf.gather_nd(out[0], indices=mask_token_index)  
predicted_token_id = tf.math.argmax(selected_logits, axis=-1)  
print('Masked word is [', tokenizer.decode(predicted_token_id), '].')  

The results will look like this:

Masked word is [ Seoul ].
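
The result check above locates the [MASK] position, selects the logits at that position, and takes the argmax over the vocabulary. As an illustration only, the same logic can be sketched in plain Python on toy data. The function names, the five-entry vocabulary, and the mask ID of 3 are assumptions made for this example and are not part of the RBLN or transformers APIs.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def predict_masked_ids(input_ids, logits, mask_token_id):
    """Return the highest-scoring vocabulary id at every masked position.

    input_ids: list[int] of length seq_len
    logits:    list[list[float]] of shape [seq_len, vocab_size]
    """
    mask_positions = [i for i, tok in enumerate(input_ids) if tok == mask_token_id]
    return [argmax(logits[pos]) for pos in mask_positions]


# Toy data: 4 tokens, a vocabulary of 5 entries, and id 3 playing the role of [MASK].
toy_input_ids = [0, 1, 3, 2]
toy_logits = [
    [0.1, 0.2, 0.3, 0.1, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.1],
    [0.2, 0.1, 0.1, 0.0, 2.5],  # masked position: entry 4 scores highest
    [0.3, 0.0, 0.4, 0.1, 0.0],
]
predicted = predict_masked_ids(toy_input_ids, toy_logits, mask_token_id=3)
print(predicted)  # [4]
```

In the tutorial code, the predicted ID is then mapped back to a word with tokenizer.decode.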

Summary and References

This tutorial demonstrated how to compile the Hugging Face TensorFlow BERT-base model with the RBLN Python API and run masked language modeling inference with it. Once compiled and saved, the model can be loaded with the RBLN Runtime and run on an RBLN NPU.

References: