TensorFlow NLP BERT-base¶
This tutorial shows how to compile and deploy BERT, a natural language processing model provided by Hugging Face, for a masked language modeling task. The model predicts the most probable word to fill in the masked position of a given sentence.
This tutorial consists of the following steps:
- How to compile the TensorFlow BERT-base model and save the compiled model
- How to deploy the compiled model in the runtime-based inference environment
Prerequisite¶
Before we begin, please make sure you have installed the following pip packages on your system:
Note
If you want to skip the details and quickly compile and deploy the model on an RBLN NPU, you can jump directly to the summary section of this tutorial. The code in that section includes all the steps needed to compile and deploy the model, so it can serve as a quick starting point for your own project.
Step 1. How to compile¶
In this section, we will demonstrate how to compile the Hugging Face BERT-base model.
Prepare the model¶
To start, we will import the TFBertForMaskedLM model from the transformers library and convert it to a tf.function object.
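A minimal sketch of this step is shown below. The bert-base-uncased checkpoint and the fixed sequence length of 128 are assumptions for illustration, not requirements of the tutorial:

```python
import tensorflow as tf
from transformers import TFBertForMaskedLM

# Assumed checkpoint and sequence length for illustration
MAX_SEQ_LEN = 128
model = TFBertForMaskedLM.from_pretrained("bert-base-uncased")

# Wrap the forward pass in a tf.function so it can be compiled
func = tf.function(
    lambda input_ids, attention_mask, token_type_ids: model(
        input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids
    ).logits
)
```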
Compile the model¶
Once the tf.function is instantiated, we can simply compile it with the rebel.compile_from_tf_function() method.
If the NPU is installed on your host machine, you can omit the npu argument in rebel.compile_from_tf_function(). In this case, the function automatically detects and uses the installed NPU. However, if the NPU is not installed on your host machine, you need to specify the target NPU using the npu argument to avoid errors.
Currently, two NPU names are supported: RBLN-CA02 and RBLN-CA12. If you are unsure of the name of your target NPU, you can check it by running the rbln-stat command in a shell on the host machine where the NPU is installed.
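Putting this together, a compilation sketch might look like the following. The input_info argument (tensor names, shapes, and dtypes) is an assumption based on a batch size of 1 and the sequence length chosen above; consult the rebel API reference for the exact signature:

```python
import rebel

# Assumed input specification: batch size 1, sequence length 128, int64 token tensors
input_info = [
    ("input_ids", [1, MAX_SEQ_LEN], tf.int64),
    ("attention_mask", [1, MAX_SEQ_LEN], tf.int64),
    ("token_type_ids", [1, MAX_SEQ_LEN], tf.int64),
]

# Omit `npu` if an NPU is installed on the host; it will be detected automatically
compiled_model = rebel.compile_from_tf_function(func, input_info, npu="RBLN-CA02")
```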
Save the compiled model¶
We can use the compiled_model.save() method to save the compiled model to disk.
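For example (the .rbln file name here is an arbitrary choice):

```python
# Save the compiled model to disk
compiled_model.save("bert_base.rbln")
```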
Step 2. How to deploy¶
In this section, we will learn how to load the compiled model, run inference, and check the model's predictions.
Prepare the input¶
First, we need to prepare the input data for the target task, masked language modeling. We will use the pre-trained BertTokenizer from the transformers library to tokenize the input sequence.
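A sketch of the input preparation, assuming the bert-base-uncased tokenizer and an example sentence of our own choosing:

```python
from transformers import BertTokenizer

MAX_SEQ_LEN = 128  # must match the sequence length used at compile time
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical example sentence with a masked position
text = "The capital of France is [MASK]."
inputs = tokenizer(
    text,
    return_tensors="np",
    padding="max_length",
    max_length=MAX_SEQ_LEN,
)
```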
Run inference¶
The RBLN Runtime module rebel.Runtime() is used to load the compiled model. We can use the run() method of the instantiated runtime module to run inference on the given sentence. Alternatively, the __call__ magic method can be used to run inference.
You can see basic information about the runtime module, such as input/output shapes and the compiled model size, by printing it with print(module).
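The sketch below loads and runs the compiled model. The file name and the argument order follow the assumptions made above, and we assume the compiled function returns the logits array directly:

```python
import rebel

# Load the compiled model into a runtime module
module = rebel.Runtime("bert_base.rbln")

# Print input/output shapes and the compiled model size
print(module)

# Run inference with run() ...
logits = module.run(
    inputs["input_ids"],
    inputs["attention_mask"],
    inputs["token_type_ids"],
)

# ... or equivalently via the __call__ magic method
logits = module(
    inputs["input_ids"],
    inputs["attention_mask"],
    inputs["token_type_ids"],
)
```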
Check results¶
To decode the final logits to text, we first locate the index of the masked token in the input sequence. Then, we select the corresponding logits from the model's output. Finally, we use the tokenizer to decode the token id with the highest score into the answer.
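Continuing the sketch above, with numpy for the index arithmetic:

```python
import numpy as np

# Locate the [MASK] token in the input sequence
mask_index = int(np.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0][0])

# Take the token id with the highest score at the masked position
predicted_id = int(np.argmax(logits[0, mask_index]))

# Decode the predicted token id back to text
print(tokenizer.decode([predicted_id]))
```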
The results will look like:
Summary¶
The complete code for the model compilation is:
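(A consolidated sketch of the compilation steps above; the checkpoint name, sequence length, input_info layout, and file name remain assumptions.)

```python
import tensorflow as tf
import rebel
from transformers import TFBertForMaskedLM

MAX_SEQ_LEN = 128
model = TFBertForMaskedLM.from_pretrained("bert-base-uncased")

# Wrap the forward pass in a tf.function so it can be compiled
func = tf.function(
    lambda input_ids, attention_mask, token_type_ids: model(
        input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids
    ).logits
)

# Assumed input specification: batch size 1, sequence length 128
input_info = [
    ("input_ids", [1, MAX_SEQ_LEN], tf.int64),
    ("attention_mask", [1, MAX_SEQ_LEN], tf.int64),
    ("token_type_ids", [1, MAX_SEQ_LEN], tf.int64),
]

# Compile and save the model (omit `npu` if an NPU is installed on the host)
compiled_model = rebel.compile_from_tf_function(func, input_info, npu="RBLN-CA02")
compiled_model.save("bert_base.rbln")
```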
The complete code for deployment of the compiled model is:
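(Likewise, a consolidated deployment sketch under the same assumptions.)

```python
import numpy as np
import rebel
from transformers import BertTokenizer

MAX_SEQ_LEN = 128
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical example sentence with a masked position
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="np", padding="max_length", max_length=MAX_SEQ_LEN)

# Load the compiled model and print basic runtime information
module = rebel.Runtime("bert_base.rbln")
print(module)

# Run inference
logits = module.run(
    inputs["input_ids"],
    inputs["attention_mask"],
    inputs["token_type_ids"],
)

# Decode the highest-scoring token at the masked position
mask_index = int(np.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0][0])
predicted_id = int(np.argmax(logits[0, mask_index]))
print(tokenizer.decode([predicted_id]))
```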