Bucketing
Bucketing is a powerful feature of the RBLN SDK that allows you to compile a single model to support multiple input shapes efficiently. Unlike typical approaches that require a separate model for each input size, bucketing enables the rebel-compiler to create a unified runtime that handles all of the predefined input shapes within a single compiled model.
What is Bucketing?
The rebel-compiler is fundamentally based on static graph compilation, which typically requires fixed input shapes at compile time. However, many real-world applications need to handle inputs of varying sizes - for example:
- Variable batch sizes for batch processing optimization
- Different image resolutions for computer vision tasks
- Dynamic sequence lengths for natural language processing
- Multi-scale inference for object detection
Bucketing addresses this challenge by allowing you to pre-define multiple input shapes during compilation. The resulting compiled model can then efficiently switch between these predefined shapes at runtime using a single rebel.Runtime instance.
How Bucketing Works
When you compile a model with bucketing (a conceptual sketch follows this list):
- Multiple Input Shapes: You specify several input configurations (buckets) that your model should support
- Unified Compilation: The compiler generates optimized kernels for each bucket within a single compiled model
- Runtime Selection: During inference, the runtime automatically selects the appropriate bucket based on the input shape
- Efficient Switching: No model reloading is required when switching between different input shapes
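Conceptually, bucket selection is shape matching. The sketch below is illustrative only and is not the actual RBLN runtime code; it shows how an incoming input shape maps to one of the compiled buckets:

```python
# Conceptual sketch of bucket selection (not the actual RBLN runtime code).
buckets = [
    (1, 3, 224, 224),
    (2, 3, 224, 224),
    (4, 3, 224, 224),
]

def select_bucket(input_shape):
    # The runtime matches the incoming shape against the compiled buckets.
    for bucket in buckets:
        if tuple(input_shape) == bucket:
            return bucket
    raise ValueError(f"No bucket compiled for shape {tuple(input_shape)}")

print(select_bucket((2, 3, 224, 224)))  # -> (2, 3, 224, 224)
```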
Prerequisites
Before getting started, please make sure you have installed the following packages:
- torch
- torchvision
- rebel-compiler
Basic Bucketing Example
Step 1: Prepare a Model
Let's start with a simple ResNet50 example that supports multiple batch sizes:
```python
import torch
import torchvision.models as models

import rebel

# Load a pre-trained ResNet50 model
model = models.resnet50(pretrained=True).eval()
```
Step 2: Define the Input Shapes (Buckets)
```python
# Define the input shapes we want to support
image_size = 224
supported_batch_sizes = [1, 2, 4, 8]  # Different batch sizes

input_infos = []

# Create input information for each batch size
for batch_size in supported_batch_sizes:
    input_info = [("input", [batch_size, 3, image_size, image_size], "float32")]
    input_infos.append(input_info)

print("Defined buckets:")
for i, info in enumerate(input_infos):
    print(f"  Bucket {i}: {info[0][1]}")
```
Step 3: Compile the Model with Bucketing
```python
# Compile the model with multiple input shapes
compiled_model = rebel.compile_from_torch(
    model,
    input_info=input_infos,  # Pass a list of input_info entries for bucketing
)

# Save the compiled model (optional)
compiled_model.save("resnet50_bucketed.rbln")
```
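If you saved the compiled model, a runtime can later be restored from the .rbln file instead of keeping the compiled object in memory. This is a sketch assuming rebel.Runtime also accepts a file path, as in the basic (non-bucketed) RBLN workflow:

```python
# Restore a runtime from the saved artifact
# (assumes rebel.Runtime accepts a file path).
runtime = rebel.Runtime("resnet50_bucketed.rbln", tensor_type="pt")
```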
Step 4: Create the Runtime and Run Inference
```python
# Create a single runtime that supports all buckets
runtime = rebel.Runtime(compiled_model, tensor_type="pt")

# Test with different batch sizes
test_batch_sizes = [1, 2, 4, 8]

for batch_size in test_batch_sizes:
    print(f"\nTesting with batch size: {batch_size}")

    # Create random input with the current batch size
    test_input = torch.randn(batch_size, 3, 224, 224)

    # Run inference - the runtime automatically selects the appropriate bucket
    output = runtime(test_input)

    print(f"  Input shape: {test_input.shape}")
    print(f"  Output shape: {output.shape}")
    print(f"  Predicted classes: {torch.argmax(output, dim=1)}")
```
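Because every bucket lives in the same runtime, it is also easy to compare latency across batch sizes. The timing loop below is illustrative only; actual numbers depend on your device:

```python
import time

# Rough per-bucket latency comparison (illustrative; numbers vary by device).
for batch_size in [1, 2, 4, 8]:
    x = torch.randn(batch_size, 3, 224, 224)
    runtime(x)  # warm-up call so one-time setup does not skew timing

    start = time.perf_counter()
    for _ in range(10):
        runtime(x)
    elapsed_ms = (time.perf_counter() - start) / 10 * 1000

    print(f"batch={batch_size}: {elapsed_ms:.2f} ms/iteration")
```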
Advanced Bucketing Examples
Variable Image Sizes
You can also create buckets for different image resolutions:
```python
import torch
import torchvision.models as models

import rebel

# Load model
model = models.efficientnet_b0(pretrained=True).eval()

# Define multiple image sizes and batch sizes
configurations = [
    (1, 3, 224, 224),  # Standard ImageNet size
    (1, 3, 256, 256),  # Slightly larger
    (1, 3, 288, 288),  # Even larger
    (2, 3, 224, 224),  # Batch of 2 with standard size
    (4, 3, 224, 224),  # Batch of 4 with standard size
]

input_infos = []
for batch, channels, height, width in configurations:
    input_info = [("input", [batch, channels, height, width], "float32")]
    input_infos.append(input_info)

# Compile with all configurations
compiled_model = rebel.compile_from_torch(
    model,
    input_info=input_infos,
)

runtime = rebel.Runtime(compiled_model, tensor_type="pt")

# Test with different input sizes
def test_inference(batch_size, height, width):
    # Create random input
    test_input = torch.randn(batch_size, 3, height, width)

    # Run inference
    output = runtime(test_input)
    print(f"Input: {test_input.shape} -> Output: {output.shape}")

# Test various configurations
test_inference(1, 224, 224)
test_inference(1, 256, 256)
test_inference(1, 288, 288)
test_inference(2, 224, 224)
test_inference(4, 224, 224)
```
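Note that only the compiled shapes are valid at runtime; an input whose shape matches none of the buckets is expected to be rejected. A defensive wrapper (hypothetical, not part of the RBLN API) makes this explicit:

```python
# Hypothetical guard: validate the shape before dispatching to the runtime.
SUPPORTED_SHAPES = {tuple(cfg) for cfg in configurations}

def safe_inference(runtime, test_input):
    shape = tuple(test_input.shape)
    if shape not in SUPPORTED_SHAPES:
        raise ValueError(f"Shape {shape} was not compiled as a bucket")
    return runtime(test_input)
```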
Bucket Selection Strategy
Choose your buckets to balance flexibility and performance:
```python
# Good: Powers of 2 for batch sizes
batch_sizes = [1, 2, 4, 8]

# Good: Common image sizes
image_sizes = [224, 256, 288, 320, 416, 640]

# Avoid: Too many buckets (increases compilation time and model size)
# batch_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```
Key considerations:
- Choose common input resolutions that match your expected use cases
- Start with fewer buckets and add more as needed
- Consider the trade-off between flexibility and compilation time (a padding helper for in-between batch sizes is sketched below)
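When the actual batch size falls between buckets, a common pattern is to pad the batch up to the nearest compiled bucket and discard the extra outputs. The helper below is a hypothetical sketch, not part of the RBLN API:

```python
import torch

# Hypothetical helper: pad a batch up to the nearest compiled bucket size.
# Batches larger than the biggest bucket would need to be split instead.
def pad_to_bucket(batch, bucket_sizes=(1, 2, 4, 8)):
    n = batch.shape[0]
    # Pick the smallest bucket that can hold the batch.
    target = min(b for b in bucket_sizes if b >= n)
    if target == n:
        return batch, n
    # Repeat the last sample to fill the remaining slots.
    pad = batch[-1:].expand(target - n, *batch.shape[1:])
    return torch.cat([batch, pad], dim=0), n

# Usage: run the padded batch, then keep only the first n outputs.
# padded, n = pad_to_bucket(images)
# outputs = runtime(padded)[:n]
```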
Complete Example: Image Classification with Dynamic Batching
Here's a complete example that demonstrates bucketing with image classification:
image_classification_bucketing.py:
```python
#!/usr/bin/env python3
"""
Complete Image Classification Example with Bucketing

This example demonstrates how to create a bucketed image classifier that
supports multiple batch sizes efficiently.
"""
import torch
import torchvision.models as models

import rebel


def create_bucketed_classifier():
    """Create a bucketed image classifier"""
    # Load pre-trained ResNet50
    model = models.resnet50(pretrained=True).eval()

    # Define buckets for different batch sizes
    batch_sizes = [1, 2, 4]
    input_infos = []
    for batch_size in batch_sizes:
        input_info = [("input", [batch_size, 3, 224, 224], "float32")]
        input_infos.append(input_info)

    compiled_model = rebel.compile_from_torch(model, input_info=input_infos)
    return compiled_model


# Example usage
def main():
    # Create bucketed model
    compiled_model = create_bucketed_classifier()
    runtime = rebel.Runtime(compiled_model, tensor_type="pt")

    # Example with different batch sizes
    batch_sizes = [1, 2, 4]
    for i, batch_size in enumerate(batch_sizes):
        print(f"\nTest case {i + 1}: Batch size {batch_size}")
        dummy_input = torch.randn(batch_size, 3, 224, 224)

        outputs = runtime(dummy_input)
        predictions = torch.argmax(outputs, dim=1)
        print(f"Predictions: {predictions.tolist()}")


if __name__ == "__main__":
    main()
```
Conclusion
Bucketing is a powerful feature that enables efficient handling of variable input shapes in the RBLN SDK. It provides:
- Flexibility: Support multiple predefined input shapes with a single compiled model
- Efficiency: Switch between shapes at runtime through one rebel.Runtime instance, with no model reloading
By following the examples and strategies in this tutorial, you can leverage bucketing to build more flexible and efficient inference pipelines with the RBLN SDK.