Bucketing
Bucketing is a powerful feature of the RBLN SDK that allows you to compile a single model to support multiple input shapes efficiently. Unlike typical approaches that require a separate model for each input size, bucketing enables the rebel-compiler to create a unified runtime that handles all of the predefined input shapes within a single compiled model.
What is Bucketing?
The rebel-compiler is fundamentally based on static graph compilation, which typically requires fixed input shapes at compile time. However, many real-world applications need to handle inputs of varying sizes - for example:
- Variable batch sizes for batch processing optimization
- Different image resolutions for computer vision tasks
- Dynamic sequence lengths for natural language processing
- Multi-scale inference for object detection
Bucketing addresses this challenge by allowing you to pre-define multiple input shapes during compilation. The resulting compiled model can then efficiently switch between these predefined shapes at runtime using a single rebel.Runtime instance.
How Bucketing Works
When you compile a model with bucketing (a conceptual sketch follows this list):
- Multiple Input Shapes: You specify several input configurations (buckets) that your model should support
- Unified Compilation: The compiler generates optimized kernels for each bucket within a single compiled model
- Runtime Selection: During inference, the runtime automatically selects the appropriate bucket based on the input shape
- Efficient Switching: No model reloading is required when switching between different input shapes
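Conceptually, bucket selection is shape matching. The sketch below is illustrative only and is not the actual RBLN runtime code; it shows how an incoming input shape maps to one of the compiled buckets:

```python
# Conceptual sketch of bucket selection (not the actual RBLN runtime code).
buckets = [
    (1, 3, 224, 224),
    (2, 3, 224, 224),
    (4, 3, 224, 224),
]

def select_bucket(input_shape):
    # The runtime matches the incoming shape against the compiled buckets.
    for bucket in buckets:
        if tuple(input_shape) == bucket:
            return bucket
    raise ValueError(f"No bucket compiled for shape {tuple(input_shape)}")

print(select_bucket((2, 3, 224, 224)))  # -> (2, 3, 224, 224)
```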
Prerequisites
Before getting started, please make sure you have installed the following packages:
- torch
- torchvision
- rebel-compiler
Basic Bucketing Example
Step 1: Prepare a Model
Let's start with a simple ResNet50 example that supports multiple batch sizes:
```python
import torch
import torchvision.models as models

import rebel

# Load a pre-trained ResNet50 model
model = models.resnet50(pretrained=True).eval()
```
Step 2: Define the Input Shapes (Buckets)
```python
# Define the input shapes we want to support
image_size = 224
supported_batch_sizes = [1, 2, 4, 8]  # Different batch sizes

input_infos = []

# Create input information for each batch size
for batch_size in supported_batch_sizes:
    input_info = [("input", [batch_size, 3, image_size, image_size], "float32")]
    input_infos.append(input_info)

print("Defined buckets:")
for i, info in enumerate(input_infos):
    print(f"  Bucket {i}: {info[0][1]}")
```
Step 3: Compile the Model with Bucketing
```python
# Compile the model with multiple input shapes
compiled_model = rebel.compile_from_torch(
    model,
    input_info=input_infos,  # Pass a list of input_info entries for bucketing
)

# Save the compiled model (optional)
compiled_model.save("resnet50_bucketed.rbln")
```
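If you saved the compiled model, a runtime can later be restored from the .rbln file instead of keeping the compiled object in memory. This is a sketch assuming rebel.Runtime also accepts a file path, as in the basic (non-bucketed) RBLN workflow:

```python
# Restore a runtime from the saved artifact
# (assumes rebel.Runtime accepts a file path).
runtime = rebel.Runtime("resnet50_bucketed.rbln", tensor_type="pt")
```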
Step 4: Create the Runtime and Run Inference
```python
# Create a single runtime that supports all buckets
runtime = rebel.Runtime(compiled_model, tensor_type="pt")

# Test with different batch sizes
test_batch_sizes = [1, 2, 4, 8]

for batch_size in test_batch_sizes:
    print(f"\nTesting with batch size: {batch_size}")

    # Create random input with the current batch size
    test_input = torch.randn(batch_size, 3, 224, 224)

    # Run inference - the runtime automatically selects the appropriate bucket
    output = runtime(test_input)

    print(f"  Input shape: {test_input.shape}")
    print(f"  Output shape: {output.shape}")
    print(f"  Predicted classes: {torch.argmax(output, dim=1)}")
```
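Because every bucket lives in the same runtime, it is also easy to compare latency across batch sizes. The timing loop below is illustrative only; actual numbers depend on your device:

```python
import time

# Rough per-bucket latency comparison (illustrative; numbers vary by device).
for batch_size in [1, 2, 4, 8]:
    x = torch.randn(batch_size, 3, 224, 224)
    runtime(x)  # warm-up call so one-time setup does not skew timing

    start = time.perf_counter()
    for _ in range(10):
        runtime(x)
    elapsed_ms = (time.perf_counter() - start) / 10 * 1000

    print(f"batch={batch_size}: {elapsed_ms:.2f} ms/iteration")
```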
Advanced Bucketing Examples
Variable Image Sizes
You can also create buckets for different image resolutions:
```python
import torch
import torchvision.models as models

import rebel

# Load model
model = models.efficientnet_b0(pretrained=True).eval()

# Define multiple image sizes and batch sizes
configurations = [
    (1, 3, 224, 224),  # Standard ImageNet size
    (1, 3, 256, 256),  # Slightly larger
    (1, 3, 288, 288),  # Even larger
    (2, 3, 224, 224),  # Batch of 2 with standard size
    (4, 3, 224, 224),  # Batch of 4 with standard size
]

input_infos = []
for batch, channels, height, width in configurations:
    input_info = [("input", [batch, channels, height, width], "float32")]
    input_infos.append(input_info)

# Compile with all configurations
compiled_model = rebel.compile_from_torch(
    model,
    input_info=input_infos,
)

runtime = rebel.Runtime(compiled_model, tensor_type="pt")

# Test with different input sizes
def test_inference(batch_size, height, width):
    # Create random input
    test_input = torch.randn(batch_size, 3, height, width)

    # Run inference
    output = runtime(test_input)
    print(f"Input: {test_input.shape} -> Output: {output.shape}")

# Test various configurations
test_inference(1, 224, 224)
test_inference(1, 256, 256)
test_inference(1, 288, 288)
test_inference(2, 224, 224)
test_inference(4, 224, 224)
```
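Note that only the compiled shapes are valid at runtime; an input whose shape matches none of the buckets is expected to be rejected. A defensive wrapper (hypothetical, not part of the RBLN API) makes this explicit:

```python
# Hypothetical guard: validate the shape before dispatching to the runtime.
SUPPORTED_SHAPES = {tuple(cfg) for cfg in configurations}

def safe_inference(runtime, test_input):
    shape = tuple(test_input.shape)
    if shape not in SUPPORTED_SHAPES:
        raise ValueError(f"Shape {shape} was not compiled as a bucket")
    return runtime(test_input)
```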
Bucket Selection Strategy
Choose your buckets to balance flexibility and performance:
```python
# Good: Powers of 2 for batch sizes
batch_sizes = [1, 2, 4, 8]

# Good: Common image sizes
image_sizes = [224, 256, 288, 320, 416, 640]

# Avoid: Too many buckets (increases compilation time and model size)
# batch_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```
Key considerations:
- Choose common input resolutions that match your expected use cases
- Start with fewer buckets and add more as needed
- Consider the trade-off between flexibility and compilation time (a padding helper for in-between batch sizes is sketched below)
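When the actual batch size falls between buckets, a common pattern is to pad the batch up to the nearest compiled bucket and discard the extra outputs. The helper below is a hypothetical sketch, not part of the RBLN API:

```python
import torch

# Hypothetical helper: pad a batch up to the nearest compiled bucket size.
# Batches larger than the biggest bucket would need to be split instead.
def pad_to_bucket(batch, bucket_sizes=(1, 2, 4, 8)):
    n = batch.shape[0]
    # Pick the smallest bucket that can hold the batch.
    target = min(b for b in bucket_sizes if b >= n)
    if target == n:
        return batch, n
    # Repeat the last sample to fill the remaining slots.
    pad = batch[-1:].expand(target - n, *batch.shape[1:])
    return torch.cat([batch, pad], dim=0), n

# Usage: run the padded batch, then keep only the first n outputs.
# padded, n = pad_to_bucket(images)
# outputs = runtime(padded)[:n]
```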
Complete Example: Image Classification with Dynamic Batching
Here's a complete example that demonstrates bucketing with image classification:
image_classification_bucketing.py:
```python
#!/usr/bin/env python3
"""
Complete Image Classification Example with Bucketing

This example demonstrates how to create a bucketed image classifier that
supports multiple batch sizes efficiently.
"""
import torch
import torchvision.models as models

import rebel


def create_bucketed_classifier():
    """Create a bucketed image classifier"""
    # Load pre-trained ResNet50
    model = models.resnet50(pretrained=True).eval()

    # Define buckets for different batch sizes
    batch_sizes = [1, 2, 4]
    input_infos = []
    for batch_size in batch_sizes:
        input_info = [("input", [batch_size, 3, 224, 224], "float32")]
        input_infos.append(input_info)

    compiled_model = rebel.compile_from_torch(model, input_info=input_infos)
    return compiled_model


# Example usage
def main():
    # Create bucketed model
    compiled_model = create_bucketed_classifier()
    runtime = rebel.Runtime(compiled_model, tensor_type="pt")

    # Example with different batch sizes
    batch_sizes = [1, 2, 4]
    for i, batch_size in enumerate(batch_sizes):
        print(f"\nTest case {i + 1}: Batch size {batch_size}")
        dummy_input = torch.randn(batch_size, 3, 224, 224)

        outputs = runtime(dummy_input)
        predictions = torch.argmax(outputs, dim=1)
        print(f"Predictions: {predictions.tolist()}")


if __name__ == "__main__":
    main()
```
Conclusion
Bucketing is a powerful feature that enables efficient handling of variable input shapes in the RBLN SDK. It provides:
- Flexibility: Support multiple predefined input shapes with a single compiled model
- Efficiency: Switch between shapes at runtime through one rebel.Runtime instance, with no model reloading
By following the examples and strategies in this tutorial, you can leverage bucketing to build more flexible and efficient inference pipelines with the RBLN SDK.