Accelerate AI with Over 300 Supported Models, Effortlessly.

Discover how to quickly deploy your AI models on Rebellions' NPU using RBLN SDK.
You can find detailed information on our compiler, runtime, model zoo, and serving frameworks.

Get Started with Frameworks

Hugging Face

RBLN SDK supports Hugging Face Transformers and Diffusers models through the Optimum RBLN library. Deploy the newest models from the Hugging Face Hub, such as Llama3-8B and SDXL.

💡 Run Hugging Face models on Rebellions hardware.

  • Compilation and inference with Hugging Face models optimized for Rebellions’ hardware.
  • Efficient, developer-friendly API using RBLN Runtime.
  • Multi-chip support for Llama and SDXL models.
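
For example, a minimal sketch of compiling and saving a Hugging Face model with Optimum RBLN (the RBLNLlamaForCausalLM class, the export flag, and the values shown follow Optimum conventions; check the Optimum RBLN documentation for exact signatures):

from optimum.rbln import RBLNLlamaForCausalLM

# Compile a Hugging Face checkpoint for the NPU and save the result.
model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # example model id
    export=True,                            # compile from the original checkpoint
    rbln_tensor_parallel_size=4,            # spread the model across 4 NPUs
)
model.save_pretrained("llama3-8b-rbln")     # store the compiled artifacts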

PyTorch

RBLN SDK supports PyTorch 2.0. Accelerate your PyTorch-trained NLP, speech, and vision models on Rebellions’ hardware.

💡 RBLN SDK integrates PyTorch models.

  • Compilation of PyTorch models optimized for Rebellions’ hardware.
  • Efficient, developer-friendly API using RBLN Runtime.
  • Run PyTorch 2.0 models without pretuning and build a powerful serving pipeline.
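
As an illustration, a minimal sketch of compiling a PyTorch model and running it through RBLN Runtime (the compile_from_torch and Runtime entry points and the input-spec format are assumptions; consult the RBLN Compiler and Runtime documentation for the exact API):

import torch
import torchvision.models as models
import rebel  # provided by the rebel-compiler package

# Load a pretrained torchvision model in inference mode.
model = models.resnet50(weights="IMAGENET1K_V1").eval()

# Compile for the NPU; compile_from_torch and the input-spec format are
# assumed names here, so consult the RBLN Compiler docs for the exact API.
compiled = rebel.compile_from_torch(model, [("x", [1, 3, 224, 224], "float32")])
compiled.save("resnet50.rbln")

# Run inference through RBLN Runtime.
runtime = rebel.Runtime("resnet50.rbln")
output = runtime.run(torch.randn(1, 3, 224, 224).numpy())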

TensorFlow

RBLN SDK supports TensorFlow. Optimize inference for models such as LLMs, ImageNet classifiers, and YOLO.

💡 RBLN SDK integrates TensorFlow models.

  • Inference with a multitude of pre-trained Keras Applications.
  • Efficient, developer-friendly API using RBLN Runtime.
  • Run TensorFlow without pretuning and build a powerful serving pipeline.
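
For example, a minimal sketch of compiling a pretrained Keras Application (the compile_from_tf_function entry point is an assumption; see the RBLN documentation for the exact TensorFlow compile API):

import tensorflow as tf
import rebel  # provided by the rebel-compiler package

# Start from any pretrained Keras Application.
model = tf.keras.applications.ResNet50(weights="imagenet")

# Trace the model into a concrete tf.function for compilation.
func = tf.function(lambda x: model(x))
concrete_func = func.get_concrete_function(tf.TensorSpec([1, 224, 224, 3], tf.float32))

# compile_from_tf_function is an assumed entry point; see the RBLN docs.
compiled = rebel.compile_from_tf_function(concrete_func)
compiled.save("resnet50_tf.rbln")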

Rebellions’ Software Stack

Rebellions Software Stack supports our hardware to deliver maximum performance.

Machine Learning Framework

Machine Learning (ML) frameworks are essential tools in the development and deployment of AI models, including NLP, Vision, Speech, and Generative models. Currently, the most popular frameworks are TensorFlow, PyTorch, and Hugging Face, each offering unique features and capabilities that cater to different aspects of machine learning development and deployment.

Compiler

The RBLN Compiler transforms models into executable instructions for ATOM™. It comprises two main components: the Frontend Compiler and the Backend Compiler. The Frontend Compiler abstracts deep learning models into Intermediate Representations (IRs), optimizing them before handing them off to the Backend Compiler. The Backend Compiler further optimizes these IRs and produces the Command Stream, the Program Binary for the hardware to execute the tasks, and serialized weights.


Compute Library

The Compute Library includes a comprehensive suite of highly optimized low-level operations, which are essential for model inference. These low-level operations form the programmable components of the arithmetic logic units within the Neural Engines. The Compute Library prepares the Program Binary at the Compiler’s command. The RBLN SDK supports low-level operations for both traditional Convolutional Neural Networks (CNNs) and state-of-the-art GenAI models. These include hundreds of operations, such as General Matrix Multiply (GEMM), normalization, and nonlinear activation functions. Thanks to the flexibility of the Neural Engines, the list of supported low-level operations continues to expand, enabling acceleration across a wide range of AI applications.


Runtime Module

The Runtime Module acts as the intermediary between the compiled model and the hardware, managing the actual execution of programs. It prepares executable instructions generated by the Compiler, manages data transfer between memory and the Neural Engines, and monitors performance to optimize the execution process.


Driver

The Driver, consisting of the Kernel-Mode Driver (KMD) and User-Mode Driver (UMD), provides efficient, safe, and flexible access to the hardware. The KMD allows the operating system to recognize the hardware and exposes APIs to the UMD. It also delivers the Command Stream from the Compiler stack to the device. The UMD, running in user space, intermediates between the application software and the hardware, managing their interactions.


Firmware

The Firmware is the lowest-level software component on ATOM™, serving as the final interface between software and hardware. It controls the tasks of the Command Processor, which orchestrates ATOM™’s operations. Located on the SoC, the Command Processor manages the Command Stream (the actual AI workloads) across multiple layers of the memory architecture and monitors the hardware’s health status.


RBLN Backend: Rebellions Hardware

Rebellions’ ATOM™ is an AI accelerator engineered specifically for AI inference tasks with formidable capacity, manufactured on Samsung’s advanced 5nm process. It delivers 32 Tera Floating Point Operations per Second (TFLOPS) for FP16 and 128 Trillion Operations Per Second (TOPS) for INT8, enhanced by eight Neural Engines and 64 MB of on-chip SRAM. With an intricate memory architecture engineered with unparalleled technical mastery, ATOM™ is designed for high performance and peak efficiency.


Frequently Asked Questions

Can’t find what you’re looking for? Contact us here!

Q. Which AI frameworks and libraries does RBLN SDK support?
A.
RBLN SDK supports models based on PyTorch and TensorFlow and is also compatible with the Hugging Face Transformers/Diffusers libraries.

We are continuously improving compatibility with major AI frameworks through regular updates.
Q. Can I compile PyTorch or TensorFlow models with RBLN SDK without code modifications?
A.

In most cases, you can use the RBLN SDK with minimal code changes.


  • For officially supported Model Zoo models, you can use the provided example code right away.
  • Other models can also be compiled by referring to the Model Zoo code.

Check the list of supported operations in advance on the Supported Ops page.

Q. How can I maximize the performance of transformer-based models?
A.

To maximize the performance of transformer-based models, consider the following:


  • Set the rbln_tensor_parallel_size value appropriately to utilize NPU parallelism.
  • Tune the input sequence length and batch size, as in the sketch below.
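
A brief sketch of these knobs with Optimum RBLN (rbln_tensor_parallel_size is the parameter named above; the rbln_max_seq_len and rbln_batch_size keyword names are assumptions to verify against the Optimum RBLN documentation):

from optimum.rbln import RBLNLlamaForCausalLM

model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    export=True,
    rbln_tensor_parallel_size=4,  # match the number of installed NPUs
    rbln_max_seq_len=8192,        # assumed name: tune to your longest expected prompt
    rbln_batch_size=1,            # assumed name: tune for throughput vs. latency
)
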
Q. Does RBLN Runtime API support C/C++?
A.

The RBLN SDK provides a C/C++-bound runtime for applications where Python runtime is unavailable or extremely low latency is required.
Please refer to the C/C++ guide for more information.

Q. How do I ensure version compatibility with AI frameworks?
A.

The RBLN SDK and Compiler are regularly updated to maintain API compatibility with the latest versions of major frameworks.
For details, please refer to the respective Release Notes.

Q. Is RBLN SDK compatible with PyTorch?
A.

RBLN SDK offers high compatibility with PyTorch-based models.

  • torch.compile() Support: Fully compatible with PyTorch 2.0’s torch.compile() feature, and supports models compiled using the TorchDynamo and TorchInductor backends.
  • Extensive Operator Support: The RBLN Compiler supports most PyTorch operators. You can check the full list in Supported Ops. It also includes major operators for Vision, NLP, and Audio, making it suitable for a wide range of deep learning models.
  • PyTorch Model Zoo Compatibility: Popular models such as ResNet, YOLO, LLaMA, and BERT are supported. See the PyTorch Model Zoo page for more details.
  • JIT/Scripted Model Support: Models converted using TorchScript can also be processed by the RBLN Compiler.

Q. How do I install RBLN Driver?
A.

The RBLN Driver can be installed using the provided deb or rpm installation files and requires root privileges. During installation, you must ensure that the kernel version is compatible with the driver.


In most cases, we provide an environment with the Driver pre-installed. If installation is required, please refer to the Installation Guide.

Q. How do I install RBLN SDK?
A.

The RBLN SDK can be easily installed in a Python environment as follows:


pip3 install --extra-index-url https://pypi.rbln.ai/simple rebel-compiler==<latest-version> optimum-rbln==<latest-version> vllm-rbln==<latest-version>

To check the latest package versions, refer to the Release Notes. Depending on your environment, additional Python package dependencies may be required.


Q. What is the required Python version and are there additional dependencies?
A.

Python 3.9 or higher is recommended, and there are key package dependencies such as numpy, torch, and onnx.


Please refer to the Support Matrix page for the supported OS and Python versions.
Required packages may vary by model, so refer to the requirements.txt file included in the Model Zoo code for details.

Q. Does RBLN SDK support Windows?
A.

Currently, RBLN SDK only supports Linux. Windows support will be determined based on our technical roadmap.


More details on the supported OS and Python version can be found on the Support Matrix page.

Q. Can I run inference on multiple devices?
A.

The RBLN SDK supports distributed inference based on tensor parallelism, called RSD (Rebellions Scalable Design).
Please first check the list of models that support multi-device execution, and refer to the provided example for compilation instructions.

Q. Can I measure and analyze model performance?
A.

You can analyze metrics such as latency, throughput, and memory usage using the Profiler included in the SDK.


With rbln-stat, you can also monitor power consumption and utilization.

Q. How do I determine the optimal batch size?
A.

The optimal batch size may vary depending on the type of NPU used, server configuration, and service requirements.
We recommend using the Profiler tool and conducting various experiments for fine-tuning.

Q. Are there profiling and optimization tools?
A.

RBLN SDK includes the RBLN Profiler for performance bottleneck analysis, collecting key metrics such as execution time, memory usage, and operation dependencies.

  • Trace files in .pb format can be visualized with Perfetto.
  • You can analyze bottlenecks, inter-operation dependencies, and layer-by-layer latency to identify optimization directions.

For detailed usage, refer to the Profiler Guide.
Q. How do I process video input files (.mp4)?
A.

To process video files, you can use libraries like OpenCV (cv2) to extract each frame from an .mp4 file as an image, and then feed those frames into the model for inference.


For example, when using an object detection model like YOLOX, the typical procedure is as follows:


1. Load the video file using cv2.VideoCapture
2. Extract frames one by one
3. Preprocess each frame to match the model’s input format
4. Perform object detection using the model
5. Visualize the results and either save them or display them in real time
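
A minimal sketch of this loop with OpenCV (the 640x640 input shape is a YOLOX-style example, and the runtime.run call is a placeholder for your compiled model’s inference call):

import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")            # 1. load the video file
while cap.isOpened():
    ok, frame = cap.read()                     # 2. extract frames one by one
    if not ok:
        break
    # 3. preprocess to the model's input format (640x640 NCHW, YOLOX-style)
    resized = cv2.resize(frame, (640, 640)).astype(np.float32)
    blob = np.expand_dims(resized.transpose(2, 0, 1), axis=0)
    # 4. run detection on the compiled model; replace with your runtime call
    # detections = runtime.run(blob)
    # 5. visualize and display (drawing boxes omitted for brevity)
    cv2.imshow("frames", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()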

Q. Which FP16 formats does RBLN SDK support?
A.
RBLN SDK supports BFloat16, IEEE 754 FP16, and custom FP16 formats. FP32 models can be automatically cast to FP16 during compilation with the RBLN Compiler.
Q. How are ATOM and REBEL different?
A.

Both are AI inference NPUs developed by Rebellions, but REBEL is a next-generation product designed with a chiplet-based architecture. A detailed comparison chart is available on the product page.

Q. Can I train models with RBLN SDK?
A.
The current RBLN SDK is designed for inference-only use. Plans for training support will be announced through the roadmap once they are finalized.
Q. Do you support Kubernetes?
A.

Yes. You can use Rebellions AI processor resources via the Kubernetes Plugin.

Q. What Kubernetes tools are available?
A.
  • Kubernetes Device Plugin: Supports RBLN NPUs in Kubernetes cluster environments.
  • NPU Feature Discovery: Labels Kubernetes nodes with RBLN NPUs for scheduling.
  • RBLN Metrics Exporter: Exposes NPU metrics (temperature, power, DRAM, utilization) in Prometheus format for Grafana dashboards.
Q. Which NPUs are officially supported by the RBLN SDK?
A.
As of May 30, 2025, the SDK supports ATOM™+ (RBLN-CA22) and ATOM™-Max (RBLN-CA25). Support for ATOM™ (RBLN-CA02) ends on June 30, 2025.
Q. Do you support V1 Engine?
A.

Yes. The vLLM V1 engine improves performance for generation and multimodal models. Enable it by setting:


export VLLM_USE_V1=1
Q. Which serving frameworks do you support?
A.

RBLN SDK is compatible with vLLM, NVIDIA Triton Inference Server, and TorchServe. Container-based deployment also supports integration with Kubernetes.

Q. How are NPUs and GPUs different?
A.
While both NPUs (Neural Processing Units) and GPUs (Graphics Processing Units) perform parallel computations, they differ in their optimized computation methods and intended use cases.

GPUs were originally designed for graphics rendering but have been widely adopted for AI training and high-performance computing (HPC) due to their large-scale parallel processing capabilities. They typically use FP32/FP16 operations and support various types of computation through CUDA cores and Tensor Cores.

NPUs are processors specialized for AI and deep learning, designed to perform efficient computations at low power. They are optimized for low-bit operations such as INT8 and FP16 and include dedicated hardware architectures that accelerate neural network computations.
Q. How can I fine-tune models or optimize inference?
A.

Rebellions devices are designed exclusively for inference, and fine-tuning is not currently supported.


To maximize inference performance, we recommend the following optimization strategies:


  • Use Mixed Precision and Quantization: Improve memory efficiency and compute speed by using FP16 or INT8 quantized models.
  • Adjust Batch Size: Find the optimal batch size based on model characteristics and input data to increase throughput.
  • Refactor Model Architecture: Simplify the computation graph through layer fusion and removal of redundant operations to boost performance.
  • Double Buffering: Utilize double buffering in AsyncRuntime to improve execution efficiency (see the sketch after this list).
  • Apply Continuous Batching for LLM Serving: For large language model (LLM) serving, maximize hardware utilization by applying continuous batching techniques using vllm-rbln.
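
For the double-buffering item above, a hedged sketch (AsyncRuntime is named in the list, but the constructor and the async_run/wait methods shown are assumptions; see the RBLN Runtime documentation for the real signatures):

import numpy as np
import rebel

runtime = rebel.AsyncRuntime("model.rbln")  # assumed constructor

batches = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(8)]

# Double buffering: keep two requests in flight so the host prepares the
# next input while the NPU executes the current one.
pending = [runtime.async_run(b) for b in batches[:2]]
results = []
for nxt in batches[2:]:
    results.append(pending.pop(0).wait())   # collect the oldest in-flight result
    pending.append(runtime.async_run(nxt))  # immediately queue the next batch
results.extend(f.wait() for f in pending)   # drain the remaining requests
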
Q. Is there a forum or a support channel?
A.

You can ask questions or discuss technical issues on the Rebellions Dev Forum, or reach out to us directly here.

Q. How often are the firmware and driver updated?
A.

The SDK is updated approximately every month, and the driver is updated every three months, although the schedule is subject to change.
For detailed information, please refer to the latest Release Notes.

Q. Model compilation fails.
A.

Currently, for officially supported models listed in the RBLN Model Zoo, you can use the provided compilation and inference example code.


If you’re using a modified model or a model not included in the Model Zoo, technical support may be limited, and compilation may fail.
First, check the error code to identify the cause. If further assistance is required, please reach out via the Rebellions Dev Forum.

Q. I get errors during language model compilation and inference.
A.

Please check the following items:



  • Memory Usage: If the system runs out of memory during compilation, the process may fail.
  • NPU Configuration: Ensure that the value of rbln_tensor_parallel_size is not greater than the actual number of devices installed in your system. You can verify the number of devices by running the rbln-stat command in your terminal.
  • Docker Environment: Refer to the Docker Guide for more details.
Q. CPU usage is too high during model inference.
A.

You can limit the number of CPU threads used during inference by setting the RBLN_NUM_THREADS environment variable. Specifying an appropriate number of threads can reduce CPU load and help stabilize performance.


Please refer to this document for more details.
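
For example, one way to set the variable from Python before initializing the runtime (the thread count is an illustrative value):

import os

# Cap the RBLN Runtime CPU thread count before creating any runtime
# objects; 4 is an example value, so tune it for your workload.
os.environ["RBLN_NUM_THREADS"] = "4"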

Q. I get errors after driver/compiler updates.
A.

Issues may arise due to version mismatches between the driver and compiler.


  • Refer to the Release Notes of the RBLN SDK to ensure that all components are installed with compatible versions.
  • After aligning all libraries to their compatible versions, try recompiling the model.