Effortlessly accelerate AI with over 200 supported models.

Discover how to quickly deploy your AI models on Rebellions' NPU using RBLN SDK.
You can find detailed information on our compiler, runtime, model zoo, and serving frameworks.

Get Started with Frameworks

HuggingFace
PyTorch
TensorFlow

RBLN SDK supports Hugging Face transformer and diffusion models through the Optimum RBLN library. Deploy the latest models from the Hugging Face Hub, such as Llama3-8B and SDXL.

💡 Run Hugging Face models on Rebellions hardware.

  • Compilation and inference with Hugging Face models optimized for Rebellions' hardware.
  • Efficient, developer-friendly API using RBLN Runtime.
  • Multi-chip support for models such as Llama and SDXL; see the sketch below.
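
Here is a minimal sketch of that flow. It assumes the RBLNLlamaForCausalLM class and the export flag from the Optimum RBLN documentation; class and argument names can change between versions, so check the docs for your release.

```python
# Sketch: compile a Hugging Face Llama checkpoint for the RBLN NPU with Optimum RBLN.
# Class and argument names follow the Optimum RBLN docs; verify them for your SDK version.
from transformers import AutoTokenizer
from optimum.rbln import RBLNLlamaForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# export=True compiles the Hugging Face checkpoint into an RBLN model.
model = RBLNLlamaForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained("llama3-8b-rbln")  # reusable compiled artifact

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, Rebellions!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```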

RBLN SDK supports PyTorch 2.0. Accelerate your PyTorch-trained NLP, speech, and vision models on Rebellions' hardware.

💡 RBLN SDK integrates PyTorch models.

  • Compilation of PyTorch models optimized for Rebellions' hardware.
  • Efficient, developer-friendly API using RBLN Runtime.
  • Run Torch 2.0 models without pre-tuning and build a powerful serving pipeline; see the sketch below.
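
Below is a minimal sketch of that flow. It assumes the rebel Python package exposes compile_from_torch, a save method on the compiled model, and a Runtime class, as shown in the RBLN SDK documentation; verify the exact signatures for your version.

```python
# Sketch: compile a TorchVision model with the RBLN Compiler and run it on the NPU.
import numpy as np
import torchvision.models as models
import rebel  # RBLN Compiler / Runtime package (API names assumed from the docs)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

# Compile: each input is described as (name, shape, dtype).
compiled = rebel.compile_from_torch(model, [("input", [1, 3, 224, 224], "float32")])
compiled.save("resnet50.rbln")

# Run inference through the RBLN Runtime.
runtime = rebel.Runtime("resnet50.rbln")
logits = runtime.run(np.random.rand(1, 3, 224, 224).astype(np.float32))
print(int(np.argmax(logits)))
```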

RBLN SDK supports TensorFlow. Optimize inference for models such as LLMs, ImageNet classifiers, and YOLO.

💡 RBLN SDK integrates TensorFlow models.

  • Inference with a multitude of pre-trained Keras Applications.
  • Efficient, developer-friendly API using RBLN Runtime.
  • Run TensorFlow models without pre-tuning and build a powerful serving pipeline; see the sketch below.
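
A minimal sketch of the TensorFlow flow, assuming a compile_from_tf_function entry point in the rebel package (as in the RBLN SDK documentation); the exact name and input-info format may differ by version.

```python
# Sketch: compile a pre-trained Keras Applications model for the RBLN NPU.
import numpy as np
import tensorflow as tf
import rebel  # API names assumed from the RBLN SDK docs

model = tf.keras.applications.EfficientNetB0(weights="imagenet")
func = tf.function(lambda x: model(x))

compiled = rebel.compile_from_tf_function(func, [("x", [1, 224, 224, 3], tf.float32)])
compiled.save("efficientnet_b0.rbln")

runtime = rebel.Runtime("efficientnet_b0.rbln")
preds = runtime.run(np.random.rand(1, 224, 224, 3).astype(np.float32))
print(preds.shape)
```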

Featured Resources

Rebellions develops AI accelerators optimized for efficient AI inference across advanced applications in fields such as finance and cloud computing. Explore our latest documentation, tutorials, and webinars.

Rebellions' Software Stack

The Rebellions software stack works with our hardware to deliver maximum performance.

Machine Learning Frameworks

Machine Learning (ML) frameworks are essential tools in the development and deployment of AI models, including NLP, Vision, Speech, and Generative models. Currently, the most popular frameworks are TensorFlow, PyTorch, and Hugging Face, each offering unique features and capabilities that cater to different aspects of machine learning development and deployment.

Compiler

The RBLN Compiler transforms models into executable instructions for ATOM™. It comprises two main components: the Frontend Compiler and the Backend Compiler. The Frontend Compiler abstracts deep learning models into Intermediate Representations (IRs) and optimizes them before handing them off to the Backend Compiler. The Backend Compiler further optimizes these IRs and produces the Command Stream, the Program Binary that the hardware executes, and the serialized weights.
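
As a rough illustration, the sketch below compiles a toy PyTorch module: a single compile call runs both the Frontend and Backend stages, and save() bundles the Command Stream, Program Binary, and serialized weights into one .rbln artifact. The API names are assumed from the RBLN SDK documentation.

```python
# Sketch: one compile call covers Frontend (model -> IR) and Backend
# (IR -> Command Stream + Program Binary + serialized weights).
import torch
import rebel  # API names assumed from the RBLN SDK docs

class TinyNet(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x.T)

compiled = rebel.compile_from_torch(TinyNet().eval(), [("x", [4, 4], "float32")])
compiled.save("tinynet.rbln")  # single deployable artifact
```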

Compute Library

The Compute Library includes a comprehensive suite of highly optimized low-level operations, which are essential for model inference. These low-level operations form the programmable components of the arithmetic logic units within the Neural Engines. The Compute Library prepares the Program Binary at the Compiler's command. The RBLN SDK supports low-level operations for both traditional Convolutional Neural Networks (CNNs) and state-of-the-art GenAI models, including hundreds of General Matrix Multiply (GEMM), normalization, and nonlinear activation functions. Thanks to the flexibility of the Neural Engines, the list of supported low-level operations continues to expand, enabling acceleration across a wide range of AI applications.

Runtime Module

The Runtime Module acts as the intermediary between the compiled model and the hardware, managing the actual execution of programs. It prepares executable instructions generated by the Compiler, manages data transfer between memory and the Neural Engines, and monitors performance to optimize the execution process.
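
For illustration, here is a sketch of loading and running the artifact compiled in the previous section; rebel.Runtime and run() are assumed from the RBLN SDK documentation.

```python
# Sketch: the Runtime Module loads a compiled .rbln artifact, moves data between
# host memory and the Neural Engines, and executes the program.
import numpy as np
import rebel  # API names assumed from the RBLN SDK docs

runtime = rebel.Runtime("tinynet.rbln")      # load the compiled program
x = np.random.rand(4, 4).astype(np.float32)  # host-side input
y = runtime.run(x)                           # runtime handles transfer and execution
print(y.shape)
```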

Driver

The Driver, consisting of the Kernel-Mode Driver (KMD) and User-Mode Driver (UMD), provides efficient, safe, and flexible access to the hardware. The KMD allows the operating system to recognize the hardware and exposes APIs to the UMD. It also delivers the Command Stream from the Compiler stack to the device. The UMD, running in user space, mediates between the application software and the hardware, managing their interactions.

Firmware

The Firmware is the lowest-level software component on ATOM™, serving as the final interface between software and hardware. It controls the tasks of the Command Processor, which orchestrates ATOM™'s operations. Located on the SoC, the Command Processor manages the Command Stream (the actual AI workloads) across multiple layers of the memory architecture and monitors the hardware's health status.

RBLN Backend: Rebellions Hardware

Rebellions' ATOM™ is an AI accelerator engineered specifically for AI inference tasks with formidable capacity, manufactured on Samsung's advanced 5nm process. It delivers 32 Tera Floating Point Operations per Second (TFLOPS) for FP16 and 128 Trillion Operations Per Second (TOPS) for INT8, enhanced by eight Neural Engines and 64 MB of on-chip SRAM. With an intricate memory architecture engineered with unparalleled technical mastery, ATOM™ is designed for high performance and peak efficiency.


Frequently Asked Questions

How do I get started with RBLN SDK?

To get started with RBLN SDK, download compatible versions of the RBLN Driver, RBLN Compiler, and Model Zoo.

1. Install RBLN Driver.
2. Install RBLN Compiler.
3. Check whether RBLN SDK supports your desired model and review its sample code in the Model Zoo.
4. Run the sample code to ensure there are no issues.

For a detailed installation guide, please refer to SDK Installation Guide.
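
As a quick post-install sanity check, you can confirm that the compiler package is importable. The sketch below assumes the package is distributed as rebel-compiler; see the SDK Installation Guide for the exact package name and verification steps.

```python
# Sketch: verify the RBLN Compiler Python package after installation.
# The distribution name "rebel-compiler" is an assumption; check the install guide.
from importlib.metadata import version

import rebel  # should import without errors once the SDK is installed

print("rebel-compiler version:", version("rebel-compiler"))
```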

Can I run Llama models on RBLN NPUs?

Yes. We support the widely used Llama3, along with other Llama-based models (Solar, EEVE, etc.). Because performance can vary with model size and version, please check the sample code for each model.

Where can I find code examples?

The Rebellions Model Zoo, part of our online RBLN documentation, provides code examples for the models you can optimize and test. You can find the code example for each model on the following pages, under the Task column:

- HuggingFace
- PyTorch
- TensorFlow

I keep getting bugs while compiling the model.

While we run tests covering most conceivable situations before release, bugs can still occur.
Send us a Debug Dump Binaries (DDB) file so that we can provide technical support. The DDB file is safely encrypted and includes all the compilation steps and error logs. Please refer to the Troubleshooting guide and contact our Technical Support Team or email us.

Do all Hugging Face models run on Rebellions' NPU?

While we aim to support most state-of-the-art (SOTA) models on Hugging Face, we can't guarantee that all of them will run on Rebellions' NPU out of the box.
Some models may require additional optimizations or improvements.
For any inquiries on models not listed on Optimum RBLN, please contact our Technical Support Team or email us.

Can I run RBLN SDK on C code base?

Yes. RBLN SDK provides a runtime interface with C/C++ bindings, which is useful in environments where the Python runtime is not supported or when applications need to achieve optimal execution time.

To use RBLN SDK in C/C++:

1. Update the APT repository.
2. Install the RBLN SDK package.

For more details, please refer to our documentation on C API.

I'm developing an AI service, and I want to run models on ATOM. What does the inference pipeline look like?

The pipeline for running model inference on ATOM generally looks like this:

1. Preparing the pretrained model to run
2. Compiling the model with RBLN Compiler and saving the model
3. Loading the model with RBLN Runtime and running inference
4. Returning the result (via an API, etc.)

Please note that the pipeline can differ depending on your service architecture. You can use RBLN SDK to run pre-trained deep learning models; a minimal end-to-end sketch follows.
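
The sketch below maps the four steps onto code, reusing the (assumed) rebel APIs from the framework sections above.

```python
# Sketch: the four pipeline steps, end to end.
import numpy as np
import torchvision.models as models
import rebel  # API names assumed from the RBLN SDK docs

# 1. Prepare the pretrained model.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# 2. Compile with the RBLN Compiler and save.
rebel.compile_from_torch(model, [("input", [1, 3, 224, 224], "float32")]).save("resnet18.rbln")

# 3. Load with the RBLN Runtime and run inference.
runtime = rebel.Runtime("resnet18.rbln")
logits = runtime.run(np.random.rand(1, 3, 224, 224).astype(np.float32))

# 4. Return the result (e.g., as an API response).
print({"top1_class": int(np.argmax(logits))})
```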

Do you support Nvidia Triton Inference Server or OpenAI API?

Yes. RBLN SDK supports the Triton Inference Server and an OpenAI-compatible API, so developers can focus on deploying models that meet the needs of their applications.
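
For example, if you serve a compiled model behind an OpenAI-compatible endpoint, any standard OpenAI client can query it. In the sketch below, the base_url, api_key, and model name are hypothetical placeholders for your own deployment.

```python
# Sketch: querying an OpenAI-compatible endpoint backed by RBLN NPUs.
from openai import OpenAI

# Hypothetical endpoint and credentials for a local deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llama3-8b-rbln",  # hypothetical served-model name
    messages=[{"role": "user", "content": "Summarize what an NPU does."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```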

Can I deploy models on multiple NPUs?

Yes. RBLN SDK supports multi-NPU configurations.
You can either run independent model instances in parallel, each on a single ATOM NPU, or divide the workload of a large model across multiple NPUs.
Currently, our Optimum RBLN supports this feature. For a list of supported models, please refer to Optimum RBLN.
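
As a sketch, Optimum RBLN exposes this at export time; the rbln_tensor_parallel_size parameter below is taken from the Optimum RBLN documentation, but verify the parameter name and the list of supported models for your version.

```python
# Sketch: shard a large model across multiple ATOM NPUs with Optimum RBLN.
from optimum.rbln import RBLNLlamaForCausalLM

model = RBLNLlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    export=True,
    rbln_tensor_parallel_size=4,  # divide the workload across 4 NPUs
)
model.save_pretrained("llama3-8b-rbln-tp4")
```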

Need help finding information?

- Get Started: Get started with our user-friendly RBLN SDK.
- SDK Docs: Discover best practices and explore our comprehensive APIs.
- Developer Support: Reach out to us with any inquiries.