Developers
Accelerate AI with over 200 supported models, effortlessly.
Discover how to quickly deploy your AI models on Rebellions' NPU using RBLN SDK.
You can find detailed information on our compiler, runtime, model zoo, and serving frameworks.
Get Started with Frameworks
RBLN SDK supports Hugging Face Transformers and Diffusers models through the Optimum RBLN library. Deploy the latest models, such as Llama3-8B and SDXL, directly from the Hugging Face Hub.
💡 Run Hugging Face models on Rebellions hardware.
- Compilation and inference with Hugging Face models optimized for Rebellions' hardware.
- Efficient, developer-friendly API using RBLN Runtime.
- Multi-chip support for Llama and SDXL models.
RBLN SDK supports PyTorch 2.0. Accelerate your PyTorch-trained NLP, speech, and vision models on Rebellions' hardware.
💡 RBLN SDK integrates PyTorch models.
- Compilation of PyTorch models optimized for Rebellions' hardware.
- Efficient, developer-friendly API using RBLN Runtime.
- Run PyTorch 2.0 models without pre-tuning and build a powerful serving pipeline.
RBLN SDK supports TensorFlow. Optimize inference for models such as LLMs, ImageNet classifiers, and YOLO.
💡 RBLN SDK integrates TensorFlow models.
- Inference with a multitude of pre-trained Keras Applications.
- Efficient, developer-friendly API using RBLN Runtime.
- Run TensorFlow models without pre-tuning and build a powerful serving pipeline.
Featured Resources
Rebellions specializes in developing AI accelerators optimized for efficient AI inference across advanced applications in fields such as finance and cloud computing. Explore our latest documentation, tutorials, and webinars.
Rebellions' Software Stack
Rebellions Software Stack supports our hardware to deliver maximum performance.
Machine Learning Framework
Machine Learning (ML) frameworks are essential tools in the development and deployment of AI models, including NLP, Vision, Speech, and Generative models. Currently, the most popular frameworks are TensorFlow, PyTorch, and Hugging Face, each offering unique features and capabilities that cater to different aspects of machine learning development and deployment.
Compiler
The RBLN Compiler transforms models into executable instructions for ATOM™. It comprises two main components: the Frontend Compiler and the Backend Compiler. The Frontend Compiler abstracts deep learning models into Intermediate Representations (IRs), optimizing them before handing them off to the Backend Compiler. The Backend Compiler further optimizes these IRs and produces the Command Stream, the Program Binary for the hardware to execute the tasks, and serialized weights.
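The two-stage flow above can be sketched with a toy, stdlib-only mock. The names here (`FrontendCompiler`, `BackendCompiler`, `IRNode`) are hypothetical stand-ins for the compiler's internal stages, not the actual RBLN SDK API; the "optimization" shown is a simple operator fusion for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class IRNode:
    op: str                       # e.g. "conv2d", "gemm", "relu"
    inputs: list = field(default_factory=list)

class FrontendCompiler:
    """Abstracts a model graph into an Intermediate Representation (IR)."""
    def lower(self, model_graph):
        # Toy optimization: fuse a conv2d followed by relu into one IR node.
        ir, skip = [], False
        for i, op in enumerate(model_graph):
            if skip:
                skip = False
                continue
            if op == "conv2d" and i + 1 < len(model_graph) and model_graph[i + 1] == "relu":
                ir.append(IRNode("conv2d_relu"))
                skip = True
            else:
                ir.append(IRNode(op))
        return ir

class BackendCompiler:
    """Turns optimized IR into a command stream for the hardware."""
    def emit(self, ir):
        return [f"EXEC {node.op.upper()}" for node in ir]

graph = ["conv2d", "relu", "gemm", "softmax"]
ir = FrontendCompiler().lower(graph)
commands = BackendCompiler().emit(ir)
print(commands)  # the fused conv2d+relu becomes a single command
```

Separating lowering (frontend) from code emission (backend) in this way is what lets each stage optimize at its own level of abstraction.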
Compute Library
The Compute Library includes a comprehensive suite of highly optimized low-level operations, which are essential for model inference. These low-level operations form the programmable components of the arithmetic logic units within the Neural Engines. The Compute Library prepares the Program Binary at the Compilerโs command. The RBLN SDK supports low-level operations for both traditional Convolutional Neural Networks (CNNs) and state-of-the-art GenAI models. This includes hundreds of General Matrix Multiply (GEMM), normalization, and nonlinear activation functions. Thanks to the flexibility of the Neural Engines, the list of supported low-level operations continues to expand, enabling acceleration across a wide range of AI applications.
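To make the kind of low-level operation concrete, here is a reference General Matrix Multiply (GEMM), computing C = alpha·A·B + beta·C. This pure-Python version only illustrates the semantics; on ATOM™ such operations run on the Neural Engines via the Compute Library's optimized kernels.

```python
def gemm(A, B, C, alpha=1.0, beta=0.0):
    """Reference GEMM: returns alpha * (A @ B) + beta * C for nested-list matrices."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    assert len(B) == k and len(C) == m and len(C[0]) == n, "shape mismatch"
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(A[i][p] * B[p][j] for p in range(k))  # dot product of row i, col j
            out[i][j] = alpha * acc + beta * C[i][j]
    return out

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
print(gemm(A, B, C))  # [[19.0, 22.0], [43.0, 50.0]]
```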
Runtime Module
The Runtime Module acts as the intermediary between the compiled model and the hardware, managing the actual execution of programs. It prepares executable instructions generated by the Compiler, manages data transfer between memory and the Neural Engines, and monitors performance to optimize the execution process.
Driver
The Driver, consisting of the Kernel-Mode Driver (KMD) and User-Mode Driver (UMD), provides efficient, safe, and flexible access to the hardware. The KMD allows the operating system to recognize the hardware and exposes APIs to the UMD. It also delivers the Command Stream from the Compiler stack to the device. The UMD, running in user space, mediates between the application software and the hardware, managing their interactions.
Firmware
The Firmware is the lowest-level software component on ATOM™, serving as the final interface between software and hardware. It controls the tasks of the Command Processor, which orchestrates ATOM™'s operations. Located on the SoC, the Command Processor manages the Command Stream (the actual AI workloads) across multiple layers of the memory architecture and monitors the hardware's health status.
Rebellions Hardware
Rebellions' ATOM™ is an AI accelerator engineered specifically for AI inference, manufactured on Samsung's advanced 5nm process. It delivers 32 Tera Floating Point Operations per Second (TFLOPS) for FP16 and 128 Trillion Operations per Second (TOPS) for INT8, powered by eight Neural Engines and 64 MB of on-chip SRAM. With its intricate memory architecture, ATOM™ is designed for high performance and peak efficiency.
Frequently Asked Questions
To get started with RBLN SDK, install the RBLN Driver, Compiler, and Model Zoo in compatible versions.
1. Install RBLN Driver.
2. Install RBLN Compiler.
3. Check whether RBLN SDK supports your desired model and review its code in the Model Zoo.
4. Run the sample code to ensure there are no issues.
For a detailed installation guide, please refer to SDK Installation Guide.
We support the widely used Llama3, along with other Llama-based models (Solar, EEVE, etc.). Since performance can vary with the size and version of a model, please check the sample code for each model.
Rebellions Model Zoo provides code examples for different models that you can optimize and test in our RBLN online documentation. You can find the code examples for each model on the following pages under the Task column:
- HuggingFace
- PyTorch
- TensorFlow
While we run tests covering most conceivable situations before release, bugs still happen.
Send us a Debug Dump Binaries (DDB) file so that we can provide technical support. The DDB file is safely encrypted and includes all compilation steps and error logs. Please refer to Trouble and contact our Technical Support Team or email us.
While we aim to support most SOTA models on Hugging Face, we can't guarantee that all of them will run on Rebellions NPUs out of the box. Some models may require optimizations or improvements. For any inquiries on models not listed on Optimum RBLN, please contact our Technical Support Team or email us.
Yes. RBLN SDK provides a runtime interface with C/C++ bindings, useful in environments where the Python runtime is not supported or when applications need to achieve optimal execution time.
To use RBLN SDK in C/C++:
1. Update the APT repository.
2. Install the RBLN SDK package.
For more details, please refer to our documentation on C API.
The pipeline for running model inference on ATOM™ generally looks like this:
1. Preparing the pretrained model to run
2. Compiling the model with RBLN Compiler and saving the model
3. Loading the model with RBLN Runtime and running inference
4. Returning the result (e.g., through an API)
Please note that the pipeline can differ depending on each service architecture. You can use RBLN SDK to run pre-trained deep learning models.
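The four steps above can be sketched with a minimal, stdlib-only mock. `CompiledModel`, `compile_model`, and `Runtime` here are illustrative stand-ins for the RBLN Compiler and RBLN Runtime, not the real SDK API, and the "model" is a toy linear function.

```python
import os
import pickle
import tempfile

class CompiledModel:
    """Stand-in for a compiled artifact (program binary + serialized weights)."""
    def __init__(self, weights):
        self.weights = weights

def compile_model(pretrained):          # step 2: compile the pretrained model
    return CompiledModel(weights=pretrained)

class Runtime:
    """Stand-in for the runtime that loads a compiled model and runs inference."""
    def __init__(self, compiled):       # step 3: load the compiled model
        self.compiled = compiled
    def run(self, x):                   # step 3: run inference (toy y = w*x + b)
        w, b = self.compiled.weights
        return w * x + b

pretrained = (2.0, 1.0)                 # step 1: "pretrained" weights (w, b)
compiled = compile_model(pretrained)

path = os.path.join(tempfile.mkdtemp(), "model.rbln")
with open(path, "wb") as f:             # step 2: save the compiled model to disk
    pickle.dump(compiled, f)
with open(path, "rb") as f:             # step 3: load it back with the runtime
    runtime = Runtime(pickle.load(f))

print(runtime.run(3.0))                 # step 4: result -> 7.0
```

The key design point the steps encode: compilation is done once, offline, and the saved artifact is what the runtime loads for every subsequent inference.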
Yes. RBLN SDK supports Triton Inference Server and the OpenAI API, so developers can focus on deploying models that meet the needs of their applications.
Yes. RBLN SDK supports configuration of multiple NPUs.
You can either run a model on each ATOM NPU in parallel or divide the workload of a large model across multiple NPUs.
Currently, our Optimum RBLN supports this feature. For a list of supported models, please refer to Optimum RBLN.
Need help finding information?