Power AI Inference. Efficiently. At Scale.

Rebellions combines advanced chiplet architecture with ultra-high-bandwidth HBM3E, delivering unmatched energy efficiency, scalability, and deployability across real-world inference workloads.
Rebellions is the only company with both chiplet architecture and HBM3E in production.


Efficient.

Scalable.

Deployment-Ready.

REBEL-Quad

Peta-Scale MoE Inference.
Without the Energy Tax.

Performance

One Engine.
Mixed Precision.

Energy Efficiency

Smarter Prefetch.
Faster Execution.

Scalability

Modular Architecture.
Monolithic Efficiency.

Synchronization

Always On.
Always Through.

REBEL-Quad vs. H200
(REBEL-Quad figures normalized to H200 = 1.0)

Throughput (TPS):            1.2x
Efficiency (TPS/Watt):       2.4x
Power Consumption (Watt):    0.5x (~50% lower power consumption)
Benchmark condition: performance measured on Llama 3.3 70B (TP2, FP8) with runtime input/output lengths of 2048/2048.
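The three relative figures are internally consistent: efficiency is throughput per watt, so the efficiency ratio follows directly from the other two. A quick arithmetic check:

```python
# Relative figures from the REBEL-Quad vs. H200 comparison (H200 = 1.0).
throughput_ratio = 1.2  # TPS, REBEL-Quad relative to H200
power_ratio = 0.5       # Watt (~50% lower power consumption)

# Efficiency (TPS/Watt) is throughput divided by power, so the ratios compose:
efficiency_ratio = throughput_ratio / power_ratio
print(efficiency_ratio)  # 2.4, matching the chart
```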

MoE in Action
Open-Source Frameworks.
Real Deployment.

Architecture for any model. Silicon for any scale.

Built on PyTorch and vLLM. Runs in real time with dynamic expert routing.

Hardware-software co-optimized for disaggregated inference.

Develop with familiar tools. Scale with system-level efficiency.
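As a rough sketch of the deployment style described here, a model like the one in the benchmark condition can be served with vLLM's standard OpenAI-compatible server. The model name and flags below mirror the TP2/FP8 benchmark setting and are illustrative assumptions, not Rebellions-specific commands; any hardware-specific plugin or device selection is omitted:

```shell
# Launch an OpenAI-compatible vLLM server (standard vLLM CLI;
# flags mirror the TP2/FP8 benchmark condition, illustrative only).
vllm serve meta-llama/Llama-3.3-70B-Instruct \
    --tensor-parallel-size 2 \
    --quantization fp8 \
    --max-model-len 4096

# Query it with any OpenAI-compatible client:
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.3-70B-Instruct",
         "prompt": "Hello", "max_tokens": 16}'
```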

Rebellions Chiplet Ecosystem

Beyond the Die.
Seamless Dataflow and Compute Scalability across Chiplets.

Rebellions Chiplet Design Strategy

Chiplets interconnected over UCIe (Universal Chiplet Interconnect Express).

Rebellions SDK
Deploy with Confidence from Day One.

Purpose-built for PyTorch.
Tuned for production.

High-QPS vLLM serving.
Ready out of the box.

Full Triton access with dev tools
you’ll actually use.

One-click deployment.
Zero guesswork.