Rebellions SDK
Built for developers who scale AI
High Utilization, Low Power, Proven at Rack Scale.
Built to serve frontier LLMs with high utilization and low power
Powered by unified mixed precision cores, predictive DMA, and UCIe interconnect
Rack-scale performance. Modular flexibility. Ready for deployment
REBEL-Quad executes FP8 and FP16 in a single, mixed-precision pipeline—no need for separate blocks or recompiled kernels. This delivers 2.8x higher compute density and 16% higher utilization vs. ATOM™.
REBEL-Quad uses a predictive, software-controlled DMA engine tightly coupled with an on-chip mesh to prefetch KV data proactively. This enables 2.7 TB/s effective bandwidth and reduces token-level latency in 32K+ context LLMs.
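To put the bandwidth figure in context, here is a rough back-of-the-envelope sketch of how long it takes to stream a full KV cache once per generated token. Only the 2.7 TB/s figure and the 32K context length come from the text above; the model shape (layers, KV heads, head dimension) and the function names are hypothetical, chosen purely for illustration, and the sketch ignores prefetch overlap and any other traffic.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem):
    """Size of the KV cache for one sequence, in bytes."""
    # Keys and values are both cached, hence the leading factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


def kv_read_time_us(cache_bytes, bandwidth_bytes_per_s=2.7e12):
    """Time to read the whole cache once at the given effective bandwidth, in microseconds."""
    return cache_bytes / bandwidth_bytes_per_s * 1e6


# Hypothetical model shape (an assumption, not a REBEL-Quad specification).
SHAPE = dict(num_layers=32, num_kv_heads=8, head_dim=128, context_len=32 * 1024)

fp16_cache = kv_cache_bytes(**SHAPE, bytes_per_elem=2)  # FP16 KV entries
fp8_cache = kv_cache_bytes(**SHAPE, bytes_per_elem=1)   # FP8 KV entries

if __name__ == "__main__":
    for name, size in (("FP16", fp16_cache), ("FP8", fp8_cache)):
        print(f"{name}: {size / 2**30:.1f} GiB, {kv_read_time_us(size):.0f} us per full read")
```

Under these assumptions the cache read dominates per-token time at long context, which is why prefetching KV data ahead of the compute that consumes it can matter for token-level latency.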
REBEL-Quad extends a full-chip mesh over UCIe-Advanced interconnects, offering 1 TB/s of bidirectional bandwidth per channel with just 11 ns latency. Chiplets operate as one virtual die, with no software changes and no I/O bottlenecks.
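As a rough illustration of what these two numbers imply for a single chiplet-to-chiplet transfer, the sketch below uses the simple latency-plus-size-over-bandwidth model. The 1 TB/s and 11 ns figures are taken from the text above; treating 1 TB/s as the rate available to one transfer, the 64 KiB tile size, and the function name are assumptions made for illustration only.

```python
def transfer_time_ns(num_bytes, bandwidth_bytes_per_s=1e12, latency_ns=11.0):
    """Estimated time for one transfer: fixed link latency plus serialization time."""
    serialization_ns = num_bytes / bandwidth_bytes_per_s * 1e9
    return latency_ns + serialization_ns


if __name__ == "__main__":
    tile = 64 * 1024  # hypothetical 64 KiB tile moved between chiplets
    print(f"{tile} bytes: {transfer_time_ns(tile):.1f} ns")
```

In this model, small transfers are dominated by the fixed latency term and large ones by bandwidth, so a low link latency matters most for fine-grained traffic between chiplets.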
REBEL-Quad implements hardware-accelerated, full-mesh synchronization across 256 routers. This avoids stalls in sparse or imbalanced workloads, sustaining high utilization across all chiplets and model phases.