ExSLerate is a system-focused NPU IP — validated on FPGA and available for licensing today. Built around one thesis: for inference at the edge, the memory wall is the only wall that matters. Proprietary, patented hardware-software co-design cuts DRAM traffic by up to 50%, lossless at 8-bit precision. Supports inference of computer-vision, language, and speech models. Krsna is the planned prototype AI SoC built around the IP.
ExSLerate runs today on AMD Xilinx ZCU106 and Kria KR260 SOM. The full software stack — IREE compiler, runtime, and host driver — executes on the on-board ARM Cortex application processor with the NPU IP in programmable logic. The system runs end-to-end inference of the supported model families.
ExSLerate did not arrive in a vacuum. Four institutional milestones — across India's flagship semiconductor programs and one of the chip industry's defining names — mark the path that brought the IP to where it is today.
ExSLerate V1 ranked #1 of 30 finalists in MeitY's India Microprocessor Challenge. Foundational silicon recognition that seeded the IP family.
Aegis Graham Bell Award for the chip program. Selected into MeitY C2S, 1 of 13 companies in India's flagship semiconductor program.
Selected into Qualcomm QSMP as 1 of 2 cohort companies. Industry-partner validation from the chip leader.
Co-development partnership with Brandworks Technologies announced. First wave of co-developed AI hardware planned for 2026.
ExSLerate NPU IP ships in four configurations, built from modular, scalable engines for the compute tiles, the scheduler, and the data pipeline, together with the compiler software toolchain. The configurations differ in MAC count and on-die memory budget, and are sized for different thermal and product envelopes. License the configuration that fits your design.
Real-time conversational AI for robotics and heavy edge applications. STT, TTT, and TTS in one inference pipeline. Sized for service robots, automotive HMIs, and industrial control surfaces where latency is the contract.
Light edge AI for drones and platforms where every gram and milliwatt counts. Object detection, classification, and on-board SLMs in the same envelope. The variant that goes where a fan cannot.
Tuned for the audio-and-display class of consumer devices. Smartwatches with on-device NLU, smart speakers, and any product where the model is a feature shipping in the BOM, not a fallback to the cloud.
The lowest-power inference target in the family. Built for wearables and hearables where the model never sleeps because the battery cannot afford the wake-up cost. Always-on is the feature.
Configurable bus widths per IP configuration. Drops into a standard AXI fabric.
A reference integration view of ExSLerate inside a customer SoC, with the SandLogic software stack riding on top. From production-deployed foundation models, through the IREE open compiler and runtime, down to the silicon blocks and the AXI fabric that ties them together.
Three layers, one stack. Foundation models on top — SandLogic's own production-deployed Shakti, Sruthi and Svara alongside the open model ecosystem. The IREE compiler and runtime in the middle — open and MLIR-based, with the ExSLerate compiler extensions plugged in. And the silicon below: ExSLerate as the NPU accelerator inside a reference SoC, with tightly-coupled SRAM, standard AXI4 to the rest of the system, and DRAM off-chip.
ExSLerate ships as a complete IP package: the RTL you integrate, the software stack that drives it, the FPGA bitstream you can stand it up on, and the verification environment we sign it off against ourselves.
The IP ships as IEEE 1735 encrypted Verilog, ready for standard simulator and synthesis flows — Cadence Xcelium, Synopsys VCS, AMD Vivado. What you get is the configuration you license: the M4096 RTL is a different deliverable from the M64 RTL, sized accordingly.
Everything the IP needs to actually run a model on your SoC. The IREE compiler with our extensions, the runtime, the HAL drivers, and frontends for PyTorch, JAX, and TensorFlow. We use it ourselves in the FPGA flow — so what we ship is what we run.
A working bitstream for the validated ZCU106 and Kria KR260 targets, so you can stand the IP up against your own models on day one rather than spending a quarter on integration before you see any inference.
UVM testbench, the regression suite we use internally, and the scripts that wire it together. It is the same environment the IP signs off against on our end — not a stripped-down version we hand over.
ExSLerate NPU IP targets the two factors that determine how well a modern model runs: how much of the model fits in the available memory, and how cleanly the math for the expensive activations executes. Both are addressed through proprietary, patented hardware-software co-design.
Up to 50% less DRAM traffic
The dominant cost in edge LLM inference is moving tensors across the memory bus. ExSLerate cuts that traffic by up to 50% at peak context, lossless at 8-bit precision, through proprietary patented co-design, so the model that runs on chip is the same model the compiler toolchain produced. The payoff is longer context, lower power, or both.
* Comparison is against an 8-bit baseline without proprietary algorithms.
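To make the traffic argument concrete, here is a minimal back-of-envelope sketch in Python. It is not SandLogic's model or measured data. It assumes that decoding one token reads every weight and every cached key/value entry from DRAM once, and the function name, the parameter count, the per-token KV size, and the context length are all illustrative assumptions. The `reduction` argument simply applies the "up to 50%" upper bound quoted above.

```python
def decode_traffic_bytes(param_count, context_tokens, kv_bytes_per_token,
                         bytes_per_weight=1, reduction=0.0):
    """Rough estimate of bytes read from DRAM to decode one token.

    Assumption: every weight and every cached K/V entry is read exactly once
    per decoded token. `reduction` is the fraction of traffic removed.
    """
    weights = param_count * bytes_per_weight
    kv_cache = context_tokens * kv_bytes_per_token
    return (weights + kv_cache) * (1.0 - reduction)


# Illustrative inputs (assumptions, not vendor data): 8e9 parameters at
# 8 bits each, 64 KiB of 8-bit K/V per token, and a 40,000-token context.
baseline = decode_traffic_bytes(8e9, 40_000, 64 * 1024)
reduced = decode_traffic_bytes(8e9, 40_000, 64 * 1024, reduction=0.5)

print(f"8-bit baseline:     {baseline / 1e9:.2f} GB per token")
print(f"with 50% reduction: {reduced / 1e9:.2f} GB per token")
```

Because decode is typically bandwidth-bound, halving the bytes per token in this simple model either halves the bus time per token or frees the same bandwidth for a longer context.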
Inline non-linear activation
The non-linear functions — the GeLUs, SiLUs, and Softmaxes — execute inline on the ExSLerate datapath. No area cost on die for special-function units. No increase in latency for offload round-trips to a host CPU. FP8 compute units; output precision tracks BF16 on silicon within rounding, across speech, vision, and language workloads.
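For reference, the three functions named above have the following standard mathematical definitions. The Python sketch below uses only the standard library and is a plain floating-point reference, not the ExSLerate datapath implementation, which is proprietary and not described here.

```python
import math


def gelu(x):
    """Exact GeLU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x):
    """SiLU (swish): x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def softmax(xs):
    """Softmax over a list, shifted by the max for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(gelu(1.0), silu(1.0), softmax([1.0, 2.0, 3.0]))
```

A reference like this is what an accelerator's output would be compared against when checking that reduced-precision results track a higher-precision baseline.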
ExSLerate covers the four families of model that show up in real products today. Language, speech, vision, and state-space models are supported with the same compiler and runtime stack all the way down to the silicon.
Performance data is shared separately. Throughput, latency, power, and per-configuration benchmarks are released under NDA on an engagement basis. For the full performance dossier and FPGA prototype numbers, write to sales@sandlogic.com.
LLM serving fails on edge hardware, primarily because weights and KV-cache do not fit in the system-provided memory. System-focused development of ExSLerate turns that math around: less data crosses the memory bus, more model content fits in the RAM, and compute operates on full tensors.
Same compute, half the bus. Standard NPU on the left, ExSLerate IP on the right — drawn the same way so the differentiation is the IP as a whole. Up to 50% less data crosses the DRAM bus, lossless at 8-bit precision.
* Comparison is against an 8-bit baseline without proprietary algorithms.
Up to 50% less data crosses the bus at peak context, lossless at 8-bit precision.
Llama 3 8B with RAM + SSD swap, where the baseline runs out of memory immediately. More model in the same memory budget.
| Model | Standard RAM (baseline) | ExSLerate IP (in RAM) | SSD extension | Total max context |
|---|---|---|---|---|
| Llama 3 · 8B | 0 (OOM) | 40k tokens | +88k tokens | 128k tokens |
| Shakti · 2.5B | 45.4k tokens | 92k tokens | +36k tokens | 128k tokens |
| Shakti · 500M | 32k tokens | 32k tokens | Fits in RAM | 32k tokens |
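The context figures above can be sanity-checked with a little arithmetic. The sketch below assumes the publicly documented Llama 3 8B shape (32 layers, 8 key/value heads, head dimension 128) and the usual KV-cache sizing formula. It does not model ExSLerate's proprietary encoding, and the helper name is hypothetical. It also checks that the RAM and SSD columns add up to the stated total for the two rows that use the SSD extension.

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    """KV-cache bytes stored per token: one K and one V vector per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


# Llama 3 8B public configuration: 32 layers, 8 KV heads (GQA), head dim 128.
per_token_8bit = kv_bytes_per_token(32, 8, 128, bytes_per_value=1)
per_token_bf16 = kv_bytes_per_token(32, 8, 128, bytes_per_value=2)
print(f"KV per token: {per_token_8bit} B at 8-bit, {per_token_bf16} B at BF16")

# Uncompressed 8-bit KV footprint for the 40k in-RAM tokens listed in the table.
footprint_gib = per_token_8bit * 40_000 / 2**30
print(f"40k tokens of 8-bit KV cache: about {footprint_gib:.2f} GiB")

# Table rows in thousands of tokens: (in RAM, SSD extension, total max context).
# The Shakti 500M row is omitted because it fits entirely in RAM.
context_rows_k = {
    "Llama 3 · 8B": (40, 88, 128),
    "Shakti · 2.5B": (92, 36, 128),
}
for model, (in_ram, ssd, total) in context_rows_k.items():
    status = "consistent" if in_ram + ssd == total else "mismatch"
    print(f"{model}: {in_ram}k + {ssd}k = {in_ram + ssd}k ({status})")
```

The per-token figure is why context length competes with the weights for RAM: every additional thousand tokens costs tens of megabytes even at 8 bits per value.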
BF16 baseline on NVIDIA A100 versus FP8 (E4M3) on ExSLerate IP. The table below lists the benchmark scores for both configurations side by side, so the accuracy delta can be read directly.
| Config | MMLU | SST-2 | GSM8K | COT | PIQA | HELLA | WINO | BoolQ | Lamb | ARC-C |
|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3.1 8B A100 · BF16 baseline | 65.68% | 94.00% | 55.00% | 82.00% | 55.00% | 79.00% | 78.00% | 69.00% | 52.00% | 69.88% |
| Llama 3.1 8B Krsna · ExSLerate V2 · FP8 (E4M3) | 62.91% | 93.00% | 44.00% | 84.00% | 53.00% | 78.00% | 80.00% | 65.00% | 51.00% | 67.87% |
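The per-benchmark gap between the two rows can be computed directly from the published scores. The short Python sketch below copies the numbers from the table as-is and reports FP8 minus BF16 in percentage points. The variable names are illustrative.

```python
BENCHMARKS = ["MMLU", "SST-2", "GSM8K", "COT", "PIQA",
              "HELLA", "WINO", "BoolQ", "Lamb", "ARC-C"]
# Scores in percent, copied from the table above.
BF16_A100 = [65.68, 94.00, 55.00, 82.00, 55.00, 79.00, 78.00, 69.00, 52.00, 69.88]
FP8_EXSLERATE = [62.91, 93.00, 44.00, 84.00, 53.00, 78.00, 80.00, 65.00, 51.00, 67.87]

# Delta in percentage points: negative means FP8 scored below the BF16 baseline.
deltas = {
    name: round(fp8 - bf16, 2)
    for name, bf16, fp8 in zip(BENCHMARKS, BF16_A100, FP8_EXSLERATE)
}

for name, delta in deltas.items():
    print(f"{name:6s} {delta:+.2f} pp")

mean_delta = round(sum(deltas.values()) / len(deltas), 3)
largest_drop = min(deltas, key=deltas.get)
print(f"mean delta: {mean_delta:+.3f} pp, largest drop: {largest_drop}")
```

Read this way, FP8 lands within a few points of the baseline on most benchmarks and ahead on two of them, with GSM8K showing the largest drop.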
Hardware is half the product. The ExSLerate SDK is built on IREE, the open MLIR-based compiler and runtime. Standard dialects in, .vmfb out. No proprietary frontend, no vendor lock-in, no rewrite of your model.
Models enter the toolchain through standard MLIR dialects, Linalg and TOSA. Anything that targets IREE today will target ExSLerate tomorrow. Your existing toolchain stays put.
PyTorch, TensorFlow, and JAX are first-class. The frontend you ship in is the frontend you stay in. No re-export, no rewrite, no parallel model branch.
IREE decouples the model graph from the hardware executable. Update one without rebuilding the other. The HAL handles scheduling and runtime; the FlatBuffer carries the deployable.
Three custom passes ride on top of stock IREE: graph optimization tuned to the IP, proprietary encoding passes injected at the right edges, and quantization for INT4 and FP8 native paths.
Native datapath formats
Four market segments and use cases, four product envelopes. License the configuration that fits your design.
ExSLerate is FPGA-validated and shipping as licensable IP today. The next phase is silicon. Krsna is the prototype SoC we are building around the IP — the reference integration that demonstrates the full stack on a single die.
IP available
FPGA prototype validated. Available for licensing across the four configurations. Customer engagements active.
Krsna design
Krsna prototype SoC under design. Demonstrates the full ExSLerate stack on a single die, end to end.
First silicon
Krsna goes to silicon. First samples back, brought up against the full ExSLerate compiler and runtime stack.
Customer SoCs
Customer-defined SoCs built around licensed ExSLerate IP, in parallel. The IP is the product; Krsna is the proof.
ExSLerate Gen 1 is the IP available today, targeted at endpoint and robotics-class designs. Future generations push the same IP family into SOHO server and data-center envelopes — advanced architectural enhancements for inter-die and intra-die compute clusters, bigger memory and bandwidth.
Endpoint & robotics
Run 8B-class models on edge devices. Up to 50% DRAM traffic reduction at 8-bit precision. Four IP configurations from M64 to M4096.
SOHO server
Targets local 27B-class inference for enterprise RAG. Engineered to land on cost-effective hardware (24 GB GDDR6, 128-bit bus) instead of the 48 GB / 384-bit alternative.
Data center
A100-class throughput envelope. Built for full-rack deployment in sovereign and private clouds.
| Specification | Standard requirement | ExSLerate Gen 2 |
|---|---|---|
| Required RAM | 48 GB GDDR6 | 24 GB GDDR6 (2× smaller) |
| Memory bus width | 384-bit (expensive) | 128-bit (optimized) |
| Target application | SOHO / local privacy | SOHO server · local RAG |
Notes. ExSLerate is FPGA-validated NPU IP available for licensing. Krsna is the planned prototype SoC built around the IP. Detailed throughput, latency, power, and per-configuration benchmarks are released under NDA on an engagement basis. For the full performance dossier, contact sales@sandlogic.com.