Krsna is the SoC. ExSLerate V2 is the neural engine inside it. Together they take aim at one thesis: for LLM inference at the edge, the memory wall is the only wall that matters. Dynamic Neural Compression cuts memory traffic in half, on the fly, with zero accuracy loss. Speech-to-speech runs on-device in real time. Time-to-first-token is measured in milliseconds, not seconds.
Memory traffic reduction · DNC peak
50%
Lossless weight compression
28%
Configurations · M64 → M4096
4
Native precision
INT4 · FP8
// 02 · PRODUCT LINEUP
One engine. Four configurations.
ExSLerate V2 ships in four configurations of the same neural engine. Same compute tile, same scheduler, same DNC pipeline, same compiler target. What changes between variants is MAC count and on-die memory budget, sized to four different thermal and product envelopes.
Apex
M4096
Real-time conversational AI for robotics and heavy edge applications. STT, text-to-text (TTT), and TTS in one inference pipeline. Sized for service robots, automotive HMIs, and industrial control surfaces where latency is the contract.
MAC count
4096
Target
Robotics · Automotive · Industrial
Surge
M1024
Edge in flight.
Light edge AI for drones and platforms where every gram and milliwatt counts. Object detection, classification, and on-board SLMs in the same envelope. The variant that goes where a fan cannot.
MAC count
1024
Target
Drones · Aerial · Light edge
Pulse
M256
Pocket inference.
Tuned for the audio-and-display class of consumer devices. Smartwatches with on-device NLU, smart speakers, and any product where the model is a feature shipping in the BOM, not a fallback to the cloud.
MAC count
256
Target
Smartwatch · Smart speaker
Lite
M64
Always on.
The lowest-power inference target in the family. Built for wearables and hearables where the model never sleeps because the battery cannot afford the wake-up cost. Always-on is the feature.
MAC count
64
Target
Always-on wearables · Hearables
// MAC COUNT SCALING
From wearables to robotics — 64× MAC range, log-scaled.
MAC count is the headline scale knob. Each variant carries the same ExSLerate V2 microarchitecture — only the cluster topology and memory budget change. First silicon FY26–27.
// 03 · CORE INNOVATION
Two engines. One memory wall, defeated.
The memory wall is what gates LLM inference on every edge device shipping today. ExSLerate V2 attacks it from both sides: bandwidth, with DNC; and activation cost, with the Infinite Series Engine, which keeps non-linear functions inside the datapath.
/ ENGINE 01
Dynamic Neural Compression
DNC · Tensulator + Tensor Codec
DNC compresses weights and KV-cache at line rate. The Tensulator accumulator bank and the Tensor Codec engine are patent-pending hardware blocks that sit in the data path between DRAM and the compute tiles. Compression happens during the read; decompression happens during execute. The compiler injects DNC operators automatically. Memory traffic drops by 50% at peak context length. There is no accuracy tradeoff to debate, because the operation is lossless.
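The principle behind lossless weight compression can be seen in a few lines. This is a toy sketch, not the Tensor Codec: `zlib` stands in for the hardware codec, and the synthetic weight distribution is ours. The point it illustrates is real, though — quantized LLM weights are heavily peaked around zero, so entropy coding saves genuine bandwidth while the round trip stays bit-exact.

```python
# Toy illustration of lossless compression on quantized weights.
# zlib stands in for the Tensor Codec (whose format is not public); the
# peaked INT4 distribution is synthetic. The takeaway: real traffic
# savings with a bit-exact round trip -- no accuracy tradeoff to debate.
import random
import zlib

random.seed(0)

# INT4 weights from a peaked, zero-centred distribution, packed two per byte.
nibbles = [min(7, max(-8, round(random.gauss(0, 1.5)))) & 0xF
           for _ in range(1 << 16)]
packed = bytes((nibbles[i] << 4) | nibbles[i + 1]
               for i in range(0, len(nibbles), 2))

compressed = zlib.compress(packed, level=9)
assert zlib.decompress(compressed) == packed   # bit-exact: lossless
savings = 1 - len(compressed) / len(packed)
print(f"traffic saved: {savings:.0%}")
```

A uniform distribution would compress to nothing; the savings come entirely from the skew that quantized neural weights actually have.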
/ ENGINE 02
Infinite Series Engine
SiLU, GeLU, Softmax, Sigmoid, Tanh. These are the functions that bottleneck most NPU pipelines, which ship them off-die to a CPU or special-function unit. The Infinite Series Engine evaluates them in place using polynomial coefficients that the compiler fits at build time. Output precision tracks BF16 to within rounding, on FP8 silicon, with none of the SFU area cost and none of the CPU round-trip latency.
SiLU · GeLU · Softmax · Sigmoid · Tanh · No CPU offload
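The idea of evaluating an activation as a fixed polynomial is easy to demonstrate. The sketch below is ours, not the ISE's actual scheme: it builds a degree-10 Chebyshev interpolant of sigmoid on [-4, 4] with nothing but the standard library, then evaluates it with multiply-adds only — no `exp`, no divide — the shape of operation a MAC datapath already has.

```python
# Our stand-in for compiler-fit activation coefficients: a degree-10
# Chebyshev interpolant of sigmoid on [-4, 4]. The real ISE coefficients
# and evaluation scheme are not public; this only shows the principle.
import math

LO, HI, DEG = -4.0, 4.0, 10

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Chebyshev nodes tame oscillation at the interval edges.
nodes = [0.5 * (LO + HI) + 0.5 * (HI - LO) *
         math.cos((2 * k + 1) * math.pi / (2 * (DEG + 1)))
         for k in range(DEG + 1)]

# Newton divided differences give the interpolant's coefficients.
coef = [sigmoid(x) for x in nodes]
for j in range(1, DEG + 1):
    for i in range(DEG, j - 1, -1):
        coef[i] = (coef[i] - coef[i - 1]) / (nodes[i] - nodes[i - j])

def poly_sigmoid(x):
    """Horner-style evaluation: DEG multiply-adds, no exp, no divide."""
    acc = coef[DEG]
    for i in range(DEG - 1, -1, -1):
        acc = acc * (x - nodes[i]) + coef[i]
    return acc

max_err = max(abs(poly_sigmoid(x / 100) - sigmoid(x / 100))
              for x in range(-400, 401))
print(f"max |error| on [-4, 4]: {max_err:.2e}")
```

Production implementations typically use piecewise fits over narrower segments, which pushes the error down further for the same per-segment degree.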
// 04 · FUNCTIONALITY
Built for what ships. Four model families, one stack.
ExSLerate V2 covers the four model families that show up in real products today. Language, speech, vision, state space. Same compiler, same operator set, same runtime, all the way down to the silicon.
LLM / SLM
Text generation
Llama · Shakti · Qwen · Gemma
Speech AI
STT & TTS
Sruthi · Svara · Moonshine · Whisper
Computer vision
CNN inference
ResNet · YOLO · VGG
State space
Linear recurrence
Mamba · Jamba
[ ↳ ]
Performance data is shared separately. Throughput, latency, power, and per-configuration benchmarks are released under NDA on an engagement basis. For the full performance dossier and FPGA prototype numbers, write to sales@sandlogic.com.
// 05 · MEMORY WALL
How 8 GB of RAM holds 128K tokens.
LLM serving fails on edge hardware not because compute is short, but because the weights and the KV-cache do not fit. DNC turns that math around. Below is what an 8 GB endpoint actually carries with the compiler putting compressed tensors in RAM.
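The arithmetic can be sanity-checked with public Llama 3 8B shapes (32 layers, 8 KV heads, head dimension 128) at FP8 storage, one byte per value. A rough sketch under those assumptions — the 28% and 50% figures are the DNC numbers quoted on this page; everything else is our own back-of-envelope math, not a vendor spec:

```python
# Rough arithmetic, not a vendor spec: public Llama 3 8B shapes
# (32 layers, 8 KV heads, head dim 128), FP8 = 1 byte per value.
GIB = 1024 ** 3
layers, kv_heads, head_dim = 32, 8, 128
kv_bytes_per_token = 2 * layers * kv_heads * head_dim   # K and V planes
assert kv_bytes_per_token == 65_536                     # 64 KiB per token

weights = 8.0e9          # ~8B params at 1 byte each
ram = 8 * GIB

# Uncompressed, the weights leave under 0.6 GiB for KV-cache, runtime,
# and OS -- consistent with the 0 (OOM) baseline row.
assert ram - weights < 0.6 * GIB

# With DNC (28% off weights, 50% off KV-cache), 40k tokens fit in RAM.
ctx = 40_000
footprint = weights * (1 - 0.28) + ctx * kv_bytes_per_token * 0.5
print(f"compressed footprint at 40k tokens: {footprint / GIB:.2f} GiB")
assert footprint < ram
```

The same 40k tokens uncompressed would need roughly 10.6 GB — over budget before the OS loads.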
// DNC TRAFFIC COMPARISON
The DRAM-to-compute bus is where edge LLMs lose.
Half the bandwidth in, full tensors out. The DRAM bus carries compressed weights and KV-cache. The Tensor Codec decompresses them on die before they reach the compute tiles. The expensive bus is narrowed; the compute side never sees a compromise.
28%
Lossless weight compression
Measured on Llama 3 8B. Static compression done once at compile time, no runtime tradeoff.
50%
Memory traffic reduction at peak context
Dynamic compression of KV-cache and activation reads, end to end. The bigger the context, the bigger the win.
// CONTEXT EXPANSION
Context window on an 8 GB endpoint
FP8 PRECISION · RAM + SSD swap on an 8 GB endpoint
| Model | Standard RAM (baseline) | ExSLerate V2 + DNC | SSD extension | Total max context |
| Llama 3 · 8B | 0 (OOM) | 40k tokens | +88k tokens | 128k tokens |
| Shakti · 2.5B | 45.4k tokens | 92k tokens | +36k tokens | 128k tokens |
| Shakti · 500M | 32k tokens | 32k tokens | Fits in RAM | 32k tokens |
// 06 · ACCURACY VERIFICATION
FP8 accuracy. Within rounding of BF16.
Zero-shot, five-attempt evaluation. BF16 baseline on NVIDIA A100 versus FP8 (E4M3) on Krsna · ExSLerate V2. The deltas below are what they look like in practice, not in a marketing slide.
| Config | MMLU | SST-2 | GSM8K | COT | PIQA | HELLA | WINO | BoolQ | Lamb | ARC-C |
| Llama 3.1 8B · A100 · BF16 baseline | 65.68% | 94.00% | 55.00% | 82.00% | 55.00% | 79.00% | 78.00% | 69.00% | 52.00% | 69.88% |
| Llama 3.1 8B · Krsna · ExSLerate V2 · FP8 (E4M3) | 62.91% | 93.00% | 44.00% | 84.00% | 53.00% | 78.00% | 80.00% | 65.00% | 51.00% | 67.87% |
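A quick pass over the table, taking the printed numbers at face value. The deltas and their average are our own arithmetic, not part of the published results:

```python
# Our arithmetic on the table's printed values -- not part of the
# published results. Per-task delta (BF16 minus FP8) and the average.
bf16 = [65.68, 94.00, 55.00, 82.00, 55.00, 79.00, 78.00, 69.00, 52.00, 69.88]
fp8  = [62.91, 93.00, 44.00, 84.00, 53.00, 78.00, 80.00, 65.00, 51.00, 67.87]
tasks = ["MMLU", "SST-2", "GSM8K", "COT", "PIQA",
         "HELLA", "WINO", "BoolQ", "Lamb", "ARC-C"]

deltas = [b - f for b, f in zip(bf16, fp8)]
for task, d in zip(tasks, deltas):
    print(f"{task:6s} {d:+6.2f} pts")

avg = sum(deltas) / len(deltas)
print(f"average delta: {avg:+.2f} pts")   # GSM8K dominates; FP8 wins COT, WINO
```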
// 07 · SOFTWARE STACK
Built on IREE. Open from frontend to silicon.
Hardware is half the product. The ExSLerate SDK is built on IREE, the open MLIR-based compiler and runtime. Standard dialects in, .vmfb out. No proprietary frontend, no vendor lock-in, no rewrite of your model.
No vendor lock-in
Models enter the toolchain through standard MLIR dialects, Linalg and TOSA. Anything that targets IREE today will target ExSLerate V2 tomorrow. Your existing toolchain stays put.
Broad frontend compatibility
PyTorch, TensorFlow, and JAX are first-class. The frontend you ship in is the frontend you stay in. No re-export, no rewrite, no parallel model branch.
Flexible deployment
IREE decouples the model graph from the hardware executable. Update one without rebuilding the other. The HAL handles scheduling and runtime; the FlatBuffer carries the deployable.
ExSLerate extensions
Three custom passes ride on top of stock IREE: graph optimization tuned to the tile, DNC injection at the right edges, and quantization for INT4 and FP8 native paths.
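What "DNC injection at the right edges" could look like can be sketched as a graph pass. Everything below is illustrative — the toy IR, the `Node` class, and the `dnc_pass` name are ours; the real pass runs inside IREE on MLIR, not on a Python list. The op names only nod to the Linalg dialect mentioned above:

```python
# Hypothetical sketch of DNC operator injection as a graph pass.
# Toy IR, not IREE: the real pass operates on MLIR inside the compiler.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)

def dnc_pass(graph):
    """Wrap every DRAM->compute edge in a compress/decompress pair."""
    out = []
    for node in graph:
        new_inputs = []
        for src in node.inputs:
            if src.op == "dram.load":                  # the expensive edge
                comp = Node("dnc.compress", [src])     # during the read
                dec = Node("dnc.decompress", [comp])   # on-die, pre-compute
                out += [comp, dec]
                new_inputs.append(dec)
            else:
                new_inputs.append(src)
        node.inputs = new_inputs
        out.append(node)
    return out

weights = Node("dram.load")
matmul = Node("linalg.matmul", [weights])
g = dnc_pass([weights, matmul])
assert [n.op for n in g] == ["dram.load", "dnc.compress",
                             "dnc.decompress", "linalg.matmul"]
```

The consumer never sees a compressed tensor — which is the whole contract: the narrow bus carries compressed data, the compute tile sees full tensors.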
// 08 · TARGET MARKETS
Each variant of ExSLerate V2 is sized for a different shape of product. Same compiler, same operators, same model files. Four envelopes, four markets.
M4096 · Apex
Robotics & heavy edge AI.
→ Service and receptionist robots
→ Hospital and elderly-care assistants
→ Automotive HMIs and cockpit voice agents
→ Industrial control surfaces with NLU
→ Real-time speech-to-speech translation
M1024 · Surge
Drones & light edge AI.
→ Aerial inspection and survey drones
→ Delivery and logistics drones
→ IoT and security cameras
→ Industrial vision and anomaly detection
→ On-board SLM for autonomous platforms
M256 · Pulse
Smartwatch & smart speaker.
→ Smartwatches with on-device NLU
→ Smart speakers with local intent
→ Home hubs and voice appliances
→ In-ear and audio-first devices
→ Display-bearing wearables
M64 · Lite
Always-on wearables.
→ Hearables and earbuds
→ Fitness and health bands
→ Continuous biometric monitors
→ Always-listening keyword and wake detect
→ Low-power sensor fusion endpoints
// 09 · ROADMAP
The ExSLerate evolution.
ExSLerate V2 is the first generation in market. The roadmap pushes DNC from weight and KV-cache compression today to a second generation that makes 27B-class models fit on cost-effective SOHO server hardware, and a third that enters the data center.
Now available
ExSLerate V2
Endpoint & robotics
DNC Gen 1
Run 8B-class models on edge devices. Weight compression of 28%, memory traffic reduction of 50% at peak context. Ships in the Krsna SoC across four configurations.
Next gen
ExSLerate V3
SOHO server
DNC Gen 2
60% weight compression, 50% KV-cache compression. Targets local 27B-class inference for enterprise RAG on a single 24 GB GDDR6 card. Half the RAM and a third of the memory bus width vs the standard requirement.
Future
ExSLerate V4
Data center
DNC Gen 3
A100-class throughput envelope. 70% compression target. Built for full-rack deployment in sovereign and private clouds. The endgame of the silicon program.
// V3 · 27B-CLASS · SOHO SERVER
Architecture efficiency vs standard requirement.
| Specification | Standard requirement | SandLogic ExSLerate V3 |
| Required RAM | 48 GB GDDR6 | 24 GB GDDR6 (2× smaller) |
| Memory bus width | 384-bit (expensive) | 128-bit (optimized) |
| Target application | SOHO / local privacy | SOHO server · local RAG |
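In round numbers, the RAM claim checks out under one assumption of ours — FP8 storage at one byte per parameter, since V3's serving precision is not stated here. A 27B-class model is then ~27 GB raw, and DNC Gen 2's 60% weight compression brings it under half of a 24 GB card:

```python
# Our arithmetic on the figures above; FP8 (1 byte/param) is our assumption.
params = 27e9
raw_gb = params / 1e9                    # ~27 GB of FP8 weights
compressed_gb = raw_gb * (1 - 0.60)      # DNC Gen 2: 60% weight compression
print(f"weights on card: {compressed_gb:.1f} GB of 24 GB")

assert round(compressed_gb, 1) == 10.8   # fits the 24 GB card with headroom
assert 24 / 48 == 0.5                    # half the RAM of the baseline
assert 384 / 128 == 3.0                  # one third the bus width
```

The remaining ~13 GB is what holds KV-cache and working set for the enterprise RAG target.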
Notes. Detailed throughput, latency, power, and per-configuration benchmarks for the ExSLerate V2 engine and Krsna SoC are released under NDA on an engagement basis. For the full performance dossier, contact sales@sandlogic.com. Tensulator and Tensor Codec are the subject of a PCT international patent application filed with the Indian Patent Office.
// LET'S BUILD
License Krsna IP. Or run it on the simulator first.