// APPLICATION — SRUTHI

Sruthi.
ASR engine
for production speech.

Two ASR engines, one API. Sruthi-T — a transformer model that's best-in-class on English long-form (13.27% WER, beating Deepgram, Google, Azure, ElevenLabs). Sruthi-S — a Samba state-space model with 3.65% average WER across LibriSpeech, GigaSpeech, and SPGISpeech, beating Whisper-large-v3 (7.44%) and CrisperWhisper (4.69%).

English long-form WER    13.27%
Samba avg WER            3.65%
vs Whisper-large-v3      −51%
Engines available        2

Pick the model that fits the workload.

Same SDK, same streaming protocol, same telephony adapters — two architectures behind it. Choose the transformer for multilingual production today; reach for Samba-ASR when the workload is English-heavy and accuracy is the only thing that matters.

[ ENGINE 01 — SRUTHI-T ]

Transformer · multilingual

Best-in-class on English YouTube long-form. Production-ready in English and Hindi today.

  • 13.27% WER on English long-form (best of 6 systems tested)
  • 16.50% WER on Hindi long-form (2nd of 6, behind Deepgram nova-2)
  • Streaming + batch · code-switching native
  • The default engine in Lingo and IRA deployments

[ ENGINE 02 — SRUTHI-S ]

Samba · state-space

Mamba-based encoder-decoder. Linear-complexity attention replacement. English research SOTA, multilingual on roadmap.

  • 3.65% average WER (LibriSpeech, GigaSpeech, SPGISpeech)
  • 1.17% on LibriSpeech clean · 1.84% on SPGISpeech
  • Beats Whisper-large-v3 (7.44%) and CrisperWhisper (4.69%)
  • arXiv 2501.02832 — Shakhadri, Kruthika, Angadi (2025)

Best in class on English long-form.

345 audio samples (~10 hours) drawn from publicly available YouTube videos. Six commercial ASR systems compared head-to-head on word-error rate and character-error rate.
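
For readers new to the metric: word-error rate is the word-level edit distance (substitutions, deletions, insertions) between the system transcript and the reference, divided by the number of reference words; character-error rate is the same computation over characters. Below is a minimal sketch for illustration only; the published numbers come from the benchmark harness, not from this snippet.

```python
# Minimal WER sketch: Levenshtein distance over word tokens.
# Illustrative only; not the scoring harness used for these benchmarks.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

# CER is the same computation over characters instead of words.
print(wer("the cat sat on the mat", "the cat sat on mat"))  # ≈ 0.167
```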

English long-form WER · 118 samples · YouTube-derived audio · ↓ lower is better.

Full data table with CER:
Model                  WER %   CER %
SandLogic STT          13.27   11.36
Sarvam Saaras v3       13.55   13.41
Deepgram nova-3        17.53   10.16
Microsoft Azure        21.93    8.45
ElevenLabs Scribe v2   23.19   10.16
Google Chirp 3         24.47   12.24

Source: llms.sandlogic.com/asr-benchmarks · 118 English samples · WER and CER measured on identical reference transcripts.

Top-two on Hindi long-form.

Same evaluation methodology — 227 Hindi samples drawn from publicly available YouTube videos. Sruthi-T finishes second, with the lowest WER among systems that also cover English natively.

Hindi long-form WER · 227 samples · YouTube-derived audio · ↓ lower is better.

Full data table with CER:
Model                  WER %   CER %
Deepgram nova-2        13.80    7.75
SandLogic STT          16.50   10.95
Sarvam Saaras v3       17.52   13.51
ElevenLabs Scribe v2   19.99   10.22
Microsoft Azure        29.35   12.29
Google Chirp 3         29.55   10.83

State-space recurrence beats transformer attention on average WER.

Sruthi-S is built on the Samba-ASR architecture — a Mamba-based encoder-decoder that swaps quadratic self-attention for selective state-space recurrence. The result: linear computational complexity and lower average WER than the leading transformer ASR systems on standard English benchmarks.
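
To make the complexity claim concrete, here is a toy, scalar-gated sketch of a selective state-space recurrence: a fixed-size hidden state is updated once per audio frame, so cost grows linearly with sequence length instead of quadratically as in pairwise self-attention. The shapes, the sigmoid gate, and the function name are illustrative stand-ins, not the Samba-ASR/Mamba implementation (which uses learned input-dependent discretization and a hardware-aware parallel scan).

```python
import numpy as np

# Didactic selective state-space recurrence (not the Samba-ASR code):
# a fixed-size state is updated once per frame, so runtime is O(T) in
# sequence length T, versus O(T^2) for pairwise self-attention.
def selective_ssm(x, A, B, C, gate):
    """x: (T, d_in) frames; A: (d_state,) decay; B: (d_state, d_in);
    C: (d_out, d_state); gate(x_t): input-dependent step size (the
    'selective' part, a stand-in for Mamba's learned discretization)."""
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):        # one pass over the sequence
        dt = gate(x[t])                # input-dependent step size
        h = np.exp(dt * A) * h + dt * (B @ x[t])   # state update
        ys.append(C @ h)               # readout for this frame
    return np.stack(ys)                # (T, d_out)

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
T, d_in, d_state, d_out = 8, 4, 16, 4
y = selective_ssm(
    x=rng.normal(size=(T, d_in)),
    A=-np.abs(rng.normal(size=d_state)),                  # stable decay
    B=rng.normal(size=(d_state, d_in)),
    C=rng.normal(size=(d_out, d_state)),
    gate=lambda xt: float(1 / (1 + np.exp(-xt.sum()))),   # sigmoid of frame
)
print(y.shape)  # (8, 4)
```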

// AVERAGE WER ACROSS LIBRISPEECH · GIGASPEECH · SPGI

Mamba beats transformers on average WER.

Samba-ASR (Sruthi-S)   Mamba state-space        3.65%
Nvidia Canary-1b       Transformer              4.15%
CrisperWhisper         Transformer              4.69%
Whisper-large-v3       Transformer (baseline)   7.44%

↓ Lower is better. Average WER across LibriSpeech (clean + other), GigaSpeech, SPGISpeech.
Per-test-set breakdown:

Test set             Samba WER %   Note
LibriSpeech clean    1.17          frontier
LibriSpeech other    2.48          frontier
GigaSpeech           9.12
SPGISpeech           1.84          financial domain
Average WER          3.65          vs Whisper-large-v3 7.44 · Canary-1b 4.15
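
A quick check on the headline figure, assuming it is the unweighted mean of the four test-set rows above: (1.17 + 2.48 + 9.12 + 1.84) / 4 = 14.61 / 4 ≈ 3.65%.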

Source: arXiv 2501.02832 — SAMBA-ASR: State-of-the-Art Speech Recognition Leveraging Structured State-Space Models · Shakhadri, Kruthika, Angadi (SandLogic, 2025) · Trained on LibriSpeech (460h), GigaSpeech (10,000h), SPGISpeech (5,000h).

Six things production ASR needs.

Streaming transcription

Sub-300ms first-token latency. Partial hypotheses surface as the speaker talks — built for live agent assist, IVRs, and real-time captions.

Code-switching native

Handles mid-sentence Hindi-English shifts without resetting the decoder. Trained on real Indian call-center audio, not dubbed corpora.

Long-form robustness

Best-in-class WER on noisy YouTube long-form English (13.27%) — beats Deepgram nova-3, Google Chirp 3, Microsoft Azure, ElevenLabs Scribe v2.

Diarization & overlap

Speaker separation, turn-taking labels, and overlap detection out of the box. No second-pass models, no external diarizer.

Voice biometrics

Speaker verification scoring on the same audio frame as transcription — useful for fraud detection and authentication flows.

On-prem & edge

Same binary runs on Krsna SoC, NVIDIA, AMD, Intel, ARM. Air-gapped deployment supported. No cloud round-trip.

// LANGUAGE STATUS

Production-ready today. Multilingual on roadmap.

TIER 01 Benchmarked production (2 langs) · TIER 02 Production rollout (8 langs) · TIER 03 Roadmap (54 langs)
Coverage broadens at each tier. Tier 01 is the published benchmark surface; Tier 02 is customer-deployed without published WER; Tier 03 is product roadmap.
[ TIER 01 ]

Benchmarked production

English · Hindi — head-to-head benchmarks against six commercial systems on long-form audio.

[ TIER 02 ]

Production rollout

Tamil · Telugu · Marathi · Bengali · Kannada · Malayalam · Punjabi · Gujarati — deployed in customer pilots, public benchmarks coming.

[ TIER 03 ]

Roadmap

Remaining 14 Indic languages and 40 foreign languages are on the multilingual roadmap. Samba-ASR multilingual extension is research-track.

Hear it on your audio.

Send us a sample call from your stack and we'll return a transcript, diarization, and a head-to-head WER comparison against your incumbent within 48 hours.

[ DEMO REQUEST ]
Send a 30-second call. Get a transcript back.

No NDA needed for the first sample. We benchmark against your incumbent and report numbers — even if they're not in our favor.

Email us a sample

Drop-in replacement for your existing ASR.

[ INTEGRATION 01 ]

REST + WebSocket

OpenAI-compatible HTTP for batch jobs. Persistent WebSocket for streaming with partial hypotheses and end-of-utterance signals.
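
As a sketch of what integration can look like, the snippet below shows a batch upload in the OpenAI-compatible style and a streaming loop that prints partial hypotheses. The base URL, WebSocket path, message fields ("type", "text", "end_of_stream"), and the model names "sruthi-t" / "sruthi-s" are placeholder assumptions; consult the SDK documentation for the actual endpoints and schema.

```python
import json
import requests          # pip install requests
import websockets        # pip install websockets

# Placeholder endpoints and credentials; substitute your deployment's values.
BASE_URL = "https://api.example.com"
WS_URL = "wss://api.example.com/v1/stream"
API_KEY = "YOUR_API_KEY"

# Batch: OpenAI-style multipart upload (endpoint shape and fields assumed).
def transcribe_batch(path: str, model: str = "sruthi-t") -> str:
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"model": model},          # "sruthi-s" would select the Samba engine
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()["text"]

# Streaming: push raw PCM frames, print partial hypotheses as they arrive.
# The control message and response fields below are illustrative, not documented.
async def transcribe_stream(pcm_chunks, model: str = "sruthi-t") -> None:
    async with websockets.connect(f"{WS_URL}?model={model}") as ws:
        for chunk in pcm_chunks:
            await ws.send(chunk)                               # binary audio frame
        await ws.send(json.dumps({"type": "end_of_stream"}))   # assumed control message
        async for raw in ws:                                   # read until the server closes
            msg = json.loads(raw)
            if msg.get("type") == "partial":
                print("partial:", msg.get("text"))
            elif msg.get("type") == "final":
                print("final:", msg.get("text"))
                break

if __name__ == "__main__":
    print(transcribe_batch("sample_call.wav"))
    # asyncio.run(transcribe_stream(chunks)) for the streaming path.
```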

[ INTEGRATION 02 ]

Telephony adapters

Native connectors for Asterisk, FreeSWITCH, Twilio, Genesys, and Avaya. SIPREC tap supported for call-recording inspection.

[ INTEGRATION 03 ]

Lingo & IRA bundled

Sruthi is the speech layer underneath Lingo and IRA. If you deploy either, the engine ships with them — no separate procurement.

// LET'S BUILD

Pick the audio. Get the transcript.