India’s AI ambitions are finally seeing homegrown breakthroughs, and Bengaluru-based startup Sarvam AI is leading the charge.
With the launch of its advanced OCR tool Sarvam Vision and AI voice engine Bulbul V3, the company is challenging global AI products such as ChatGPT and Google Gemini, signaling that India is not just a consumer of AI technology but a builder of world-class AI solutions.
Sarvam AI: India’s Sovereign AI Initiative
Despite India’s deep pool of tech talent and scale, the country has rarely been recognised as a source of core AI innovation.
Sarvam AI is changing that narrative by creating foundational AI models from scratch in India, describing its approach as “sovereign AI.”
sarvam ai 🇮🇳
founded by two highly qualified individuals from india.
outperforming gemini in ocr is a big deal. it feels so good to see an indian vlm outperforming the global giants.
also used sarvam-m briefly for the first time yesterday. the ui is very impressive. read… https://t.co/Q9RjoWXFw0 pic.twitter.com/QzkSpPqo4N
— Ryuzaki (@ryuzaki2401) February 7, 2026
The timing of its latest announcements, just ahead of the India AI Impact Summit 2026, is deliberate. By unveiling a trio of models spanning vision, speech recognition, and text-to-speech, Sarvam AI is sending a clear message: India’s AI ecosystem is serious about earning its place on the global stage.
Sarvam Vision: Redefining OCR for Indian Languages
At the core of Sarvam AI’s breakthroughs is Sarvam Vision, a 3-billion-parameter vision-language model tailored for multilingual document intelligence.
Unlike generic AI models, Sarvam Vision is specifically designed to handle the complexity of Indian paperwork, public-facing digital infrastructure, and a variety of scripts and languages.
The model excels at reading scanned documents, charts, technical tables, and even mathematical formulas, areas where traditional OCR systems often struggle.
Mark my words, India is going to hear differently from Today.
I’ve been using @SarvamAI for the last couple of hours.
Oh man, It’s truly amazing.It can understand and talk in Telugu/Hindi/English very clean
Now Indian Audience don’t need to talk to a Therapist anymore
1/2
— Paluvadi Surya (@paluvadisurya) February 7, 2026
According to the company, Sarvam Vision achieved 84.3 percent accuracy on olmOCR-Bench, outperforming models such as Gemini 3 Pro and DeepSeek OCR v2, while ChatGPT lagged behind. On OmniDocBench v1.5, which tests how AI systems read and understand real-world documents, Sarvam Vision scored 93.28 percent, indicating strong performance on complex layouts and dense content.
The company has even made its APIs free for developers through February 2026, showing confidence in its performance and commitment to wider adoption.
Bulbul V3: Taking AI Voice to the Next Level
Sarvam AI’s other breakthrough, Bulbul V3, is a text-to-speech engine built for Indian languages, with support for 11 languages and plans to expand to 22. Unlike demo-focused AI voices, Bulbul V3 is designed for production-grade speech, delivering natural, expressive, and accurate voice generation across regional accents and scripts.
The model currently offers over 35 voices across 11 Indian languages and has been praised by industry leaders like Pratik Desai of KissanAI, who described it as their go-to AI voice solution for Indic use cases.
By combining accuracy, stability, and naturalness, Bulbul V3 rivals global solutions such as ElevenLabs, but with a clear focus on India-specific applications, filling a gap that international AI providers have largely ignored.
In a blind study conducted by @JoshTalksLive, listeners compared Bulbul V3, ElevenLabs (v3 alpha and v2.5 flash), and Cartesia Sonic-3 with over 20,000 votes. Bulbul V3 tops the scores for 8kHz audio, setting a new benchmark for speech synthesis for voice agents. pic.twitter.com/oof1C5YJ0b
— Pratyush Kumar (@pratykumar) February 7, 2026
Outperforming Global Giants
The global AI community is taking notice. Initially questioned for focusing on Indic-language models, Sarvam AI is now receiving widespread recognition for its work.
Tech commentator Deedy Das admitted that he had underestimated the startup, noting that its OCR and speech models for Indian languages are strong, affordable, and highly valuable. Users have also shared positive experiences, with one summing up Sarvam AI’s tools in a single word, “wow,” after trying them firsthand.
Driving the Dream of Atmanirbhar Bharat
With Sarvam Vision and Bulbul V3, India is moving closer to realizing the vision of Atmanirbhar Bharat in technology. These tools demonstrate that homegrown AI can not only compete with but also outperform global leaders in key areas such as OCR and voice synthesis. By focusing on India-specific challenges, Sarvam AI is creating a sovereign AI stack that strengthens the country’s technological independence while setting a new benchmark for Indic-language AI.
India’s AI journey is no longer just about adoption; with Sarvam AI, it’s about innovation, leadership, and self-reliance on the global stage.
Sofia Babu Chacko is a journalist with over five years of experience covering Indian politics, crime, human rights, gender issues, and stories about marginalized communities. She believes that every voice matters, and journalism has a vital role to play in amplifying those voices. Sofia is committed to creating impact and shedding light on stories that truly matter. Beyond her work in the newsroom, she is also a music enthusiast who enjoys singing.