India’s Homegrown ‘Sarvam AI’ Outperforms Google Gemini and ChatGPT in Key AI Benchmarks

The Bengaluru-based startup Sarvam AI has put India on the global map with two tools: Sarvam Vision, an advanced document-reading OCR system, and the Bulbul V3 voice model.

In a first, Indian artificial intelligence (AI) company Sarvam AI has reportedly beaten global heavyweights such as Google Gemini and ChatGPT on key AI benchmarks, challenging the long-held belief that serious AI innovation comes only from the US or China.

As per reports, the Bengaluru-based startup has launched two tools: Sarvam Vision, an advanced document-reading optical character recognition (OCR) system, and Bulbul V3, a powerful AI voice generation model.

India has long been recognised for its tech talent and IT services, but not for building foundational AI models, a space historically dominated by the US and China. This appears to be the first time an Indian AI company has built world-class models, particularly for India-specific needs such as local languages, documents, and speech.

According to the company, Sarvam Vision has outperformed popular AI models like ChatGPT, Google Gemini, and Anthropic Claude on certain OCR benchmarks, especially those involving Indian languages. The tool has impressed both users and experts with its ability to accurately read complex real-world documents.

Sarvam AI co-founder Pratyush Kumar recently shared details of the latest achievements from the company’s in-house AI models in a series of posts on X.

The company says Sarvam Vision scored 84.3 percent accuracy on the olmOCR-Bench, beating Gemini 3 Pro and newer OCR systems like DeepSeek OCR v2, while ChatGPT ranked much lower.

The model also performed strongly on OmniDocBench v1.5, a benchmark that measures how well AI understands real documents. Sarvam Vision scored 93.28 percent overall, with particularly strong results in handling complex layouts, technical tables, and mathematical formulas, areas where traditional OCR systems usually struggle.

This performance has drawn global attention. Sarvam AI, which was earlier questioned for focusing heavily on Indic-language models, is now seeing that scepticism turn into approval.

Alongside Sarvam Vision, the company has also launched Bulbul V3, its latest text-to-speech AI model. Bulbul V3 is designed to generate natural-sounding voices for Indian languages and competes with global voice AI platforms like ElevenLabs, which is considered a leader in this space. At present, Bulbul V3 supports more than 35 voices across 11 Indian languages, with plans to expand support to 22 languages.

What is impressive about Sarvam Vision?

Sarvam Vision shows that you don’t need the biggest or most expensive AI to win. By focusing deeply on local needs and building the tech well, Sarvam has managed to outperform much larger global models. If the company continues on this path, it won’t just be seen as India’s AI success story; it could become a global example of how specialised AI can beat size and scale.

The other impressive thing about the homegrown model is that it excels at Indian languages, not just English. Most OCR systems struggle with Indic scripts because of varied fonts, ligatures, and inconsistent formatting. Sarvam Vision is built specifically for these challenges, which is why it performs better on Indian-language documents than global models trained primarily on English data.

Sarvam Vision doesn’t just read clean PDFs. It handles messy layouts, tables, mathematical formulas, and technical documents, areas where traditional OCR tools often break down. On OCR-specific benchmarks, it matches or outperforms much larger and more talked-about models such as Gemini and ChatGPT, a big deal for a focused, homegrown model.

The model is clearly designed for real applications: government records, legal files, financial documents, and enterprise workflows common in India. Its accuracy on dense, poorly formatted documents makes it immediately useful, not just impressive in demos.
