IndQA: OpenAI’s First Cultural Benchmark Begins with Indian Languages

OpenAI has officially launched IndQA, a new multilingual and culture-sensitive benchmark designed to evaluate how effectively AI models can understand and reason through questions grounded in Indian languages and cultural contexts. Released on November 4, 2025, this initiative marks OpenAI’s first major region-specific benchmark, focusing on linguistic diversity, cultural nuances, and contextual intelligence in India — the company’s second-largest user market for ChatGPT.

What Is IndQA?

IndQA stands for Indian Question-Answering benchmark. It currently features 2,278 questions covering 11 Indian languages:

  • Hindi, Hinglish, Gujarati, Punjabi, Kannada, Odia, Marathi, Malayalam, Tamil, Bengali, and Telugu

The benchmark spans 10 cultural domains:

  • Law and Ethics
  • Architecture and Design
  • Food and Cuisine
  • Everyday Life
  • Religion and Spirituality
  • Sports and Recreation
  • Literature and Linguistics
  • Media and Entertainment
  • Arts and Culture
  • History

It was developed with the input of 261 domain experts, including scholars, journalists, linguists, artists, and subject specialists.

How Does IndQA Work?

  • The evaluation process is built around a rubric-based grading system, where each AI-generated response is scored against predefined criteria crafted by experts for each question.
  • Each criterion is assigned weighted points based on its relevance and importance.
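  • A model-based grader checks whether each criterion is met, and the final score is the sum of the points for the criteria satisfied, taken as a share of the total points available.
  • During creation, all questions were tested against OpenAI’s most powerful models at the time, including GPT-4o, GPT-4.5, GPT-5, and OpenAI o3, and only questions that a majority of these models failed to answer acceptably were retained, so the benchmark stays adversarially robust and leaves headroom for progress.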
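
The rubric-based grading described above can be illustrated with a minimal Python sketch. It is not OpenAI's implementation: it assumes the final score is simply the points earned divided by the points available, and the `Criterion` class, the `rubric_score` function, the sample criteria, and their weights are all hypothetical names and values made up for illustration. In a real run, the met/not-met judgments would come from the model-based grader.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str  # what an ideal answer should contain (hypothetical text)
    points: float     # weight assigned by the expert who wrote the question


def rubric_score(criteria: list[Criterion], met: list[bool]) -> float:
    """Return the fraction of available rubric points the response earned.

    Assumption: score = earned points / total points. The article does not
    spell out the exact formula.
    """
    if len(criteria) != len(met):
        raise ValueError("need exactly one grader judgment per criterion")
    total = sum(c.points for c in criteria)
    if total <= 0:
        raise ValueError("rubric must carry a positive number of points")
    earned = sum(c.points for c, ok in zip(criteria, met) if ok)
    return earned / total


# Hypothetical rubric for one question; descriptions and weights are invented.
criteria = [
    Criterion("Names the correct festival", 5),
    Criterion("Explains its regional significance", 3),
    Criterion("Mentions a traditional dish", 2),
]

# Hypothetical judgments, standing in for the model-based grader's output.
met = [True, False, True]

print(f"{rubric_score(criteria, met):.0%}")  # 7 of 10 points -> prints 70%
```

Weighting the criteria lets a response that gets the central fact right but misses a minor detail still earn most of the credit, which a plain right-or-wrong score could not express.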

Benchmark Performance: AI Models Compared

Initial benchmarking results on IndQA showed significant variance among leading models:

  • GPT-5 (Thinking High): 34.9% (Highest overall)
  • Gemini 2.5 Pro Thinking: 34.3%
  • Gemini 2.5 Flash Thinking: 29.7%
  • Grok 4: 28.5%
  • OpenAI o3 High: 28.1%
  • GPT-4o: 20.3%
  • GPT-4 Turbo: 12.1%

Language-wise observations

  • Highest performance was seen in Hindi and Hinglish, where GPT-5 scored around 45% and 44% respectively.
  • Lowest performance was observed in Bengali and Telugu, revealing gaps in how well current AI models handle these languages.
  • OpenAI clarified that IndQA is not a cross-language leaderboard, since the questions differ across languages. Instead, it serves as a within-model benchmark to measure progress over time.
Shivam
