IndQA: OpenAI’s First Cultural Benchmark Begins with Indian Languages

OpenAI has launched IndQA, a multilingual, culture-sensitive benchmark designed to evaluate how well AI models understand and reason through questions grounded in Indian languages and cultural contexts. Released on November 4, 2025, it is OpenAI’s first major region-specific benchmark, focusing on linguistic diversity, cultural nuance, and contextual understanding in India, the company’s second-largest user market for ChatGPT.

What Is IndQA?

IndQA stands for Indian Question-Answering benchmark. It currently features 2,278 questions covering 11 Indian languages:

  • Hindi, Hinglish, Gujarati, Punjabi, Kannada, Odia, Marathi, Malayalam, Tamil, Bengali, and Telugu

The benchmark spans 10 cultural domains:

  • Law and Ethics
  • Architecture and Design
  • Food and Cuisine
  • Everyday Life
  • Religion and Spirituality
  • Sports and Recreation
  • Literature and Linguistics
  • Media and Entertainment
  • Arts and Culture
  • History

It was developed with the input of 261 domain experts, including scholars, journalists, linguists, artists, and subject specialists.

How Does IndQA Work?

  • The evaluation process is built around a rubric-based grading system, where each AI-generated response is scored against predefined criteria crafted by experts for each question.
  • Each criterion is assigned weighted points based on its relevance and importance.
  • A model-based grader checks each response against these criteria, and the final score is the number of points earned on satisfied criteria as a share of the total points available.
  • During creation, all questions were tested against OpenAI’s most powerful models at the time, including GPT-4o, GPT-4.5, GPT-5, and OpenAI o3, so that the benchmark stays difficult for strong models and leaves room to measure progress.
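
The weighted rubric scoring described above can be illustrated with a minimal Python sketch. This is not OpenAI’s implementation: the names Criterion and rubric_score, the sample criteria, and the point values are hypothetical, and the model-based grader is replaced here by a hand-written list of pass/fail verdicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One expert-written rubric item (hypothetical structure)."""
    description: str
    points: int  # weight reflecting the criterion's importance


def rubric_score(criteria, verdicts):
    """Return earned points divided by total possible points.

    `verdicts` holds one boolean per criterion. In a real pipeline
    these would come from a model-based grader; here they are
    supplied by hand as an assumption.
    """
    if len(criteria) != len(verdicts):
        raise ValueError("need exactly one verdict per criterion")
    total = sum(c.points for c in criteria)
    if total <= 0:
        raise ValueError("rubric must carry positive total points")
    earned = sum(c.points for c, ok in zip(criteria, verdicts) if ok)
    return earned / total


# Made-up rubric for a single question.
criteria = [
    Criterion("Names the correct regional dish", 5),
    Criterion("Explains its cultural significance", 3),
    Criterion("Answers in the language of the question", 2),
]
verdicts = [True, False, True]  # stand-in for the grader's output

score = rubric_score(criteria, verdicts)
print(f"{score:.0%}")
```

In this sketch the response satisfies criteria worth 7 of the 10 available points. A benchmark-level figure would then aggregate such per-question scores, for example by averaging them, although the article does not specify the aggregation.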

Benchmark Performance: AI Models Compared

Initial IndQA results showed significant variance among leading models:

  • GPT-5 (Thinking High): 34.9% (Highest overall)
  • Gemini 2.5 Pro Thinking: 34.3%
  • Gemini 2.5 Flash Thinking: 29.7%
  • Grok 4: 28.5%
  • OpenAI o3 High: 28.1%
  • GPT-4o: 20.3%
  • GPT-4 Turbo: 12.1%

Language-wise observations

  • Highest performance was seen in Hindi and Hinglish, where GPT-5 scored around 45% and 44% respectively.
  • Lowest performance was observed in Bengali and Telugu, revealing gaps in how well current AI models handle these languages.
  • OpenAI clarified that IndQA is not a cross-language leaderboard, since the questions differ across languages. Instead, it serves as a within-model benchmark to measure progress over time.
Shivam
