IndQA: OpenAI’s First Cultural Benchmark Begins with Indian Languages

OpenAI has officially launched IndQA, a new multilingual, culture-sensitive benchmark designed to evaluate how effectively AI models understand and reason through questions grounded in Indian languages and cultural contexts. Released on November 4, 2025, it is OpenAI’s first major region-specific benchmark and focuses on linguistic diversity, cultural nuance, and contextual intelligence in India, the company’s second-largest user market for ChatGPT.

What Is IndQA?

IndQA stands for Indian Question-Answering benchmark. It currently features 2,278 questions covering 11 Indian languages:

  • Hindi, Hinglish, Gujarati, Punjabi, Kannada, Odia, Marathi, Malayalam, Tamil, Bengali, and Telugu

The benchmark spans 10 cultural domains:

  • Law and Ethics
  • Architecture and Design
  • Food and Cuisine
  • Everyday Life
  • Religion and Spirituality
  • Sports and Recreation
  • Literature and Linguistics
  • Media and Entertainment
  • Arts and Culture
  • History

It was developed with the input of 261 domain experts, including scholars, journalists, linguists, artists, and subject specialists.

How Does IndQA Work?

  • The evaluation process is built around a rubric-based grading system, where each AI-generated response is scored against predefined criteria crafted by experts for each question.
  • Each criterion is assigned weighted points based on its relevance and importance.
  • A model-based grader checks responses against these criteria, and the final score is calculated accordingly.
  • During creation, every question was tested against OpenAI’s most powerful models, including GPT-4o, GPT-4.5, GPT-5, and OpenAI o3, to ensure adversarial robustness.
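
The rubric-based scoring described above can be illustrated with a minimal Python sketch. It is not OpenAI’s implementation: the article does not give the exact formula, so the sketch assumes the score is the points earned on satisfied criteria divided by the total points available. The names `Criterion` and `rubric_score` are hypothetical, and the model-based grader is stood in for by a plain list of pass/fail judgments.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One expert-written rubric item (hypothetical structure)."""
    description: str
    points: float  # weight reflecting the criterion's importance


def rubric_score(criteria, met):
    """Return the fraction of available rubric points earned.

    `met` holds one boolean per criterion. In the real benchmark these
    judgments come from a model-based grader; here they are supplied
    by hand. The earned/total formula is an assumption.
    """
    if len(criteria) != len(met):
        raise ValueError("need exactly one judgment per criterion")
    total = sum(c.points for c in criteria)
    if total <= 0:
        raise ValueError("rubric must carry positive total points")
    earned = sum(c.points for c, ok in zip(criteria, met) if ok)
    return earned / total


# Illustrative rubric for a single question (invented for this example).
example_criteria = [
    Criterion("Identifies the correct regional tradition", 5),
    Criterion("Explains its historical origin", 3),
    Criterion("Answers in the language of the question", 2),
]
example_score = rubric_score(example_criteria, [True, False, True])
print(f"{example_score:.0%}")
```

In this example the response satisfies the first and third criteria, earning 7 of 10 possible points. Weighting lets an answer that gets the central fact right outscore one that only meets minor criteria.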

Benchmark Performance: AI Models Compared

Initial benchmarking results on IndQA showed significant variance among leading models:

  • GPT-5 (Thinking High): 34.9% (Highest overall)
  • Gemini 2.5 Pro Thinking: 34.3%
  • Gemini 2.5 Flash Thinking: 29.7%
  • Grok 4: 28.5%
  • OpenAI o3 High: 28.1%
  • GPT-4o: 20.3%
  • GPT-4 Turbo: 12.1%

Language-wise observations

  • Highest performance was seen in Hindi and Hinglish, where GPT-5 scored around 45% and 44% respectively.
  • Lowest performance was observed in Bengali and Telugu, revealing gaps in how well existing AI models handle these languages.
  • OpenAI clarified that IndQA is not a cross-language leaderboard, since the questions differ across languages. Instead, it serves as a within-model benchmark to measure progress over time.
Shivam
