title: README
emoji: 🏆
colorFrom: pink
colorTo: indigo
sdk: static
pinned: false

Indic Language Benchmarking for Large Language Models

India is linguistically diverse, with 22+ languages. This project benchmarks the performance of large language models on Indic languages across multiple datasets. The goal is to evaluate a model's abilities in understanding, generating, and processing text in these languages.

We currently cover 8 languages across 3 datasets, with more coming soon.

Languages

  • Bengali (bn)
  • Gujarati (gu)
  • Hindi (hi)
  • Kannada (kn)
  • Malayalam (ml)
  • Odia (or)
  • Tamil (ta)
  • Telugu (te)
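
For scripts that loop over the supported languages, the list above can be kept as a small lookup table. This is only a minimal sketch: the names `INDIC_LANGUAGES` and `language_name` are hypothetical and are not taken from the project's code.

```python
# Language codes and names copied from the list above.
# INDIC_LANGUAGES and language_name are illustrative names, not project APIs.
INDIC_LANGUAGES = {
    "bn": "Bengali",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "or": "Odia",  # also spelled Odiya or Oriya
    "ta": "Tamil",
    "te": "Telugu",
}


def language_name(code: str) -> str:
    """Return the language name for a code, rejecting unsupported codes."""
    try:
        return INDIC_LANGUAGES[code.lower()]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None


if __name__ == "__main__":
    # Print one line per supported language, sorted by code.
    for code in sorted(INDIC_LANGUAGES):
        print(f"{code}: {language_name(code)}")
```

Rejecting unknown codes early keeps a typo such as `tm` from silently producing an empty evaluation run.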

Datasets

Code

Eval Harness

We are also building an MMLU-style dataset focused on Indian knowledge. If you are interested in contributing, please reach out to Ram or Munish.