title: README
emoji: 🏆
colorFrom: pink
colorTo: indigo
sdk: static
pinned: false

Indic Language Benchmarking for Large Language Models

India is linguistically diverse, with 22+ languages. This project benchmarks the performance of large language models on Indic languages across multiple datasets. The goal is to evaluate a model's ability to understand, generate, and process text in these languages.

We currently cover 8 languages across 3 datasets, with more coming soon.

Languages

Bengali (bn)
Gujarati (gu)
Hindi (hi)
Kannada (kn)
Malayalam (ml)
Odia (or)
Tamil (ta)
Telugu (te)

Datasets

ARC-Challenge: hi, bn, gu, kn, ml, or, ta, te
TruthfulQA: hi, bn, gu, kn, ml, or, ta, te
HellaSwag: hi, bn, gu, kn, ml, or, ta, te
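
The coverage listed above can be written down as a small lookup table, which may be handy when looping over every dataset and language combination in an evaluation script. This is only an illustrative sketch: the names `LANGUAGES`, `DATASETS`, and `evaluation_pairs` are hypothetical and are not part of the project's code, and no dataset repository IDs are assumed.

```python
# Hypothetical sketch of the benchmark coverage described in this README.
# None of these names come from the project's own code.

# Language codes and names, as listed under "Languages".
LANGUAGES = {
    "bn": "Bengali",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "or": "Odia",
    "ta": "Tamil",
    "te": "Telugu",
}

# Every dataset currently covers the same eight languages.
ALL_CODES = ["hi", "bn", "gu", "kn", "ml", "or", "ta", "te"]

DATASETS = {
    "ARC-Challenge": list(ALL_CODES),
    "TruthfulQA": list(ALL_CODES),
    "HellaSwag": list(ALL_CODES),
}


def evaluation_pairs():
    """Return every (dataset, language code) pair in listing order."""
    return [
        (dataset, code)
        for dataset, codes in DATASETS.items()
        for code in codes
    ]


if __name__ == "__main__":
    for dataset, code in evaluation_pairs():
        print(f"{dataset}: {LANGUAGES[code]} ({code})")
```

With the current coverage, `evaluation_pairs()` yields 24 combinations (3 datasets × 8 languages). Adding a language or dataset later would only require extending the two dictionaries.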

Code

We are also working on an MMLU-style dataset focused on Indian knowledge. If you are interested in contributing, please reach out to Ram or Munish.