MLADI / leaderboard_info.md

Test Set Details

The test set used for evaluation is composed of 1000 sentences geolocated to the 14 most-populated Arab countries (excluding Somalia, for which data was scarce). Each sample is annotated by native speakers recruited from 11 different Arab countries, namely: Algeria, Egypt, Iraq, Jordan, Morocco, Palestine, Saudi Arabia, Sudan, Syria, Tunisia, and Yemen.

Evaluation Metrics

We compute the precision, recall, and F1 scores for each of the 11 countries (treating each label as a binary classification problem).
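As a rough illustration of per-label scoring, the sketch below treats each country as its own binary problem and computes precision, recall, and F1 from true-positive, false-positive, and false-negative counts. It is not the leaderboard's actual evaluation code: the function name `per_label_prf`, the set-based input format, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple


def per_label_prf(
    gold: List[Set[str]],
    pred: List[Set[str]],
    labels: Iterable[str],
) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)} for a multi-label task.

    Hypothetical helper: `gold` and `pred` hold one set of country labels
    per sentence. Undefined ratios (zero denominator) are reported as 0.0,
    which is an assumption, not a documented leaderboard convention.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of samples")

    scores = {}
    for label in labels:
        # Count outcomes for this label as a binary classification problem.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        scores[label] = (precision, recall, f1)
    return scores


# Toy data for illustration only (not taken from the test set).
gold = [{"Egypt", "Sudan"}, {"Syria"}, {"Egypt"}]
pred = [{"Egypt"}, {"Syria", "Jordan"}, {"Sudan"}]
scores = per_label_prf(gold, pred, ["Egypt", "Sudan", "Syria", "Jordan"])
for country, (p, r, f) in scores.items():
    print(f"{country}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Because a sentence can be valid in several dialects, each label is scored independently here; a prediction that misses one valid country lowers that country's recall without affecting the scores of the others.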

Data Access

If you need access to the single-label training sets and the multi-label development set, please fill out the following form: https://forms.gle/t3QTC6ZqyDJBzAau8

Further Notes

  • The beta version of the leaderboard is running on limited resources and cannot evaluate models with a relatively large number of parameters.
  • Please refer to the paper for more information about how the data was curated and annotated.
  • We are planning to extend the annotations to include more country-level dialects. If you are interested in helping, please contact us; we would be happy to discuss it further.