
Test Set Details

The test set used for evaluation consists of 1000 sentences geolocated to the 14 most populated Arab countries (excluding Somalia, for which data was scarce). Each sample is annotated by native speakers recruited from 11 different Arab countries: Algeria, Egypt, Iraq, Jordan, Morocco, Palestine, Saudi Arabia, Sudan, Syria, Tunisia, and Yemen.

Evaluation Metrics

We compute the precision, recall, and F1 scores for each of the 11 countries (treating each label as a binary classification problem).
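
The per-country scoring described above can be sketched as follows. This is a minimal illustration and not the leaderboard's actual evaluation code: the function name `per_country_scores`, the representation of each sample as a set of country names, and the convention of returning 0.0 when a denominator is zero are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set

# The 11 countries for which annotations exist (taken from the test set description).
COUNTRIES: List[str] = [
    "Algeria", "Egypt", "Iraq", "Jordan", "Morocco", "Palestine",
    "Saudi Arabia", "Sudan", "Syria", "Tunisia", "Yemen",
]


def per_country_scores(
    gold: List[Set[str]],
    pred: List[Set[str]],
    labels: Iterable[str] = COUNTRIES,
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: score each country as its own binary problem.

    gold[i] and pred[i] are the sets of countries in which sentence i is
    considered valid (annotated vs. predicted).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of samples")

    scores: Dict[str, Dict[str, float]] = {}
    for label in labels:
        # Count binary outcomes for this single country.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)

        # Assumption: undefined ratios (zero denominator) are reported as 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores
```

For example, calling `per_country_scores(gold, pred)` with two equally long lists of country-name sets returns a dictionary keyed by country, each entry holding that country's `precision`, `recall`, and `f1`.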

Data Access

If you need access to the single-label training sets and the multi-label development set, please fill out the following form: https://forms.gle/t3QTC6ZqyDJBzAau8

Further Notes

The beta version of the leaderboard runs on limited resources and cannot evaluate models with a relatively large number of parameters.
Please refer to the paper for more information about how the data was curated and annotated.
We are planning to extend the annotations to include more country-level dialects. If you are interested in helping, please contact us; we would be happy to discuss it further.