## Test Set Details

The test set used for evaluation consists of 1,000 sentences geolocated to the 14 most populous Arab countries (excluding Somalia, for which data was scarce). Each sample is annotated by native speakers recruited from 11 different Arab countries, namely: Algeria, Egypt, Iraq, Jordan, Morocco, Palestine, Saudi Arabia, Sudan, Syria, Tunisia, and Yemen.

## Evaluation Metrics

We compute the precision, recall, and F1 scores for each of the 11 countries (treating each label as a binary classification problem).
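The per-country evaluation can be sketched as follows. This is a minimal illustration, not the leaderboard's actual scoring code. It assumes that the gold and predicted labels for each sentence are given as sets of country names. The function names (`per_country_scores`, `macro_f1`), the toy data, and the rule that a zero denominator yields 0.0 are all hypothetical. Averaging the per-country F1 scores into a single number is also an assumption: it is shown as one possible summary and is not stated above.

```python
from typing import Dict, List, Set

# The 11 countries whose native speakers annotated the test set.
COUNTRIES: List[str] = [
    "Algeria", "Egypt", "Iraq", "Jordan", "Morocco", "Palestine",
    "Saudi Arabia", "Sudan", "Syria", "Tunisia", "Yemen",
]


def per_country_scores(
    gold: List[Set[str]], pred: List[Set[str]]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1 per country, treating each country label
    as an independent binary classification problem.

    gold[i] / pred[i] (hypothetical input format) hold the countries whose
    label is true (or predicted true) for sentence i.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, Dict[str, float]] = {}
    for country in COUNTRIES:
        tp = fp = fn = 0
        for g, p in zip(gold, pred):
            in_gold, in_pred = country in g, country in p
            if in_gold and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_gold:
                fn += 1
        # Assumed convention: a score whose denominator is zero is 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[country] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of the per-country F1 scores (one possible summary)."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


# Toy example with made-up labels (not data from the actual test set).
gold = [{"Egypt", "Sudan"}, {"Morocco"}, {"Syria", "Jordan", "Palestine"}]
pred = [{"Egypt"}, {"Morocco", "Algeria"}, {"Syria", "Jordan"}]
scores = per_country_scores(gold, pred)
print(scores["Egypt"])  # {'precision': 1.0, 'recall': 1.0, 'f1': 1.0}
print(round(macro_f1(scores), 4))  # 0.3636
```

If the labels are instead stored as a multi-hot matrix with one column per country, scikit-learn's `precision_recall_fscore_support` with `average=None` computes the same per-label quantities.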

## Data Access

To access the single-label training sets and the multi-label development set, please fill out the following form: https://forms.gle/t3QTC6ZqyDJBzAau8

## Further Notes

* The beta version of the leaderboard runs on limited computational resources and therefore cannot evaluate models with a relatively large number of parameters.

* Please refer to the [paper](https://aclanthology.org/2024.arabicnlp-1.79/) for more information about how the data was curated and annotated.

* We are planning to extend the annotations to cover more country-level dialects. If you are interested in helping, please contact us; we would be happy to discuss it further.