Description

This tokenizer is designed for Moroccan Darija, a dialectal variety of Arabic (ISO 639-3 code: ary).
It was trained with the Byte Pair Encoding (BPE) algorithm on the atlasia/AL-Atlas-Moroccan-Darija-Pretraining-Dataset dataset.
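
For readers unfamiliar with BPE, the following is a minimal pure-Python sketch of the general technique: repeatedly merge the most frequent adjacent symbol pair. It is not the training script used for this tokenizer. The function names (`train_bpe`, `merge_pair`, `encode`) and the toy corpus are hypothetical and exist only for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy, character-level)."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Production tokenizers usually work on bytes or pre-tokenized text and add special tokens; this sketch leaves all of that out to keep the merge loop visible.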

Features

  • Tokenizes Moroccan Darija text efficiently; see the Moroccan Darija leaderboard.
  • Provides robust handling of dialectal variations and specific features of Moroccan Darija.
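
As a rough way to try the tokenizer and check the efficiency claim on your own text, the sketch below loads it and computes tokens per whitespace-separated word, a common proxy for tokenizer efficiency. This rests on assumptions: that the repository `BounharAbdelaziz/Moroccan-Darija-Tokenizer` can be loaded with `AutoTokenizer` from the `transformers` library, and that tokens per word is a reasonable stand-in for whatever metric the leaderboard reports (this card does not say which). The helper names `tokens_per_word`, `load_darija_tokenizer`, and `main` are hypothetical.

```python
def tokens_per_word(tokenize, texts):
    """Average number of tokens per whitespace-separated word over `texts`.

    `tokenize` is any callable that maps a string to a sequence of tokens.
    Lower values mean the tokenizer splits words into fewer pieces.
    """
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("texts contain no words")
    return total_tokens / total_words


def load_darija_tokenizer(repo_id="BounharAbdelaziz/Moroccan-Darija-Tokenizer"):
    """Load the tokenizer from the Hugging Face Hub.

    Assumption: the repository ships files that AutoTokenizer can read.
    The import is done lazily so the metric helper works without transformers.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def main():
    """Example run; needs network access and the transformers package."""
    tokenizer = load_darija_tokenizer()
    samples = ["كيفاش داير اليوم؟"]  # replace with your own Darija sentences
    print(tokenizer.tokenize(samples[0]))
    print(tokens_per_word(tokenizer.tokenize, samples))
```

Call `main()` to run the example. Passing another tokenizer's `tokenize` method to `tokens_per_word` with the same sample texts gives a simple side-by-side comparison.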

Dataset used to train BounharAbdelaziz/Moroccan-Darija-Tokenizer