ModernBERT-ar-base-tiny

This model was trained on the Fineweb2 Ar sample dataset. The tokenizer was also trained on the same dataset.
See the sample code (usage and training) and the initial post.
Updated: Jan. 12, 2025.
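For reference, a minimal sketch of the tokenizer training step. The dataset identifier, config name, sample size, and base tokenizer below are assumptions for illustration, not the exact setup used:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed dataset/config names for the Fineweb2 Ar sample; the actual
# sample used for this model may differ.
ds = load_dataset("HuggingFaceFW/fineweb-2", name="arb_Arab",
                  split="train", streaming=True)

# Start from the original ModernBERT tokenizer and retrain it on Arabic text.
base = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
texts = (row["text"] for row in ds.take(100_000))  # sample size is illustrative
tokenizer = base.train_new_from_iterator(texts, vocab_size=base.vocab_size)
tokenizer.save_pretrained("ModernBERT-ar-tokenizer")
```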

Model description

An experimental ModernBERT Arabic masked language model (MLM), about 150M parameters (F32 weights).
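A minimal usage sketch with the transformers fill-mask pipeline (the Arabic example sentence is illustrative; the mask token is read from the tokenizer rather than hard-coded):

```python
from transformers import pipeline

# Load the model and its tokenizer for masked-token prediction.
fill_mask = pipeline("fill-mask", model="akhooli/ModernBERT-ar-base-tiny")

# Build an example sentence around whatever mask token the tokenizer defines.
mask = fill_mask.tokenizer.mask_token
for pred in fill_mask(f"عاصمة مصر هي {mask}."):  # "The capital of Egypt is [MASK]."
    print(pred["token_str"], round(pred["score"], 3))
```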

Intended uses & limitations

Educational and exploratory use only. The model was trained on limited data and is not fully trained.

Training and evaluation data

Evaluation used a 5% held-out split of the data; training ran on 2 GPUs.
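As a sketch of that split (the dataset identifier and config name are assumptions, and the seed matches the training seed below):

```python
from datasets import load_dataset

# Assumed dataset/config names; hold out 5% for evaluation with a fixed seed.
ds = load_dataset("HuggingFaceFW/fineweb-2", name="arb_Arab", split="train")
split = ds.train_test_split(test_size=0.05, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```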

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 24
  • total_eval_batch_size: 24
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 50000
  • mixed_precision_training: Native AMP
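
These settings map roughly onto transformers `TrainingArguments` as sketched below; the output path is illustrative, and "Native AMP" is assumed here to mean `fp16=True` (it could equally be bf16 autocast on newer GPUs):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ModernBERT-ar-base-tiny",  # illustrative output path
    learning_rate=5e-5,
    per_device_train_batch_size=12,        # 12 per device x 2 GPUs = 24 total
    per_device_eval_batch_size=12,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=50_000,
    fp16=True,                             # Native AMP mixed precision
)
```

Launched with two processes (e.g. `torchrun --nproc_per_node=2`), the per-device batch size of 12 yields the total train/eval batch size of 24 listed above.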

Framework versions

  • Transformers 4.49.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0