stablelm-2-1.6-dpo-disticoder-v0.1

This model is a fine-tuned version of plaguss/stablelm-2-1_6-sft-disticoder-v01 on the argilla/DistiCoder-dpo-binarized-train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7398
  • Rewards/chosen: -0.0026
  • Rewards/rejected: -0.0002
  • Rewards/accuracies: 0.4902
  • Rewards/margins: -0.0024
  • Logps/rejected: -359.7791
  • Logps/chosen: -297.9016
  • Logits/rejected: -0.9458
  • Logits/chosen: -0.9673
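
The reward figures above are linked by a simple identity: the margin is the chosen reward minus the rejected reward. Below is a minimal sketch of that relationship, assuming the usual DPO convention (it matches the reported numbers up to rounding) and the standard sigmoid DPO loss. The function names are illustrative and are not taken from the training code.

```python
import math


def reward_margin(chosen: float, rejected: float) -> float:
    """Difference between the implicit rewards of the chosen and rejected responses."""
    return chosen - rejected


def sigmoid_dpo_loss(margin: float) -> float:
    """Per-example sigmoid DPO loss, -log(sigmoid(margin)), for an already scaled margin."""
    return math.log1p(math.exp(-margin))


# Final evaluation values reported in the list above.
final_margin = reward_margin(-0.0026, -0.0002)
print(round(final_margin, 4))            # -0.0024

# With no preference between the two responses (margin 0) the loss equals ln 2.
print(round(sigmoid_dpo_loss(0.0), 4))   # 0.6931
```

Read this way, a margin close to zero together with an accuracy of about 0.49 suggests that, on this evaluation set, the model ranks the chosen response above the rejected one about as often as chance would.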

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
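
Two of the values above follow from the others, and a short sketch makes the arithmetic explicit. It assumes a single training device (consistent with 8 × 2 = 16) and a total of 864 optimizer steps, as listed in the training results. The schedule function mirrors the common linear-warmup then cosine-decay shape; the exact schedule used by the training library may differ in details, and the helper names are hypothetical.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def lr_at(step: int, total_steps: int = 864, base_lr: float = 1e-7,
          warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step: linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(8, 2))  # 16
for step in (0, 87, 432, 864):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 1e-07 once warmup ends (step 87 of 864) and decays towards zero by the final step.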

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7397 | 1.0 | 288 | 0.7419 | -0.0098 | -0.0101 | 0.5 | 0.0003 | -359.7990 | -297.9159 | -0.9478 | -0.9696 |
| 0.718 | 2.0 | 576 | 0.7291 | 0.0095 | -0.0100 | 0.5117 | 0.0194 | -359.7986 | -297.8773 | -0.9464 | -0.9679 |
| 0.6923 | 3.0 | 864 | 0.7398 | -0.0026 | -0.0002 | 0.4902 | -0.0024 | -359.7791 | -297.9016 | -0.9458 | -0.9673 |
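
The per-epoch rows can be compared programmatically. The sketch below copies a few columns from the table above and picks the epoch with the lowest validation loss and the one with the highest reward accuracy. The tuple layout is chosen for illustration only.

```python
# (epoch, validation loss, rewards/accuracies, rewards/margins), copied from the table above
results = [
    (1.0, 0.7419, 0.5, 0.0003),
    (2.0, 0.7291, 0.5117, 0.0194),
    (3.0, 0.7398, 0.4902, -0.0024),
]

best_by_loss = min(results, key=lambda row: row[1])
best_by_accuracy = max(results, key=lambda row: row[2])

print(best_by_loss[0], best_by_accuracy[0])  # 2.0 2.0
```

Both criteria point to epoch 2, while the headline evaluation metrics correspond to the final epoch 3 row. The card does not state whether an intermediate checkpoint was saved.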

Framework versions

  • PEFT 0.8.2
  • Transformers 4.37.2
  • Pytorch 2.1.1+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.2