pythia_160m_sft

This model is a fine-tuned version of EleutherAI/pythia-160m on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9831
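The evaluation loss is easier to read as a perplexity. Assuming it is the mean per-token cross-entropy in nats (the natural-log loss the Transformers Trainer reports for causal language models), perplexity is simply its exponential. The `perplexity` helper below is a hypothetical illustration, not part of any library.

```python
import math

# Final evaluation loss reported above.
# Assumption: mean per-token cross-entropy measured in nats (natural log).
EVAL_LOSS = 1.9831


def perplexity(mean_nll: float) -> float:
    """Perplexity from a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll)


print(f"{perplexity(EVAL_LOSS):.2f}")  # prints 7.27
```

Under that assumption, the fine-tuned model is on average about as uncertain as a uniform choice among roughly 7.3 tokens on the evaluation set.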

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
  • mixed_precision_training: Native AMP
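A minimal pure-Python sketch of how the listed learning rate, scheduler, and optimizer settings interact. It assumes no warmup and no weight decay (neither is listed), and 3,375 total optimizer steps, which is what the Epoch and Step columns of the training results imply (100 steps ≈ 0.0889 epoch, so 1,125 steps per epoch over 3 epochs). The schedule has the same shape as a linear decay to zero, and the update follows the standard AdamW rule with decoupled weight decay. `linear_lr` and `adamw_step` are hypothetical helper names; this is an illustration, not the training code used for this model.

```python
import math

# Values copied from the hyperparameter list above.
BASE_LR = 2e-6
BETAS = (0.9, 0.999)
EPS = 1e-8

# Assumption: 1,125 steps per epoch (inferred from the training results) x 3 epochs.
TOTAL_STEPS = 1125 * 3
# Assumptions: no warmup and no weight decay (neither appears in the list).
WARMUP_STEPS = 0
WEIGHT_DECAY = 0.0


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


def adamw_step(p: float, g: float, m: float, v: float, t: int, lr: float):
    """One AdamW update for a single scalar parameter; t is the 1-based step count."""
    b1, b2 = BETAS
    p = p * (1 - lr * WEIGHT_DECAY)          # decoupled weight decay
    m = b1 * m + (1 - b1) * g                # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g            # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                # bias-corrected second moment
    p = p - lr * m_hat / (math.sqrt(v_hat) + EPS)
    return p, m, v


for s in (0, 1125, 2250, 3375):
    print(s, linear_lr(s))
```

For example, `linear_lr(1125)` is about 1.33e-06, two thirds of the base rate, at the end of the first epoch. On the very first AdamW step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to ±1, so each parameter moves by roughly the learning rate regardless of gradient scale, which is one reason such a small rate (2e-06) is used for full fine-tuning.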

Training results

Training Loss   Epoch    Step   Validation Loss
2.2935          0.0889   100    2.1426
2.153           0.1778   200    2.0977
2.1432          0.2667   300    2.0771
2.1131          0.3556   400    2.0633
2.0885          0.4444   500    2.0510
2.0956          0.5333   600    2.0403
2.0647          0.6222   700    2.0354
2.0498          0.7111   800    2.0273
2.0317          0.8      900    2.0202
2.0226          0.8889   1000   2.0150
1.992           0.9778   1100   2.0114
1.9639          1.0667   1200   2.0088
1.9302          1.1556   1300   2.0051
1.9381          1.2444   1400   2.0028
1.9595          1.3333   1500   2.0009
1.9325          1.4222   1600   1.9998
1.9481          1.5111   1700   1.9981
1.9572          1.6      1800   1.9956
1.9456          1.6889   1900   1.9944
1.9565          1.7778   2000   1.9922
1.9507          1.8667   2100   1.9905
1.9247          1.9556   2200   1.9881
1.8998          2.0444   2300   1.9874
1.9102          2.1333   2400   1.9873
1.8842          2.2222   2500   1.9876
1.876           2.3111   2600   1.9863
1.9001          2.4      2700   1.9856
1.8725          2.4889   2800   1.9859
1.868           2.5778   2900   1.9845
1.8803          2.6667   3000   1.9844
1.9002          2.7556   3100   1.9838
1.8941          2.8444   3200   1.9839
1.8548          2.9333   3300   1.9831
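The Epoch and Step columns also pin down the length of an epoch. The sketch below recomputes steps per epoch, total steps, an upper bound on the number of training examples, and the row with the lowest validation loss from the values in the table. The bound on examples assumes a single device and no gradient accumulation, neither of which is stated in this card; all variable names are illustrative.

```python
# (training loss, epoch, step, validation loss), copied from the table above.
ROWS = [
    (2.2935, 0.0889, 100, 2.1426), (2.153, 0.1778, 200, 2.0977),
    (2.1432, 0.2667, 300, 2.0771), (2.1131, 0.3556, 400, 2.0633),
    (2.0885, 0.4444, 500, 2.0510), (2.0956, 0.5333, 600, 2.0403),
    (2.0647, 0.6222, 700, 2.0354), (2.0498, 0.7111, 800, 2.0273),
    (2.0317, 0.8, 900, 2.0202), (2.0226, 0.8889, 1000, 2.0150),
    (1.992, 0.9778, 1100, 2.0114), (1.9639, 1.0667, 1200, 2.0088),
    (1.9302, 1.1556, 1300, 2.0051), (1.9381, 1.2444, 1400, 2.0028),
    (1.9595, 1.3333, 1500, 2.0009), (1.9325, 1.4222, 1600, 1.9998),
    (1.9481, 1.5111, 1700, 1.9981), (1.9572, 1.6, 1800, 1.9956),
    (1.9456, 1.6889, 1900, 1.9944), (1.9565, 1.7778, 2000, 1.9922),
    (1.9507, 1.8667, 2100, 1.9905), (1.9247, 1.9556, 2200, 1.9881),
    (1.8998, 2.0444, 2300, 1.9874), (1.9102, 2.1333, 2400, 1.9873),
    (1.8842, 2.2222, 2500, 1.9876), (1.876, 2.3111, 2600, 1.9863),
    (1.9001, 2.4, 2700, 1.9856), (1.8725, 2.4889, 2800, 1.9859),
    (1.868, 2.5778, 2900, 1.9845), (1.8803, 2.6667, 3000, 1.9844),
    (1.9002, 2.7556, 3100, 1.9838), (1.8941, 2.8444, 3200, 1.9839),
    (1.8548, 2.9333, 3300, 1.9831),
]

NUM_EPOCHS = 3        # num_epochs from the hyperparameter list
TRAIN_BATCH_SIZE = 8  # train_batch_size from the hyperparameter list

# Steps per epoch from the first logged row: 100 steps correspond to 0.0889 epoch.
_, first_epoch, first_step, _ = ROWS[0]
steps_per_epoch = round(first_step / first_epoch)
total_steps = steps_per_epoch * NUM_EPOCHS

# At most this many training examples per epoch (the last batch may be partial),
# assuming one device and no gradient accumulation.
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

# Row with the lowest validation loss.
_, best_epoch, best_step, best_val = min(ROWS, key=lambda r: r[3])

print(steps_per_epoch, total_steps, approx_train_examples)  # 1125 3375 9000
print(best_step, best_val)                                  # 3300 1.9831
```

The lowest validation loss (1.9831 at step 3300) matches the evaluation loss reported at the top of this card, and the validation curve is nearly flat over the third epoch.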

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3

Safetensors

  • Model size: 162M params
  • Tensor type: F32

Model tree for koshirowada/pythia_160m_sft

  • Fine-tuned from: EleutherAI/pythia-160m
  • Fine-tunes of this model: 1

Dataset used to train koshirowada/pythia_160m_sft