TinyLlama-1.1B-Chat-v1.0-sft-chat_threads

This model is a PEFT adapter fine-tuned from mjschock/TinyLlama-1.1B-Chat-v1.0 on the mjschock/chat_threads dataset (a loading sketch follows the metrics list below). It achieves the following results on the evaluation set:

  • Loss: 0.5586
  • Bleu: 0.7572
  • Precisions: 0.7641
  • Brevity Penalty: 0.9983
  • Length Ratio: 0.9986
  • Translation Length: 582.3552
  • Reference Length: 582.9104
  • Meteor: 0.7364
  • Rouge1: 0.7900
  • Rouge2: 0.5570
  • Rougel: 0.7250
  • Rougelsum: 0.7838
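
Because the framework versions below list PEFT, this checkpoint is published as an adapter rather than full model weights. The following is a minimal loading sketch, assuming the base and adapter repository ids shown on this card; the prompt text is only an illustration.

```python
# Sketch: load the PEFT adapter on top of the base model and run chat inference.
# Repo ids are taken from this card; the example message is a placeholder.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mjschock/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "mjschock/TinyLlama-1.1B-Chat-v1.0-sft-chat_threads"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Build a prompt with the tokenizer's chat template and generate a reply.
messages = [{"role": "user", "content": "Summarize this thread in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```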

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
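
As a rough guide, these values map onto Hugging Face TrainingArguments as in the sketch below. Only the numeric values come from this card; the output directory and the choice of optim string are assumptions.

```python
# Sketch: TrainingArguments mirroring the hyperparameters listed above.
# Values come from this card; everything else is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="TinyLlama-1.1B-Chat-v1.0-sft-chat_threads",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,  # yields the total train batch size of 16
    optim="adamw_torch",             # Adam-style optimizer; betas=(0.9, 0.999) and eps=1e-08 are the defaults
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
)
```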

Training results

| Training Loss | Epoch  | Step | Validation Loss | Bleu   | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:------:|:----:|:---------------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:------:|:------:|:------:|:------:|:---------:|
| No log        | 0      | 0    | 0.8976          | 0.6391 | 0.6567     | 0.9934          | 0.9936       | 579.7720           | 582.9104         | 0.6775 | 0.6912 | 0.3881 | 0.5809 | 0.6813    |
| 0.7612        | 0.9630 | 13   | 0.7168          | 0.6941 | 0.7056     | 0.9969          | 0.9973       | 581.2681           | 582.9104         | 0.7030 | 0.7375 | 0.4604 | 0.6572 | 0.7281    |
| 0.6321        | 2.0    | 27   | 0.5992          | 0.7420 | 0.7498     | 0.9981          | 0.9981       | 582.0161           | 582.9104         | 0.7312 | 0.7780 | 0.5342 | 0.7069 | 0.7720    |
| 0.5738        | 2.8889 | 39   | 0.5586          | 0.7572 | 0.7641     | 0.9983          | 0.9986       | 582.3552           | 582.9104         | 0.7364 | 0.7900 | 0.5570 | 0.7250 | 0.7838    |
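
The BLEU, METEOR, and ROUGE columns can be computed with the `evaluate` library. The sketch below uses placeholder predictions and references; the real values come from decoding the evaluation split of mjschock/chat_threads, which is not shown on this card, and the card's single "Precisions" number is an aggregate of the per-n-gram precisions.

```python
# Sketch: computing the reported metric families with the `evaluate` library.
# `predictions` and `references` are placeholders, not the actual eval data.
import evaluate

predictions = ["the assistant replies to the thread"]
references = ["the assistant replies to the thread"]

bleu = evaluate.load("bleu").compute(predictions=predictions, references=[[r] for r in references])
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)

print(bleu["bleu"], bleu["precisions"], bleu["brevity_penalty"], bleu["length_ratio"])
print(meteor["meteor"])
print(rouge["rouge1"], rouge["rouge2"], rouge["rougeL"], rouge["rougeLsum"])
```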

Framework versions

  • PEFT 0.13.2
  • Transformers 4.44.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.19.1