---
library_name: peft
base_model: mjschock/TinyLlama-1.1B-Chat-v1.0
tags:
  - trl
  - sft
  - generated_from_trainer
metrics:
  - bleu
  - rouge
model-index:
  - name: TinyLlama-1.1B-Chat-v1.0-sft-chat_threads
    results: []
---

TinyLlama-1.1B-Chat-v1.0-sft-chat_threads

This model is a fine-tuned version of mjschock/TinyLlama-1.1B-Chat-v1.0 on the mjschock/chat_threads dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5610
  • Bleu: 0.7610
  • Precisions: 0.7680
  • Brevity Penalty: 0.9979
  • Length Ratio: 0.9983
  • Translation Length: 582.2670
  • Reference Length: 582.9104
  • Meteor: 0.7384
  • Rouge1: 0.7948
  • Rouge2: 0.5627
  • Rougel: 0.7303
  • Rougelsum: 0.7881
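
This repository contains a PEFT adapter rather than full model weights, so inference attaches the adapter to the base checkpoint. A minimal loading sketch, assuming the adapter is published under the hub id `mjschock/TinyLlama-1.1B-Chat-v1.0-sft-chat_threads` (inferred from the model name above) and that prompts follow the tokenizer's chat template:

```python
# Minimal inference sketch; the adapter repo id below is an assumption, adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mjschock/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "mjschock/TinyLlama-1.1B-Chat-v1.0-sft-chat_threads"  # assumed hub id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the SFT adapter
model.eval()

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Summarize what a PEFT adapter is in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```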

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
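
The training script itself is not included in this card. As a hypothetical reconstruction, the hyperparameters above could be expressed with TRL's `SFTTrainer` roughly as follows; the dataset split name and the LoRA configuration are assumptions, not values recorded here:

```python
# Hypothetical sketch mapping the hyperparameters above onto TRL's SFTTrainer;
# the actual training script is not part of this card.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("mjschock/chat_threads")  # dataset named in this card

config = SFTConfig(
    output_dir="TinyLlama-1.1B-Chat-v1.0-sft-chat_threads",
    learning_rate=5e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,  # 1 x 16 = total train batch size of 16
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    seed=42,                         # default AdamW betas/epsilon match the values above
)

trainer = SFTTrainer(
    model="mjschock/TinyLlama-1.1B-Chat-v1.0",
    args=config,
    train_dataset=dataset["train"],                  # assumed split name
    peft_config=LoraConfig(task_type="CAUSAL_LM"),   # LoRA settings are assumed, not recorded here
)
trainer.train()
```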

Training results

| Training Loss | Epoch  | Step | Validation Loss | Bleu   | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:------:|:----:|:---------------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:------:|:------:|:------:|:------:|:---------:|
| No log        | 0      | 0    | 0.8976          | 0.6391 | 0.6567     | 0.9934          | 0.9936       | 579.7720           | 582.9104         | 0.6775 | 0.6912 | 0.3881 | 0.5809 | 0.6813    |
| 0.7629        | 0.9630 | 13   | 0.7182          | 0.6966 | 0.7074     | 0.9976          | 0.9980       | 581.6733           | 582.9104         | 0.6974 | 0.7402 | 0.4682 | 0.6619 | 0.7309    |
| 0.6338        | 2.0    | 27   | 0.6010          | 0.7477 | 0.7559     | 0.9972          | 0.9972       | 581.5872           | 582.9104         | 0.7316 | 0.7866 | 0.5410 | 0.7128 | 0.7804    |
| 0.576         | 2.8889 | 39   | 0.5610          | 0.7610 | 0.7680     | 0.9979          | 0.9983       | 582.2670           | 582.9104         | 0.7384 | 0.7948 | 0.5627 | 0.7303 | 0.7881    |
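
The BLEU, ROUGE, and METEOR columns carry the same names as the corresponding metrics in the Hugging Face `evaluate` library. A minimal sketch of computing such scores from decoded predictions and reference texts (the exact evaluation code behind this table is not shown in the card):

```python
# Sketch of computing the card's text metrics with the `evaluate` library.
import evaluate

predictions = ["the model's decoded output"]                     # placeholder strings
references = ["the reference completion from the eval set"]

bleu = evaluate.load("bleu").compute(predictions=predictions, references=[[r] for r in references])
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)

print(bleu["bleu"], bleu["precisions"], bleu["brevity_penalty"], bleu["length_ratio"])
print(rouge["rouge1"], rouge["rouge2"], rouge["rougeL"], rouge["rougeLsum"])
print(meteor["meteor"])
```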

Framework versions

  • PEFT 0.13.2
  • Transformers 4.44.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.19.1