---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
---

DATASET

  • What's new: Uses version 3.2 of the dataset (Langfuse + AWS), which has better quality:
    • Removed all 10- and 15-question quizzes; only 5-question quizzes are kept
    • Fixed all Vietnamese quizzes (ensuring the output is actually in Vietnamese)
    • Fixed some lazily duplicated topics (Biglead, Computing)
    • Removed Paragraph questions, replacing them with MCQs for all data points
    • Trained with the default training config (60 steps, linear LR schedule)

TRAINING

  • Overview: (training overview chart)
  • Uses LoRA rank 8 to avoid overfitting and preserve the model's generalization (see the sketch below).
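
The run described above maps onto a fairly standard Unsloth + TRL setup. Below is a minimal sketch, assuming the Unsloth notebook defaults for anything the card does not state (sequence length, batch size, learning rate) and a hypothetical `quiz_dataset` holding the v3.2 data; it is illustrative, not the exact script used for this run.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the same 4-bit base model as this card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length=2048,   # assumption: not stated in the card
    load_in_4bit=True,
)

# Attach LoRA adapters with rank 8 to limit capacity and reduce overfitting.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,                   # low rank, as described above
    lora_alpha=16,         # assumption: common Unsloth default
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=quiz_dataset,          # hypothetical: the v3.2 quiz dataset
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumption: Unsloth notebook default
        gradient_accumulation_steps=4,   # assumption
        max_steps=60,                    # default config mentioned above
        learning_rate=2e-4,              # assumption
        lr_scheduler_type="linear",      # linear LR, as described above
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```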

| Step | Training Loss | Step | Training Loss | Step | Training Loss |
|------|---------------|------|---------------|------|---------------|
| 1 | 1.216600 | 21 | 0.815200 | 41 | 0.726100 |
| 2 | 1.181100 | 22 | 0.771100 | 42 | 0.687300 |
| 3 | 1.236900 | 23 | 0.800000 | 43 | 0.663100 |
| 4 | 1.157100 | 24 | 0.782500 | 44 | 0.628600 |
| 5 | 1.184100 | 25 | 0.772700 | 45 | 0.663300 |
| 6 | 1.103500 | 26 | 0.698300 | 46 | 0.683500 |
| 7 | 1.150900 | 27 | 0.759500 | 47 | 0.673800 |
| 8 | 1.112900 | 28 | 0.718500 | 48 | 0.651100 |
| 9 | 1.074600 | 29 | 0.711400 | 49 | 0.683700 |
| 10 | 1.095700 | 30 | 0.759400 | 50 | 0.702400 |
| 11 | 0.966400 | 31 | 0.717000 | 51 | 0.664400 |
| 12 | 0.977000 | 32 | 0.708700 | 52 | 0.671800 |
| 13 | 1.004500 | 33 | 0.726800 | 53 | 0.673000 |
| 14 | 0.931500 | 34 | 0.724500 | 54 | 0.704000 |
| 15 | 0.869900 | 35 | 0.747800 | 55 | 0.621100 |
| 16 | 0.886300 | 36 | 0.715600 | 56 | 0.668200 |
| 17 | 0.900000 | 37 | 0.708100 | 57 | 0.686000 |
| 18 | 0.792500 | 38 | 0.648300 | 58 | 0.639500 |
| 19 | 0.814200 | 39 | 0.677900 | 59 | 0.665400 |
| 20 | 0.808900 | 40 | 0.685600 | 60 | 0.680900 |

  • Training time: 4757.667 seconds (79.29 minutes).
  • Peak reserved memory: 13.857 GB.
  • Peak reserved memory for training: 12.73 GB.
  • Peak reserved memory as % of max memory: 93.959%.
  • Peak reserved memory for training as % of max memory: 86.317%.
  • Final training loss: 0.680900.
  • View the full training run here: https://wandb.ai/vietphuongnguyen2602-rockship/huggingface/runs/ns2ym0hr
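
The memory and runtime figures above follow the reporting snippet from Unsloth's example notebooks. A sketch of how they are typically computed, continuing from the `trainer` in the previous example; `start_gpu_memory` is measured right before training, and `trainer_stats` is the return value of `trainer.train()`:

```python
import torch

gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)

# Reserved memory before training (base model + adapters already loaded).
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)

trainer_stats = trainer.train()

# Peak reserved memory after training; the difference is attributed to training.
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_training = round(used_memory - start_gpu_memory, 3)

print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_training} GB.")
print(f"Peak reserved memory % of max memory = {round(used_memory / max_memory * 100, 3)} %.")
print(f"Peak reserved memory for training % of max memory = {round(used_memory_for_training / max_memory * 100, 3)} %.")
```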

FINAL BENCHMARKING

  • Time to First Token (TTFT): 0.002 s
  • Time Per Output Token (TPOT): 37.15 ms/token
  • Throughput: 27.00 tokens/s
  • Average Token Latency: 37.21 ms/token
  • Total Generation Time: 19.171 s
  • Input Tokenization Time: 0.008 s
  • Input Tokens: 1909
  • Output Tokens: 517
  • Total Tokens: 2426
  • GPU Memory Usage: 1.38 GB
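
The exact benchmarking harness is not included in the card. The sketch below shows one plausible way to measure these metrics with plain `transformers` generation; the `benchmark` helper, the single-token TTFT probe, and the default `max_new_tokens` are assumptions, not the script used to produce the numbers above.

```python
import time
import torch

def benchmark(model, tokenizer, prompt, max_new_tokens=512):
    """Rough single-request latency/throughput measurement (hypothetical helper)."""
    t0 = time.perf_counter()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    tokenization_time = time.perf_counter() - t0
    n_input = inputs["input_ids"].shape[1]

    # Approximate TTFT by timing a single-token generation (prefill + first decode step).
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    model.generate(**inputs, max_new_tokens=1)
    torch.cuda.synchronize()
    ttft = time.perf_counter() - t0

    # Full generation for throughput and per-token latency.
    t0 = time.perf_counter()
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    total_time = time.perf_counter() - t0
    n_output = output_ids.shape[1] - n_input

    return {
        "ttft_s": round(ttft, 3),
        "tpot_ms_per_token": round((total_time - ttft) / max(n_output - 1, 1) * 1000, 2),
        "throughput_tokens_per_s": round(n_output / total_time, 2),
        "avg_token_latency_ms": round(total_time / n_output * 1000, 2),
        "total_generation_time_s": round(total_time, 3),
        "input_tokenization_time_s": round(tokenization_time, 3),
        "input_tokens": n_input,
        "output_tokens": n_output,
        "total_tokens": n_input + n_output,
        "gpu_memory_gb": round(torch.cuda.memory_allocated() / 1024 ** 3, 2),
    }
```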

Uploaded model

  • Developed by: vietphuon
  • License: apache-2.0
  • Finetuned from model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
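
A minimal inference sketch with Unsloth; the repo id, sequence length, and prompt below are placeholders rather than values taken from this card:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="vietphuon/<this-repo>",  # placeholder: replace with this model's Hub id
    max_seq_length=2048,                 # assumption
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)   # enable Unsloth's faster inference path

# Hypothetical prompt matching the quiz-generation use case described above.
messages = [{"role": "user", "content": "Generate a 5-question multiple-choice quiz about computing."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids=input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```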