---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
---

# FINAL BENCHMARKING
- Time to First Token (TTFT): 0.001 s
- Time Per Output Token (TPOT): 33.26 ms/token
- Throughput: 30.88 token/s
- Average Token Latency: 33.33 ms/token
- Total Generation Time: 13.966 s
- Input Tokenization Time: 0.011 s
- Input Tokens: 1909
- Output Tokens: 420
- Total Tokens: 2329
- Memory Usage (GPU): 3.38 GB
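
The exact benchmarking harness is not included in this card. The snippet below is a minimal sketch of how metrics like these (TTFT, TPOT, throughput, average latency, GPU memory) can be measured with the `transformers` `generate()` API against the 4-bit base model; the `model_id`, prompt, and generation settings are assumptions, so the numbers it produces will differ from those reported above.

```python
# Minimal benchmarking sketch (assumed harness, not the exact script used for these numbers).
# Requires: torch, transformers, accelerate, bitsandbytes (for the 4-bit base model).
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"  # base model listed in this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the benefits of 4-bit quantization for small language models."  # assumed prompt

# Input Tokenization Time and Input Tokens
t0 = time.perf_counter()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
tokenize_time = time.perf_counter() - t0
input_tokens = inputs["input_ids"].shape[1]

# Rough TTFT: time a single-token generation (a streaming harness would measure this more directly)
t0 = time.perf_counter()
model.generate(**inputs, max_new_tokens=1, do_sample=False)
ttft = time.perf_counter() - t0

# Total Generation Time, Output Tokens, TPOT, throughput, average latency, GPU memory
torch.cuda.reset_peak_memory_stats()
t0 = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=420, do_sample=False)
total_time = time.perf_counter() - t0
output_tokens = output.shape[1] - input_tokens

tpot = (total_time - ttft) / max(output_tokens - 1, 1)    # time per output token after the first
throughput = output_tokens / total_time                   # tokens generated per second
avg_latency = total_time / output_tokens                  # average latency per output token
gpu_mem_gb = torch.cuda.max_memory_allocated() / 1024**3  # peak GPU memory during generation

print(f"TTFT: {ttft:.3f}s | TPOT: {tpot * 1000:.2f}ms/token | "
      f"Throughput: {throughput:.2f} token/s | Avg latency: {avg_latency * 1000:.2f}ms/token")
print(f"Total generation: {total_time:.3f}s | Tokenization: {tokenize_time:.3f}s | "
      f"Input/Output/Total tokens: {input_tokens}/{output_tokens}/{input_tokens + output_tokens} | "
      f"GPU: {gpu_mem_gb:.2f}GB")
```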
# Uploaded model

- **Developed by:** vietphuon
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
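
A minimal usage sketch for the uploaded model is shown below, assuming a standard `transformers` chat-template workflow; the repo id is a placeholder, since this card does not state it.

```python
# Minimal inference sketch. The repo id below is a placeholder, not stated in this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vietphuon/<this-repo>"  # hypothetical: replace with this model's actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Give me a one-sentence summary of Llama 3.2."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a short completion and decode only the newly generated tokens
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```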