vietphuon
/

Llama-3.2-1B-Instruct-alpaca-then-quizgen-16bit

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Llama-3.2-1B-Instruct-alpaca-then-quizgen-16bit / README.md

vietphuon's picture

Update README.md

a01e6bf verified 15 days ago

|

history blame contribute delete

1.02 kB

	---
	base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
	language:
	- en
	license: apache-2.0
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	---

	FINAL BENCHMARKING
	------------------------------
	- Time to First Token (TTFT): 0.001s
	- Time Per Output Token (TPOT): 33.26ms/token
	- Throughput (token/s): 30.88token/s
	- Average Token Latency (ms/token): 33.33ms/token
	- Total Generation Time: 13.966s
	- Input Tokenization Time: 0.011s
	- Input Tokens: 1909
	- Output Tokens: 420
	- Total Tokens: 2329
	- Memory Usage (GPU): 3.38GB

	# Uploaded model

	- Developed by: vietphuon
	- License: apache-2.0
	- Finetuned from model : unsloth/Llama-3.2-1B-Instruct-bnb-4bit

	This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)