vietphuon
/

Llama-3.2-1B-Instruct-bnb-4bit-alpaca-then-quizgen-241016-1

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Llama-3.2-1B-Instruct-bnb-4bit-alpaca-then-quizgen-241016-1 / README.md

vietphuon's picture

Update README.md

ececa2d verified 15 days ago

|

1.02 kB

	---
	base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
	language:
	- en
	license: apache-2.0
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	---

	FINAL BENCHMARKING
	------------------------------
	- Time to First Token (TTFT): 0.001s
	- Time Per Output Token (TPOT): 41.83ms/token
	- Throughput (token/s): 24.35token/s
	- Average Token Latency (ms/token): 41.92ms/token
	- Total Generation Time: 18.427s
	- Input Tokenization Time: 0.009s
	- Input Tokens: 1909
	- Output Tokens: 443
	- Total Tokens: 2352
	- Memory Usage (GPU): 3.38GB

	# Uploaded model

	- Developed by: vietphuon
	- License: apache-2.0
	- Finetuned from model : unsloth/Llama-3.2-1B-Instruct-bnb-4bit

	This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)