|
--- |
|
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
--- |
|
DATASET |
|
------------------------------ |
|
- **What's new?** This run uses version 3.2 of the dataset (Langfuse + AWS), which has better quality (a schematic cleanup pass is sketched after this list):

- Removed all 10- and 15-question quizzes; only the 5-question count is kept

- Fixed all the Vietnamese quizzes (ensured the output is actually in Vietnamese)

- Fixed some lazily duplicated topics (Biglead, Computing)

- Removed the Paragraph question type, replacing it with MCQ for all data points

- Trained using the default training config (60 steps, linear LR)
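
For illustration only, here is a minimal sketch of the cleanup pass described above. The record schema (`question_count`, `questions`, and per-question `type` fields) is an assumption; the real dataset's field names are not shown in this card.

```python
# Hypothetical cleanup pass over the quiz dataset.
# Field names ("question_count", "questions", "type") are assumptions.
def clean_records(records):
    cleaned = []
    for rec in records:
        # Keep only the 5-question quizzes; drop the 10- and 15-question variants.
        if rec.get("question_count") != 5:
            continue
        # Replace every Paragraph question with the MCQ question type.
        questions = [
            {**q, "type": "MCQ"} if q.get("type") == "Paragraph" else q
            for q in rec.get("questions", [])
        ]
        cleaned.append({**rec, "questions": questions})
    return cleaned
```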
|
|
|
TRAINING |
|
------------------------------ |
|
- Overview: |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64952a1e5ba8e6c66e1a0fa8/QBR1IUoD7REKoGG_kJtRS.png) |
|
- Used a low LoRA rank of 8 to avoid overfitting and preserve the model's generalization (see the configuration sketch below)
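
A minimal sketch of how the rank-8 adapter and the default 60-step, linear-LR run could be set up with Unsloth and TRL. Only `r=8`, `max_steps=60`, and the linear scheduler are taken from this card; every other hyperparameter is an assumption based on common Unsloth notebook defaults.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the 4-bit base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length=2048,  # assumption
    load_in_4bit=True,
)

# A low LoRA rank (r=8) limits adapter capacity to reduce overfitting.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,   # assumption
    lora_dropout=0,
    bias="none",
)

dataset = ...  # the cleaned v3.2 quiz dataset (loading omitted; not public)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumption
        gradient_accumulation_steps=4,   # assumption
        max_steps=60,                    # 60 steps, as reported above
        learning_rate=2e-4,              # assumption
        lr_scheduler_type="linear",      # linear LR, as reported above
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```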
|
|
|
| Step | Training Loss | Step | Training Loss | Step | Training Loss |
|-----:|--------------:|-----:|--------------:|-----:|--------------:|
| 1  | 1.216600 | 21 | 0.815200 | 41 | 0.726100 |
| 2  | 1.181100 | 22 | 0.771100 | 42 | 0.687300 |
| 3  | 1.236900 | 23 | 0.800000 | 43 | 0.663100 |
| 4  | 1.157100 | 24 | 0.782500 | 44 | 0.628600 |
| 5  | 1.184100 | 25 | 0.772700 | 45 | 0.663300 |
| 6  | 1.103500 | 26 | 0.698300 | 46 | 0.683500 |
| 7  | 1.150900 | 27 | 0.759500 | 47 | 0.673800 |
| 8  | 1.112900 | 28 | 0.718500 | 48 | 0.651100 |
| 9  | 1.074600 | 29 | 0.711400 | 49 | 0.683700 |
| 10 | 1.095700 | 30 | 0.759400 | 50 | 0.702400 |
| 11 | 0.966400 | 31 | 0.717000 | 51 | 0.664400 |
| 12 | 0.977000 | 32 | 0.708700 | 52 | 0.671800 |
| 13 | 1.004500 | 33 | 0.726800 | 53 | 0.673000 |
| 14 | 0.931500 | 34 | 0.724500 | 54 | 0.704000 |
| 15 | 0.869900 | 35 | 0.747800 | 55 | 0.621100 |
| 16 | 0.886300 | 36 | 0.715600 | 56 | 0.668200 |
| 17 | 0.900000 | 37 | 0.708100 | 57 | 0.686000 |
| 18 | 0.792500 | 38 | 0.648300 | 58 | 0.639500 |
| 19 | 0.814200 | 39 | 0.677900 | 59 | 0.665400 |
| 20 | 0.808900 | 40 | 0.685600 | 60 | 0.680900 |
|
|
|
- 4757.667 seconds (79.29 minutes) used for training.
|
- Peak reserved memory = 13.857 GB. |
|
- Peak reserved memory for training = 12.73 GB. |
|
- Peak reserved memory % of max memory = 93.959 %. |
|
- Peak reserved memory for training % of max memory = 86.317 %. |
|
- Final loss = 0.680900 |
|
- View the full training run here: https://wandb.ai/vietphuongnguyen2602-rockship/huggingface/runs/ns2ym0hr (the memory figures above are derived as sketched below).
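
The figures above match the shape of the standard Unsloth notebook stats cell; here is a sketch of how such numbers are typically derived, assuming a single-GPU run (variable names are assumptions):

```python
import torch

gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)

# Snapshot taken before trainer.train():
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)

# ... training runs here ...

# Snapshot taken after training finishes:
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_training = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
training_percentage = round(used_memory_for_training / max_memory * 100, 3)

print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_training} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {training_percentage} %.")
```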
|
|
|
|
|
FINAL BENCHMARKING |
|
------------------------------ |
|
- **Time to First Token (TTFT):** 0.002 s

- **Time Per Output Token (TPOT):** 37.15 ms/token

- **Throughput:** 27.00 tokens/s

- **Average Token Latency:** 37.21 ms/token

- **Total Generation Time:** 19.171 s

- **Input Tokenization Time:** 0.008 s

- **Input Tokens:** 1909

- **Output Tokens:** 517

- **Total Tokens:** 2426

- **GPU Memory Usage:** 1.38 GB
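
For reference, a minimal sketch of how metrics like TTFT, TPOT, and throughput can be measured with a streaming `generate` call. The `model`, `tokenizer`, and `prompt` objects are assumed inputs, and this is not necessarily the exact harness used for the numbers above.

```python
import time
from threading import Thread
from transformers import TextIteratorStreamer

def benchmark(model, tokenizer, prompt, max_new_tokens=512):
    # Time the input tokenization step separately.
    t0 = time.perf_counter()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    tokenization_s = time.perf_counter() - t0

    # Stream output so the time-to-first-token can be observed.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
    thread = Thread(target=model.generate,
                    kwargs=dict(**inputs, max_new_tokens=max_new_tokens,
                                streamer=streamer))
    start = time.perf_counter()
    thread.start()

    ttft, chunks = None, 0
    for _ in streamer:  # yields decoded text chunks, roughly one per token
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        chunks += 1
    thread.join()
    total_s = time.perf_counter() - start

    return {
        "ttft_s": ttft,
        "tpot_ms": (total_s - ttft) / max(chunks - 1, 1) * 1000,
        "throughput_tok_s": chunks / total_s,
        "total_generation_s": total_s,
        "tokenization_s": tokenization_s,
    }
```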
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** vietphuon |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** unsloth/Llama-3.2-1B-Instruct-bnb-4bit
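
A minimal inference sketch with Unsloth. The repo id below is a placeholder for this model's actual Hub path, and the sequence length and prompt are assumptions.

```python
from unsloth import FastLanguageModel

# "vietphuon/<model-id>" is a placeholder; substitute this model's Hub repo id.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="vietphuon/<model-id>",
    max_seq_length=2048,  # assumption
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

messages = [{"role": "user",
             "content": "Generate a 5-question MCQ quiz about computing."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```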
|
|
|
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |
|
|