---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
---

# Llama 3.1-8B Instruct African-Ultrachat Quantized

- **Developed by:** vutuka
- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B-Instruct
- **Max context length:** `8192`
- **Max steps:** `800`
- **Training time:** `02h 22min 08s`
- **Setup:**
  - `1 x RTX A6000`
  - `16 vCPU`
  - `58 GB RAM`
  - `150 GB Storage`
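
A minimal sketch of loading the base model with Unsloth at this context length, which provides the `model` and `tokenizer` used in the snippets below (the `load_in_4bit` flag is an assumption, not stated in this card):

```py
# Sketch: load the base model with Unsloth at the context length listed above.
from unsloth import FastLanguageModel

max_seq_length = 8192  # "Max context length" above

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_seq_length = max_seq_length,
    load_in_4bit = True,  # assumption: common Unsloth setting, not confirmed here
)
```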

## Tokenizer & Chat Format

```py
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3",  # also supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {
        "role": "role",
        "content": "content",
        "user": "user",          # dataset role values pass through unchanged
        "assistant": "assistant",
    },
)

def formatting_prompts_func(examples):
    # Render each conversation in `messages` into a single training string.
    convos = examples["messages"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False)
        for convo in convos
    ]
    return { "text": texts }
```
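
This is applied over the dataset with `datasets.map`; the dataset id below is a placeholder, since this card does not name the exact training set:

```py
# Illustrative usage only: substitute the actual african-ultrachat dataset id.
from datasets import load_dataset

dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split = "train_sft")  # placeholder
shuffled_dataset = dataset.map(formatting_prompts_func, batched = True).shuffle(seed = 3407)
```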

## Trainer

```py
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = shuffled_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # packing = True can make training ~5x faster on short sequences
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size of 8
        warmup_steps = 5,
        max_steps = 800,
        do_eval = True,
        learning_rate = 3e-4,
        log_level = "debug",
        # fp16 = not is_bfloat16_supported(),
        bf16 = True,  # the RTX A6000 supports bfloat16
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb",
        warmup_ratio = 0.3,  # ignored here: a non-zero warmup_steps takes precedence
    ),
)
```
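
Training then runs for the 800 configured steps:

```py
trainer_stats = trainer.train()  # returns step and loss statistics
```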

## Inference with llama.cpp

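
The exported GGUF can be served with `llama-cpp-python`; a sketch follows, in which the GGUF filename and sampling settings are placeholders, since this card does not list the exact quantized files:

```py
# Hypothetical example: point model_path at the downloaded GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path = "llama-3.1-8b-african-ultrachat.Q4_K_M.gguf",  # placeholder filename
    n_ctx = 8192,             # matches the max context length above
    chat_format = "llama-3",  # same template used during finetuning
)

response = llm.create_chat_completion(
    messages = [{"role": "user", "content": "Hello! What can you help me with?"}],
    max_tokens = 256,
)
print(response["choices"][0]["message"]["content"])
```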

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)