ruslandev/llama-3-70b-tagengo

	---
	language:
	- en
	license: apache-2.0
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	base_model: unsloth/llama-3-70b-bnb-4bit
	datasets:
	- lightblue/tagengo-gpt4
	---

	# Uploaded model

	- Developed by: ruslandev
	- License: apache-2.0
	- Finetuned from model: unsloth/llama-3-70b-bnb-4bit

	This model is fine-tuned on the Tagengo dataset (`lightblue/tagengo-gpt4`).
	Note: this model was created for educational purposes and needs further training/fine-tuning.

	# How to use

	The easiest way to run this model on your own computer is to use its GGUF version ([ruslandev/llama-3-70b-tagengo-GGUF](https://huggingface.co/ruslandev/llama-3-70b-tagengo-GGUF)) with a program such as [llama.cpp](https://github.com/ggerganov/llama.cpp).
	To use this model directly with the Hugging Face Transformers stack, I recommend my framework [gptchain](https://github.com/RuslanPeresy/gptchain).

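	```
	# Get gptchain and install its dependencies
	git clone https://github.com/RuslanPeresy/gptchain.git
	cd gptchain
	pip install -r requirements-train.txt
	# Send one chat message; the sample prompt is Russian for
	# "What does a neural network consist of?"
	python gptchain.py chat -m ruslandev/llama-3-70b-tagengo \
	--chatml true \
	-q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
	```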
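
The `-q` argument in the command above is a JSON list of messages, which is easy to get wrong when quoting by hand. Below is a minimal Python sketch that builds the same command. The helper name `build_chat_command` is hypothetical, and the flags are copied from the command above rather than taken from gptchain's documentation.

```python
import json
import shlex


def build_chat_command(model_id: str, question: str) -> list[str]:
    """Assemble the argument list for the `gptchain.py chat` call shown above.

    Hypothetical helper: the flags mirror the README command and are not
    checked against gptchain itself.
    """
    # One-turn conversation in the {"from": ..., "value": ...} shape used above.
    messages = [{"from": "human", "value": question}]
    # ensure_ascii=False keeps non-ASCII text (e.g. Cyrillic) as-is
    # instead of turning it into \uXXXX escapes.
    payload = json.dumps(messages, ensure_ascii=False)
    return [
        "python", "gptchain.py", "chat",
        "-m", model_id,
        "--chatml", "true",
        "-q", payload,
    ]


cmd = build_chat_command(
    "ruslandev/llama-3-70b-tagengo",
    "Из чего состоит нейронная сеть?",
)
# shlex.join quotes every argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

The returned list can also be passed to `subprocess.run(cmd, check=True)` from inside the `gptchain` directory, which avoids shell quoting altogether.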

	# Training
	The [gptchain](https://github.com/RuslanPeresy/gptchain) framework was used for training.

	```
	python gptchain.py train -m unsloth/llama-3-70b-bnb-4bit \
	-dn tagengo_gpt4 \
	-sp checkpoints/llama-3-70b-tagengo \
	-hf llama-3-70b-tagengo \
	--max-steps 2400
	```

	# Training hyperparameters

	- learning_rate: 2e-4
	- seed: 3407
	- gradient_accumulation_steps: 4
	- per_device_train_batch_size: 2
	- optimizer: adamw_8bit
	- lr_scheduler_type: linear
	- warmup_steps: 5
	- max_steps: 2400
	- weight_decay: 0.01
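
To make the `learning_rate`, `lr_scheduler_type`, `warmup_steps`, and `max_steps` values concrete, here is a small sketch of the learning rate they imply at a given step. It assumes the common shape of a linear schedule (linear warmup from zero, then linear decay to zero at `max_steps`); whether gptchain applies exactly this rule is not confirmed by this card, and `linear_lr` is a hypothetical helper name.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
WARMUP_STEPS = 5
MAX_STEPS = 2400


def linear_lr(step: int) -> float:
    """Learning rate at `step` under an assumed linear warmup + linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: fall linearly from the peak to 0 at MAX_STEPS.
    remaining = MAX_STEPS - step
    return LEARNING_RATE * max(0.0, remaining / (MAX_STEPS - WARMUP_STEPS))


for step in (0, 1, 5, 1200, 2400):
    print(step, linear_lr(step))
```

With only 5 warmup steps out of 2400, the run spends almost all of its time in the decay phase.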

	# Training results
	[wandb report](https://api.wandb.ai/links/ruslandev/rilj60ra)

	Training for 2400 steps took 7 hours on a single H100 GPU.
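
As a rough back-of-the-envelope check, the figures above can be combined with the batch settings from the hyperparameter list. The sketch below treats "7 hours" as exact and assumes one optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` sequences on the single GPU; the variable names are illustrative only.

```python
# Values copied from this card; TRAIN_HOURS is treated as exact for the estimate.
MAX_STEPS = 2400
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
NUM_GPUS = 1  # the run used a single H100
TRAIN_HOURS = 7

# Sequences consumed per optimizer step (effective batch size).
sequences_per_step = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
# Sequences seen over the whole run.
total_sequences = sequences_per_step * MAX_STEPS
# Average wall-clock time per optimizer step.
seconds_per_step = TRAIN_HOURS * 3600 / MAX_STEPS

print(sequences_per_step)  # 8
print(total_sequences)     # 19200
print(seconds_per_step)    # 10.5
```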

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)