metadata
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-70b-bnb-4bit
datasets:
- lightblue/tagengo-gpt4
Uploaded model
- Developed by: ruslandev
- License: apache-2.0
- Finetuned from model: unsloth/llama-3-70b-bnb-4bit
This model was fine-tuned on the Tagengo dataset (lightblue/tagengo-gpt4). Please note: it was created for educational purposes and needs further training or fine-tuning.
How to use
The easiest way to run this model on your own computer is to use its GGUF version (ruslandev/llama-3-70b-tagengo-GGUF) with a program such as llama.cpp. If you want to use the model directly with the Hugging Face Transformers stack, I recommend my framework, gptchain.
git clone https://github.com/RuslanPeresy/gptchain.git
cd gptchain
pip install -r requirements-train.txt
python gptchain.py chat -m ruslandev/llama-3-70b-tagengo \
--chatml true \
-q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
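The `-q` argument in the command above is a JSON list of messages with `from` and `value` keys. A minimal Python sketch for building that string and quoting it for a POSIX shell might look like the following. The helper name `build_query` is hypothetical and not part of gptchain, and only the single-turn `human` shape shown above is assumed.

```python
import json
import shlex


def build_query(prompt: str) -> str:
    """Return a JSON string in the single-turn shape used by the command above.

    Hypothetical helper for illustration; it is not part of gptchain.
    """
    messages = [{"from": "human", "value": prompt}]
    # ensure_ascii=False keeps non-ASCII prompts readable instead of \u escapes.
    return json.dumps(messages, ensure_ascii=False)


query = build_query("What does a neural network consist of?")
# shlex.quote makes the JSON safe to paste after -q in a POSIX shell.
print("-q " + shlex.quote(query))
```

Building the string with `json.dumps` avoids hand-escaping quotes inside the prompt, which is easy to get wrong on the command line.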
Training
The gptchain framework was used for training.
python gptchain.py train -m unsloth/llama-3-70b-bnb-4bit \
-dn tagengo_gpt4 \
-sp checkpoints/llama-3-70b-tagengo \
-hf llama-3-70b-tagengo \
--max-steps 2400
Training hyperparameters
- learning_rate: 2e-4
- seed: 3407
- gradient_accumulation_steps: 4
- per_device_train_batch_size: 2
- optimizer: adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 5
- max_steps: 2400
- weight_decay: 0.01
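To make the list above concrete, the sketch below computes the effective batch size and the learning rate at a few steps. It assumes a single device (as reported under Training results) and the common linear-warmup-then-linear-decay rule; whether gptchain applies exactly this schedule is an assumption, and `linear_lr` is a hypothetical helper written for illustration.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
WARMUP_STEPS = 5
MAX_STEPS = 2400
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
NUM_DEVICES = 1  # assumption: the single-GPU run reported under "Training results"


def linear_lr(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return LEARNING_RATE * remaining / (MAX_STEPS - WARMUP_STEPS)


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
print("sequences per optimizer step:", effective_batch)
print("sequences processed in total:", effective_batch * MAX_STEPS)
for step in (0, 5, 1200, 2400):
    print(step, linear_lr(step))
```

Under these assumptions each optimizer step covers 8 sequences, so the full run processes 19,200 sequences, and the learning rate peaks at 2e-4 after the 5 warmup steps.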
Training results
Training for 2400 steps took 7 hours on a single H100 GPU.
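As a rough back-of-the-envelope check, the figure above can be turned into a per-step time and a throughput. This sketch treats the 7 hours as exact and assumes every step processed a full batch of 2 x 4 sequences on one GPU, so the results are estimates only.

```python
MAX_STEPS = 2400
TRAIN_HOURS = 7  # rounded figure reported above
# per_device_train_batch_size * gradient_accumulation_steps, on one GPU
SEQUENCES_PER_STEP = 2 * 4

seconds_per_step = TRAIN_HOURS * 3600 / MAX_STEPS
sequences_per_second = SEQUENCES_PER_STEP / seconds_per_step
print(f"{seconds_per_step:.1f} s per optimizer step")
print(f"{sequences_per_second:.2f} sequences per second")
```

This works out to about 10.5 seconds per optimizer step.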