Uploaded model
- Developed by: ruslandev
- License: apache-2.0
- Finetuned from model: unsloth/llama-3-70b-bnb-4bit
This model was finetuned on the Tagengo dataset. Please note that it was created for educational purposes and needs further training/finetuning.
How to use
The easiest way to run this model on your own computer is to use its GGUF version (ruslandev/llama-3-70b-tagengo-GGUF) with a program such as llama.cpp. To use the model directly with the Hugging Face Transformers stack, I recommend my framework, gptchain.
git clone https://github.com/RuslanPeresy/gptchain.git
cd gptchain
pip install -r requirements-train.txt
python gptchain.py chat -m ruslandev/llama-3-70b-tagengo \
--chatml true \
-q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
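The `-q` argument above takes a JSON list of ShareGPT-style messages (`from`/`value` pairs), and `--chatml true` asks for ChatML formatting. As a rough illustration of what that conversion could look like, here is a minimal sketch. It assumes the standard ChatML markers (`<|im_start|>`, `<|im_end|>`) and a `human`/`gpt` to `user`/`assistant` role mapping; the helper `sharegpt_to_chatml` is hypothetical and is not taken from gptchain, whose actual template may differ.

```python
import json

# Assumed mapping from ShareGPT-style "from" values to ChatML roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_chatml(messages, add_generation_prompt=True):
    """Render ShareGPT-style messages as a ChatML prompt string (hypothetical helper)."""
    parts = []
    for msg in messages:
        role = ROLE_MAP[msg["from"]]
        parts.append(f"<|im_start|>{role}\n{msg['value']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Same query as in the command above ("What does a neural network consist of?").
query = '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
prompt = sharegpt_to_chatml(json.loads(query))
print(prompt)
```

Multi-turn conversations would work the same way: append further `{"from": "gpt", ...}` and `{"from": "human", ...}` entries to the list.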
Training
The gptchain framework was used for training.
python gptchain.py train -m unsloth/llama-3-70b-bnb-4bit \
-dn tagengo_gpt4 \
-sp checkpoints/llama-3-70b-tagengo \
-hf llama-3-70b-tagengo \
--max-steps 2400
Training hyperparameters
- learning_rate: 2e-4
- seed: 3407
- gradient_accumulation_steps: 4
- per_device_train_batch_size: 2
- optimizer: adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 5
- max_steps: 2400
- weight_decay: 0.01
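To make these numbers concrete, the sketch below derives the effective batch size and the learning rate at a few points in the run. It assumes single-GPU training and that the `linear` scheduler follows the usual linear-warmup-then-linear-decay shape used by Hugging Face Transformers; the function `linear_schedule_lr` is an illustrative helper, not part of gptchain.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 2400
warmup_steps = 5
peak_lr = 2e-4

# Sequences contributing to each optimizer step on one GPU.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
# Upper bound on training sequences processed over the whole run.
sequences_seen = effective_batch_size * max_steps


def linear_schedule_lr(step):
    """Assumed shape: linear warmup to peak_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))


print("effective batch size:", effective_batch_size)
print("sequences seen:", sequences_seen)
for step in (0, 5, 1200, 2400):
    print(step, linear_schedule_lr(step))
```

With only 5 warmup steps, the learning rate reaches its peak almost immediately and then decays for essentially the whole run.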
Training results
2400 steps took 7 hours on a single H100 GPU.
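As a back-of-the-envelope check, the reported wall-clock time can be turned into an approximate throughput. This sketch treats "7 hours" as exact and combines it with the batch size and gradient accumulation values from the hyperparameter list, so the results are rough estimates only.

```python
hours = 7
max_steps = 2400
# per_device_train_batch_size * gradient_accumulation_steps
sequences_per_step = 2 * 4

seconds_per_step = hours * 3600 / max_steps
seconds_per_sequence = seconds_per_step / sequences_per_step

print(f"{seconds_per_step:.2f} s per optimizer step")
print(f"{seconds_per_sequence:.4f} s per training sequence")
```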