---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
base_model: unsloth/llama-3-70b-bnb-4bit
datasets:
- lightblue/tagengo-gpt4
---

# Uploaded model

- **Developed by:** ruslandev
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-70b-bnb-4bit

This model is finetuned on the Tagengo dataset. Please note that this model was created for educational purposes and needs further training/fine-tuning.

# How to use

The easiest way to run this model on your own computer is the GGUF version ([ruslandev/llama-3-70b-tagengo-GGUF](https://huggingface.co/ruslandev/llama-3-70b-tagengo-GGUF)) with a program such as [llama.cpp](https://github.com/ggerganov/llama.cpp).

If you want to use this model directly with the Hugging Face Transformers stack, I recommend using my framework [gptchain](https://github.com/RuslanPeresy/gptchain). The example prompt below asks, in Russian, "What does a neural network consist of?" A minimal sketch of loading the model with plain Transformers is also given at the end of this card.

```
git clone https://github.com/RuslanPeresy/gptchain.git
cd gptchain
pip install -r requirements-train.txt

python gptchain.py chat -m ruslandev/llama-3-70b-tagengo \
  --chatml true \
  -q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
```

# Training

The [gptchain](https://github.com/RuslanPeresy/gptchain) framework was used for training.

```
python gptchain.py train -m unsloth/llama-3-70b-bnb-4bit \
  -dn tagengo_gpt4 \
  -sp checkpoints/llama-3-70b-tagengo \
  -hf llama-3-70b-tagengo \
  --max-steps 2400
```

# Training hyperparameters

- learning_rate: 2e-4
- seed: 3407
- gradient_accumulation_steps: 4
- per_device_train_batch_size: 2
- optimizer: adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 5
- max_steps: 2400
- weight_decay: 0.01

# Training results

[wandb report](https://api.wandb.ai/links/ruslandev/rilj60ra)

2400 steps took 7 hours on a single H100.

This model was trained with [Unsloth](https://github.com/unslothai/unsloth).
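
# Example: direct use with Transformers

For reference, here is a minimal sketch of loading the model with plain Hugging Face Transformers, without gptchain. The repo id comes from this card; the ChatML-style chat format is implied by the `--chatml true` flag above; 4-bit loading, the generation settings, and the availability of a chat template on the tokenizer are assumptions.

```python
# Minimal sketch, not a reference implementation.
# Assumptions: the tokenizer ships a chat template (the --chatml flag above
# suggests ChatML); 4-bit loading is used only to fit a 70B model on one GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ruslandev/llama-3-70b-tagengo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

messages = [{"role": "user", "content": "What does a neural network consist of?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the tokenizer does not define a chat template, format the prompt manually in the same ChatML style that gptchain uses.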
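
# Example: hyperparameters as an Unsloth/TRL config

The actual run was driven by gptchain, but the hyperparameters listed above correspond to a fairly standard Unsloth + TRL SFT setup. The sketch below only illustrates how those values map onto `TrainingArguments`; the LoRA settings, sequence length, and the rendering of Tagengo conversations into a `text` field are assumptions, not the exact gptchain configuration.

```python
# Hypothetical sketch of the training configuration; the real run used gptchain.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-70b-bnb-4bit",
    max_seq_length=2048,  # assumption
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(  # assumed LoRA settings
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumption: conversations are rendered into a single "text" field beforehand.
dataset = load_dataset("lightblue/tagengo-gpt4", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8 sequences
        warmup_steps=5,
        max_steps=2400,
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="checkpoints/llama-3-70b-tagengo",
    ),
)
trainer.train()
```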