---
base_model: NousResearch/Meta-Llama-3-8B
library_name: peft
license: other
tags:
- generated_from_trainer
model-index:
- name: outputs/llama3-8b-ht-v1-2
  results: []
---
[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config
axolotl version: `0.4.1`
```yaml
base_model: NousResearch/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: QTeam/htxllama_1
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/llama3-8b-ht-v1-2
sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
- embed_tokens
- lm_head
wandb_project: "ft-llama3-8b-v1"
wandb_entity: "htxqteam1-htx"
wandb_watch: "all"
wandb_name:
wandb_log_model: "never"
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
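The LoRA settings in the config can be turned into a rough size estimate. The sketch below is illustrative only: `lora_r` and `lora_alpha` come from the config, but the layer dimensions are assumptions taken from the public Llama-3-8B architecture, and the list of targeted projections assumes that `lora_target_linear: true` covers the seven attention and MLP projections in each decoder layer.

```python
# LoRA settings copied from the axolotl config above.
LORA_R = 32
LORA_ALPHA = 16

# Assumed Llama-3-8B dimensions (public architecture values, not stated in this card).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
VOCAB = 128256

# (in_features, out_features) of the linear layers assumed to receive adapters.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_param_count(shapes: dict, r: int, num_layers: int) -> int:
    """Each adapted layer adds A (r x d_in) and B (d_out x r)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


scaling = lora_scaling(LORA_ALPHA, LORA_R)
adapter_params = lora_param_count(LINEAR_SHAPES, LORA_R, NUM_LAYERS)
# embed_tokens and lm_head are listed in lora_modules_to_save, so they are
# trained and stored in full rather than as low-rank updates.
saved_module_params = 2 * VOCAB * HIDDEN

print(f"scaling={scaling}, adapter={adapter_params:,}, saved={saved_module_params:,}")
```

Under these assumptions the low-rank matrices hold about 84M parameters, while the two fully saved modules add roughly 1.05B, so most of the trainable weights sit in `embed_tokens` and `lm_head`.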
# outputs/llama3-8b-ht-v1-2
This model is a fine-tuned version of [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) on the QTeam/htxllama_1 dataset.
At the last logged evaluation (step 608), it achieves the following results on the evaluation set:
- Loss: 2.5036
## Model description
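This is a LoRA adapter for NousResearch/Meta-Llama-3-8B, trained with axolotl 0.4.1. Per the config above, it uses rank 32, alpha 16, and dropout 0.05 on all linear layers, and also saves `embed_tokens` and `lm_head`.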
## Intended uses & limitations
More information needed
## Training and evaluation data
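Training used the `QTeam/htxllama_1` dataset in Alpaca format, with 5% held out for evaluation (`val_set_size: 0.05` in the config above). Training sequences were packed to 4096 tokens.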
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- total_train_batch_size: 6
- total_eval_batch_size: 6
- optimizer: 8-bit AdamW (`adamw_bnb_8bit` in the config) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 10
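
The batch-size and scheduler entries above follow from simple arithmetic, sketched below. The warmup-then-cosine formula mirrors the usual linear-warmup cosine schedule. `TOTAL_STEPS = 620` is an assumption inferred from the results table (about 62 optimizer steps per epoch over 10 epochs), not a value reported by the trainer.

```python
import math

# Values from the config and hyperparameter list.
MICRO_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 1
NUM_DEVICES = 3
BASE_LR = 2e-4
WARMUP_STEPS = 10

# Assumption: roughly 62 steps per epoch x 10 epochs, inferred from the results table.
TOTAL_STEPS = 620


def total_train_batch_size(micro: int, accum: int, devices: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return micro * accum * devices


def lr_at(step: int, base_lr: float = BASE_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then cosine decay to zero at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


batch = total_train_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
print("total_train_batch_size:", batch)
for step in (0, 5, 10, 315, 620):
    print(step, f"{lr_at(step):.6f}")
```

The learning rate peaks at 0.0002 once warmup ends at step 10 and reaches half that value at the midpoint of the decay (step 315 under the assumed total).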
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.1533 | 0.0161 | 1 | 2.1831 |
| 1.6416 | 0.2581 | 16 | 1.7333 |
| 1.6154 | 0.5161 | 32 | 1.6458 |
| 1.5155 | 0.7742 | 48 | 1.5807 |
| 1.5359 | 1.0323 | 64 | 1.5371 |
| 1.0746 | 1.2581 | 80 | 1.5888 |
| 1.0806 | 1.5161 | 96 | 1.5696 |
| 1.0348 | 1.7742 | 112 | 1.5536 |
| 1.0769 | 2.0323 | 128 | 1.5341 |
| 0.6608 | 2.2581 | 144 | 1.6201 |
| 0.6918 | 2.5161 | 160 | 1.6185 |
| 0.7203 | 2.7742 | 176 | 1.6154 |
| 0.7172 | 3.0323 | 192 | 1.6202 |
| 0.3914 | 3.2581 | 208 | 1.7162 |
| 0.4111 | 3.5161 | 224 | 1.7114 |
| 0.4091 | 3.7742 | 240 | 1.7177 |
| 0.4103 | 4.0323 | 256 | 1.7191 |
| 0.1996 | 4.2581 | 272 | 1.8387 |
| 0.1932 | 4.5161 | 288 | 1.8439 |
| 0.2185 | 4.7742 | 304 | 1.8510 |
| 0.2221 | 5.0323 | 320 | 1.8515 |
| 0.0968 | 5.2581 | 336 | 2.0317 |
| 0.0937 | 5.5161 | 352 | 2.0138 |
| 0.0973 | 5.7742 | 368 | 2.0274 |
| 0.0830 | 6.0323 | 384 | 2.0257 |
| 0.0385 | 6.2581 | 400 | 2.1731 |
| 0.0411 | 6.5161 | 416 | 2.2114 |
| 0.0446 | 6.7742 | 432 | 2.2080 |
| 0.0426 | 7.0323 | 448 | 2.2194 |
| 0.0186 | 7.2581 | 464 | 2.4007 |
| 0.0186 | 7.5161 | 480 | 2.3837 |
| 0.0217 | 7.7742 | 496 | 2.3915 |
| 0.0201 | 8.0323 | 512 | 2.3953 |
| 0.0137 | 8.2581 | 528 | 2.4732 |
| 0.0158 | 8.5161 | 544 | 2.4896 |
| 0.0145 | 8.7742 | 560 | 2.4928 |
| 0.0145 | 9.0323 | 576 | 2.4964 |
| 0.0135 | 9.2581 | 592 | 2.5030 |
| 0.0149 | 9.5161 | 608 | 2.5036 |
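
In the table above, validation loss stops improving early while training loss keeps falling, which is the usual sign of overfitting. The sketch below copies the validation losses from the table, finds the evaluation with the lowest loss, and runs a simplified patience rule over the history. The helper `stop_step` is hypothetical and only approximates early stopping; it is not the logic axolotl or Transformers applies when `early_stopping_patience` is set.

```python
# Validation losses from the table: step 1, then every 16 steps up to 608.
STEPS = [1] + [16 * i for i in range(1, 39)]
VAL_LOSS = [
    2.1831, 1.7333, 1.6458, 1.5807, 1.5371, 1.5888, 1.5696, 1.5536,
    1.5341, 1.6201, 1.6185, 1.6154, 1.6202, 1.7162, 1.7114, 1.7177,
    1.7191, 1.8387, 1.8439, 1.8510, 1.8515, 2.0317, 2.0138, 2.0274,
    2.0257, 2.1731, 2.2114, 2.2080, 2.2194, 2.4007, 2.3837, 2.3915,
    2.3953, 2.4732, 2.4896, 2.4928, 2.4964, 2.5030, 2.5036,
]
HISTORY = list(zip(STEPS, VAL_LOSS))


def best_eval(history):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(history, key=lambda item: item[1])


def stop_step(history, patience):
    """Hypothetical rule: stop after `patience` evaluations in a row without a new best."""
    best = float("inf")
    bad_evals = 0
    for step, loss in history:
        if loss < best:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return step
    return None  # never triggered


best_step, best_loss = best_eval(HISTORY)
print("best:", best_step, best_loss)
print("stop with patience=4:", stop_step(HISTORY, patience=4))
```

The lowest validation loss is 1.5341 at step 128 (epoch 2.03), compared with 2.5036 at step 608. With `saves_per_epoch: 1`, a checkpoint saved near the end of epoch 2 may be a better candidate than the final one, if it was kept.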
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
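
With PEFT and Transformers installed, the adapter can presumably be loaded on top of the base model as sketched below. Several things here are assumptions: the adapter path is the `output_dir` from the config and should be replaced with wherever the adapter actually lives, the prompt text is the standard Alpaca template (which `type: alpaca` is assumed to follow), and `load_adapter`, `build_prompt`, and `generate` are hypothetical helper names.

```python
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request with the standard Alpaca template (assumed training format)."""
    if input_text:
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=input_text)
    return ALPACA_NO_INPUT.format(instruction=instruction)


def load_adapter(adapter_path: str = "./outputs/llama3-8b-ht-v1-2",
                 base_id: str = "NousResearch/Meta-Llama-3-8B"):
    """Load the base model and attach the LoRA adapter (path is an assumption)."""
    # Imported here so the prompt helper works without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    tokenizer.pad_token = "<|end_of_text|>"  # same pad token as in the config
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example usage (downloads the base model, so it is left commented out):
# model, tokenizer = load_adapter()
# print(generate(model, tokenizer, build_prompt("Summarize LoRA in one sentence.")))
```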