---
base_model: unsloth/Mistral-Nemo-Base-2407
library_name: peft
license: apache-2.0
tags:
- axolotl
- generated_from_trainer
model-index:
- name: adventure-nemo-ws
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.4.1`

```yaml
# python -m axolotl.cli.preprocess adventure-nemo.yml
# accelerate launch -m axolotl.cli.train adventure-nemo.yml
# python -m axolotl.cli.merge_lora adventure-nemo.yml

base_model: unsloth/Mistral-Nemo-Base-2407
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

sequence_len: 8192 # 99% vram
bf16: auto
fp16:
tf32: false
flash_attention: true
special_tokens:

# Data
dataset_prepared_path: last_run_prepared
datasets:
  - path: ColumbidAI/adventure-8k
    type: completion

warmup_steps: 10
shuffle_merged_datasets: true
save_safetensors: true
saves_per_epoch: 4
save_total_limit: 2

# WandB
wandb_project: Nemo-A
wandb_entity:

# Iterations
num_epochs: 1

# Output
output_dir: ./adventure-command-r-workspace
hub_model_id: ToastyPigeon/adventure-nemo-ws
hub_strategy: "all_checkpoints"

# Sampling
sample_packing: true
pad_to_sequence_len: true

# Batching
gradient_accumulation_steps: 1
micro_batch_size: 4
gradient_checkpointing: 'unsloth'
gradient_checkpointing_kwargs:
  use_reentrant: true

#unsloth_cross_entropy_loss: true
#unsloth_lora_mlp: true
#unsloth_lora_qkv: true
#unsloth_lora_o: true

# Evaluation
val_set_size: 0.005
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# LoRA
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 32
lora_dropout: 0.125
lora_target_linear:
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:

# Optimizer
optimizer: paged_adamw_8bit # adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00025
lr_scheduler: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr: 0.000025
weight_decay: 0.01
max_grad_norm: 20.0

# Misc
train_on_inputs: false
group_by_length: false
early_stopping_patience:
local_rank:
logging_steps: 1
xformers_attention:
debug:
#deepspeed: /workspace/axolotl/deepspeed_configs/zero3.json # previously blank
fsdp:
fsdp_config:

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
```
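For readers who want to see what the QLoRA portion of this config corresponds to outside of Axolotl, here is a minimal sketch of the same quantization and adapter settings expressed directly with `transformers`, `bitsandbytes`, and `peft`. This is an illustration, not Axolotl's internal code; the bfloat16 compute dtype and flash-attention flag are assumptions inferred from the `bf16: auto` and `flash_attention: true` lines above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit base model load, mirroring `load_in_4bit: true` in the config above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption, based on `bf16: auto`
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Mistral-Nemo-Base-2407",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # mirrors `flash_attention: true`
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Mistral-Nemo-Base-2407")

# Standard PEFT preparation for training on top of a k-bit quantized model.
model = prepare_model_for_kbit_training(model)

# LoRA settings copied from the config: r=64, alpha=32, dropout=0.125,
# adapters on all attention and MLP projections.
lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.125,
    target_modules=["gate_proj", "down_proj", "up_proj",
                    "q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```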

# adventure-nemo-ws

This model is a fine-tuned version of [unsloth/Mistral-Nemo-Base-2407](https://huggingface.co/unsloth/Mistral-Nemo-Base-2407) on the `ColumbidAI/adventure-8k` dataset.
It achieves the following results on the evaluation set:
- Loss: 2.1587

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.00025
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- optimizer: paged_adamw_8bit (AdamW with betas=(0.9,0.999) and epsilon=1e-08)
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_steps: 10
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.9422        | 0.0011 | 1    | 2.3948          |
| 1.8427        | 0.2011 | 189  | 2.2440          |
| 1.6786        | 0.4021 | 378  | 2.2143          |
| 1.9847        | 0.6032 | 567  | 2.1799          |
| 1.8358        | 0.8043 | 756  | 2.1587          |

### Framework versions

- PEFT 0.12.0
- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
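Because this repository contains a PEFT (LoRA) adapter rather than merged weights, the usual way to try it is to load the base model and attach the adapter with `peft`. A minimal sketch is below; the prompt and sampling settings are placeholders, not recommendations from the model author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16 (assumption: a GPU with enough memory for the 12B base).
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Mistral-Nemo-Base-2407",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Mistral-Nemo-Base-2407")

# Attach the LoRA adapter from this repository on top of the base model.
model = PeftModel.from_pretrained(base, "ToastyPigeon/adventure-nemo-ws")
model.eval()

prompt = "You stand at the mouth of a cave."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```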