---
tags:
- generated_from_trainer
base_model: Locutusque/TinyMistral-248M-v2
model-index:
- name: TinyMistral-v2-Test1/
  results: []
datasets:
- JeanKaddour/minipile
- epfl-llm/guidelines
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: Locutusque/TinyMistral-248M-v2
model_type: MistralForCausalLM
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: false
strict: false

dataset_processes: 20
datasets:
  - path: epfl-llm/guidelines
    type: completion
    field: clean_text
  - path: JeanKaddour/minipile
    type: completion
    field: text
dataset_prepared_path: TinyMistral-FFT-data
val_set_size: 0.001
output_dir: ./TinyMistral-FFT

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

# wandb configuration
wandb_project: TinyMistral-FFT
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: constant
cosine_min_lr_ratio:
learning_rate: 0.00005

train_on_inputs: true
group_by_length: false
bf16: false
fp16: false
tf32: true

gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: false
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
flash_attn_cross_entropy: false
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: true

warmup_steps: 10
evals_per_epoch: 100
# eval_steps: 10
eval_table_size:
saves_per_epoch: 50
debug:
deepspeed: #deepspeed/zero2.json # multi-gpu only
weight_decay: 0
# tokens:
special_tokens:
  bos_token: "<|bos|>"
  eos_token: "<|endoftext|>"
  unk_token: ""
```

</details>
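The config above performs a full-parameter fine-tune (no LoRA adapter) over two completion-format datasets. As a minimal, unofficial sketch of what the `field` entries refer to, the snippet below streams one record from each dataset and prints the named text column; it assumes the `datasets` library is installed and that both Hub datasets are accessible (epfl-llm/guidelines may require accepting its access terms).

```python
# Unofficial sketch: preview the text fields the Axolotl config points at.
# Assumes `pip install datasets` and access to both Hub datasets.
from datasets import load_dataset

# epfl-llm/guidelines -> the config reads the `clean_text` field
guidelines = load_dataset("epfl-llm/guidelines", split="train", streaming=True)
print(next(iter(guidelines))["clean_text"][:200])

# JeanKaddour/minipile -> the config reads the `text` field
minipile = load_dataset("JeanKaddour/minipile", split="train", streaming=True)
print(next(iter(minipile))["text"][:200])
```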

# TinyMistral-StructureEvaluator

This model is a continued pre-training of [Locutusque/TinyMistral-248M-v2](https://huggingface.co/Locutusque/TinyMistral-248M-v2) on the epfl-llm/guidelines and JeanKaddour/minipile datasets.

## Model description

A full-parameter continued pre-training run (no LoRA adapter) of the 248M-parameter Mistral-architecture base model Locutusque/TinyMistral-248M-v2, performed with Axolotl.

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the epfl-llm/guidelines dataset (`clean_text` field) and the JeanKaddour/minipile dataset (`text` field) in completion format, with a sequence length of 2048 tokens; 0.1% of the combined data was held out for evaluation (`val_set_size: 0.001`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- training_steps: 39460

### Training results

### Framework versions

- Transformers 4.37.0.dev0
- PyTorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.0
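## Example usage

Since the checkpoint is a standard `MistralForCausalLM`, it can be loaded with the regular `transformers` API. The snippet below is a minimal inference sketch, not part of the original card; the repository id is a placeholder (the base model's id) and should be replaced with the Hub id under which this fine-tune is published.

```python
# Minimal inference sketch (assumption: replace repo_id with this model's Hub id).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Locutusque/TinyMistral-248M-v2"  # placeholder: base model id, not this fine-tune
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Clinical practice guidelines recommend"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```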