Built with Axolotl

See axolotl config

axolotl version: `0.4.1`

```yaml
base_model: alpindale/Mistral-7B-v0.2-hf
tokenizer_type: AutoTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: json
    data_files: hidden_pretraining_manners.jsonl
    ds_type: json
    type: completion
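    # note: `type: completion` does plain next-token training on raw text;
    # by default axolotl reads each JSONL record's "text" field, e.g.
    # {"text": "A paragraph of pretraining data..."}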


dataset_prepared_path: last_run_prepared
output_dir: ./army-pretraining

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true
shuffle_merged_datasets: true

wandb_project: mistral-army
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 6
micro_batch_size: 2
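# effective batch size per optimizer step = micro_batch_size (2)
# x gradient_accumulation_steps (6) x number of GPUs = 12 x n_gpus sequences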
eval_batch_size: 1
num_epochs: 11
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.000020
weight_decay: 0
# Gradient clipping max norm
max_grad_norm: 1.0
noisy_embedding_alpha: 0
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint: 
logging_steps: 1
xformers_attention:
flash_attention: true

chat_template: chatml

warmup_ratio: 0.5
auto_resume_from_checkpoints: false
eval_steps: 10
saves_per_epoch: 1
eval_sample_packing: false
save_total_limit: 3
debug:
deepspeed: deepspeed_configs/zero2.json
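# ZeRO stage 2 shards optimizer states and gradients across data-parallel
# GPUs; the model weights themselves stay replicated on every rank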
special_tokens:
  pad_token: "<|end_of_text|>"
```
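A config like this is normally launched through the Axolotl CLI, e.g. `accelerate launch -m axolotl.cli.train config.yaml` (filename hypothetical); check the Axolotl docs for the exact invocation for your version.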

This is the pretrained base for Mannerstral 7B. Only use it if you are finetuning something on top of it.
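If you are building on this base, here is a minimal sketch of loading it with `transformers` for further finetuning (assumes `torch` and `transformers` are installed; bf16 matches the training config above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Heralax/Mannerstral-base"  # this repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the base was trained in bf16
)

# From here, plug `model` into your own finetuning loop or trainer.
```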
