Built with Axolotl

Axolotl config

axolotl version: 0.4.1

base_model: codellama/CodeLlama-7b-hf
base_model_config: codellama/CodeLlama-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
hub_model_id: EvolCodeLlama-7b

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: mlabonne/Evol-Instruct-Python-1k
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
eval_steps: 0.01
save_strategy: epoch
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
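The LoRA settings above determine how many weights are actually trained. The following is a minimal sketch, not Axolotl code, that estimates the adapter size. It makes two assumptions. First, CodeLlama-7b has the published shape of 32 decoder layers, hidden size 4096, MLP size 11008, and no grouped-query attention. Second, `lora_target_linear: true` adapts the seven linear projections in each layer (q, k, v, o, gate, up, down) and skips `lm_head`. The sketch also computes the standard LoRA scaling factor, alpha / r.

```python
# Assumed CodeLlama-7b shape (from its published model config, not from this card).
NUM_LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 11008

# LoRA settings copied from the Axolotl config above.
LORA_R = 32
LORA_ALPHA = 16

# (in_features, out_features) of each linear projection in one decoder layer.
# Assumption: lora_target_linear adapts all of these and skips lm_head.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_linear(in_features: int, out_features: int, r: int) -> int:
    # LoRA adds A (r x in_features) and B (out_features x r) beside the frozen weight.
    return r * (in_features + out_features)


def trainable_lora_params(r: int = LORA_R) -> int:
    # Sum the adapter sizes over one layer, then over all decoder layers.
    per_layer = sum(lora_params_per_linear(i, o, r) for i, o in LINEAR_SHAPES.values())
    return NUM_LAYERS * per_layer


# Standard LoRA scales the low-rank update B @ A by alpha / r.
scaling = LORA_ALPHA / LORA_R

print(f"trainable LoRA parameters: {trainable_lora_params():,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapter trains 79,953,920 parameters, roughly 1.2% of the 6.74B-parameter model. Because alpha (16) is half of r (32), each low-rank update is scaled by 0.5.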

EvolCodeLlama-7b

This model is a fine-tuned version of codellama/CodeLlama-7b-hf on the mlabonne/Evol-Instruct-Python-1k dataset. It achieves the following results on the evaluation set:

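  • Loss: 0.3796 (final checkpoint at the end of epoch 3; the lowest validation loss, 0.3583, was recorded at step 236, epoch 2.0064)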

Model description

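EvolCodeLlama-7b is codellama/CodeLlama-7b-hf fine-tuned with Axolotl using QLoRA. The base weights are loaded in 4-bit, and LoRA adapters (r = 32, alpha = 16, dropout 0.05) are trained on all linear layers. Training ran for 3 epochs on mlabonne/Evol-Instruct-Python-1k with 2048-token sequences and sample packing.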

Intended uses & limitations

More information needed

Training and evaluation data

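The model was trained on mlabonne/Evol-Instruct-Python-1k, formatted with Axolotl's alpaca prompt type. 2% of the examples (val_set_size: 0.02) were held out as the evaluation set.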
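The config formats the data with Axolotl's `alpaca` type and sets `train_on_inputs: false`. The sketch below illustrates both settings under stated assumptions. It uses the standard Stanford Alpaca template, and Axolotl's exact template string may differ. It also uses a hypothetical whitespace tokenizer in place of the Llama tokenizer. With `train_on_inputs: false`, prompt tokens receive the label -100, which PyTorch's cross-entropy loss ignores by default. As a result, only the response contributes to the loss.

```python
# Standard Stanford Alpaca templates (assumed; Axolotl's exact wording may differ).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

IGNORE_INDEX = -100  # label value skipped by PyTorch / Hugging Face cross-entropy loss


def build_prompt(example: dict) -> str:
    # Use the "with input" template only when the example has a non-empty input.
    if example.get("input"):
        return PROMPT_WITH_INPUT.format(
            instruction=example["instruction"], input=example["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])


def toy_tokenize(text: str) -> list:
    # Hypothetical stand-in for the real LlamaTokenizer: splits on whitespace.
    return text.split()


def build_labels(example: dict, train_on_inputs: bool = False) -> tuple:
    prompt_tokens = toy_tokenize(build_prompt(example))
    response_tokens = toy_tokenize(example["output"]) + ["</s>"]
    tokens = prompt_tokens + response_tokens
    if train_on_inputs:
        labels = list(tokens)
    else:
        # Mask every prompt token so only the response contributes to the loss.
        labels = [IGNORE_INDEX] * len(prompt_tokens) + response_tokens
    return tokens, labels


example = {"instruction": "Reverse a string in Python.", "input": "", "output": "s[::-1]"}
tokens, labels = build_labels(example)
print(labels[-3:])
```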

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: paged_adamw_32bit (AdamW with betas=(0.9,0.999) and epsilon=1e-08)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
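The effective batch size, evaluation interval, and learning-rate schedule all follow from these values. The sketch below makes several assumptions. It assumes a single GPU, which is what total_train_batch_size = 2 × 4 = 8 implies. It assumes Hugging Face Transformers treats `eval_steps: 0.01` as a fraction of total training steps, rounded up. It also assumes the schedule matches the linear-warmup cosine decay of Transformers' `get_cosine_schedule_with_warmup` with its default half cycle. The total of 352 optimizer steps is taken from the last row of the training-results table.

```python
import math

MICRO_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption implied by total_train_batch_size = 8

PEAK_LR = 2e-4
WARMUP_STEPS = 100
TOTAL_STEPS = 352  # final step in the training-results table
EVAL_STEPS_FRACTION = 0.01  # eval_steps from the config


def effective_batch_size(micro: int, accum: int, devices: int) -> int:
    # Number of (possibly packed) sequences per optimizer step.
    return micro * accum * devices


def eval_interval(total: int, fraction: float) -> int:
    # Assumption: eval_steps < 1 is a fraction of total steps, rounded up.
    return math.ceil(total * fraction)


def cosine_lr(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    if step < warmup:
        # Linear warmup from 0 to the peak learning rate.
        return peak * step / max(1, warmup)
    # Half-cycle cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 8
print(eval_interval(TOTAL_STEPS, EVAL_STEPS_FRACTION))
for step in (0, 50, 100, 226, 352):
    print(step, f"{cosine_lr(step):.2e}")
```

The computed evaluation interval of 4 steps agrees with the spacing of the rows in the training-results table.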

Training results

Training Loss | Epoch | Step | Validation Loss
0.4828 | 0.0086 | 1 | 0.4975
0.4056 | 0.0343 | 4 | 0.4976
0.5046 | 0.0685 | 8 | 0.4973
0.3969 | 0.1028 | 12 | 0.4966
0.3404 | 0.1370 | 16 | 0.4947
0.4645 | 0.1713 | 20 | 0.4896
0.2892 | 0.2056 | 24 | 0.4789
0.2616 | 0.2398 | 28 | 0.4616
0.2586 | 0.2741 | 32 | 0.4430
0.3147 | 0.3084 | 36 | 0.4267
0.3686 | 0.3426 | 40 | 0.4158
0.2935 | 0.3769 | 44 | 0.4084
0.2419 | 0.4111 | 48 | 0.4026
0.2791 | 0.4454 | 52 | 0.3970
0.2381 | 0.4797 | 56 | 0.3922
0.2407 | 0.5139 | 60 | 0.3888
0.2686 | 0.5482 | 64 | 0.3872
0.3673 | 0.5824 | 68 | 0.3880
0.2665 | 0.6167 | 72 | 0.3848
0.3259 | 0.6510 | 76 | 0.3830
0.236 | 0.6852 | 80 | 0.3801
0.2301 | 0.7195 | 84 | 0.3786
0.3573 | 0.7537 | 88 | 0.3766
0.2409 | 0.7880 | 92 | 0.3745
0.3192 | 0.8223 | 96 | 0.3744
0.2652 | 0.8565 | 100 | 0.3720
0.2341 | 0.8908 | 104 | 0.3712
0.3651 | 0.9251 | 108 | 0.3709
0.1667 | 0.9593 | 112 | 0.3714
0.2755 | 0.9936 | 116 | 0.3699
0.2906 | 1.0254 | 120 | 0.3712
0.2079 | 1.0593 | 124 | 0.3708
0.3429 | 1.0932 | 128 | 0.3708
0.3296 | 1.1271 | 132 | 0.3721
0.2231 | 1.1610 | 136 | 0.3707
0.2098 | 1.1949 | 140 | 0.3686
0.2918 | 1.2288 | 144 | 0.3711
0.3803 | 1.2627 | 148 | 0.3676
0.2619 | 1.2966 | 152 | 0.3662
0.2261 | 1.3305 | 156 | 0.3679
0.1954 | 1.3644 | 160 | 0.3689
0.2183 | 1.3983 | 164 | 0.3677
0.2459 | 1.4322 | 168 | 0.3674
0.1979 | 1.4661 | 172 | 0.3669
0.2175 | 1.5 | 176 | 0.3653
0.26 | 1.5339 | 180 | 0.3652
0.2195 | 1.5678 | 184 | 0.3645
0.3344 | 1.6017 | 188 | 0.3645
0.1769 | 1.6356 | 192 | 0.3643
0.1829 | 1.6695 | 196 | 0.3639
0.2343 | 1.7034 | 200 | 0.3649
0.2568 | 1.7373 | 204 | 0.3650
0.1749 | 1.7712 | 208 | 0.3640
0.2118 | 1.8051 | 212 | 0.3628
0.2252 | 1.8390 | 216 | 0.3611
0.2301 | 1.8729 | 220 | 0.3602
0.1884 | 1.9068 | 224 | 0.3602
0.2023 | 1.9407 | 228 | 0.3600
0.2428 | 1.9746 | 232 | 0.3587
0.2413 | 2.0064 | 236 | 0.3583
0.2015 | 2.0407 | 240 | 0.3620
0.2131 | 2.0749 | 244 | 0.3728
0.1768 | 2.1092 | 248 | 0.3834
0.1615 | 2.1435 | 252 | 0.3810
0.1598 | 2.1777 | 256 | 0.3775
0.171 | 2.2120 | 260 | 0.3763
0.1973 | 2.2463 | 264 | 0.3759
0.1407 | 2.2805 | 268 | 0.3758
0.1998 | 2.3148 | 272 | 0.3771
0.1267 | 2.3490 | 276 | 0.3773
0.1526 | 2.3833 | 280 | 0.3782
0.1547 | 2.4176 | 284 | 0.3776
0.1439 | 2.4518 | 288 | 0.3768
0.1565 | 2.4861 | 292 | 0.3757
0.2113 | 2.5203 | 296 | 0.3767
0.1768 | 2.5546 | 300 | 0.3776
0.2366 | 2.5889 | 304 | 0.3792
0.1397 | 2.6231 | 308 | 0.3801
0.3598 | 2.6574 | 312 | 0.3805
0.1296 | 2.6916 | 316 | 0.3803
0.1344 | 2.7259 | 320 | 0.3805
0.2095 | 2.7602 | 324 | 0.3804
0.1646 | 2.7944 | 328 | 0.3800
0.1749 | 2.8287 | 332 | 0.3799
0.1597 | 2.8630 | 336 | 0.3800
0.1602 | 2.8972 | 340 | 0.3799
0.1786 | 2.9315 | 344 | 0.3797
0.1692 | 2.9657 | 348 | 0.3797
0.1887 | 3.0 | 352 | 0.3796
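In this table, validation loss bottoms out at the start of epoch 2 and then rises during epoch 3 while training loss keeps falling, which is consistent with overfitting on a small dataset. The sketch below selects the row with the lowest validation loss. To stay short, it embeds only a few rows copied from the table. The perplexity line assumes the reported loss is mean token-level cross-entropy in nats, as in Hugging Face causal-LM training.

```python
import math

# (step, epoch, training_loss, validation_loss), a subset of rows copied from the table.
ROWS = [
    (228, 1.9407, 0.2023, 0.3600),
    (232, 1.9746, 0.2428, 0.3587),
    (236, 2.0064, 0.2413, 0.3583),
    (240, 2.0407, 0.2015, 0.3620),
    (244, 2.0749, 0.2131, 0.3728),
    (248, 2.1092, 0.1768, 0.3834),
    (352, 3.0, 0.1887, 0.3796),
]


def best_row(rows):
    # Select the evaluation point with the lowest validation loss.
    return min(rows, key=lambda row: row[3])


step, epoch, train_loss, val_loss = best_row(ROWS)
final_val = ROWS[-1][3]
print(f"best: step {step} (epoch {epoch}), validation loss {val_loss}")
print(f"final validation loss is {final_val - val_loss:.4f} higher")
# Perplexity = exp(mean cross-entropy), assuming the loss is in nats.
print(f"perplexity at best step: {math.exp(val_loss):.3f}")
```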

Framework versions

  • PEFT 0.13.0
  • Transformers 4.45.0
  • Pytorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.0
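To reproduce the training environment, the installed packages can be compared against these versions. The following is a minimal sketch using only the standard library's `importlib.metadata`. The mapping from the names above to the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers` is an assumption. The PyTorch entry keeps its CUDA 12.1 local tag `+cu121`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions", keyed by assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.13.0",
    "transformers": "4.45.0",
    "torch": "2.3.1+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.20.0",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    # Map each package to (expected version, installed version or None).
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```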
GGUF files

  • Model size: 6.74B params
  • Architecture: llama
  • Available precisions: 5-bit, 16-bit


Model tree for ani-kavle/EvolCodeLlama-7b-GGUF: listed on the Hub as an adapter of the base model, codellama/CodeLlama-7b-hf.