license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
  - generated_from_trainer
datasets:
  - tyzhu/lmind_nq_train6000_eval6489_v1_qa
metrics:
  - accuracy
model-index:
  - name: lmind_nq_train6000_eval6489_v1_qa_5e-5_lora2
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: tyzhu/lmind_nq_train6000_eval6489_v1_qa
          type: tyzhu/lmind_nq_train6000_eval6489_v1_qa
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.597948717948718

lmind_nq_train6000_eval6489_v1_qa_5e-5_lora2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on the tyzhu/lmind_nq_train6000_eval6489_v1_qa dataset. At the final checkpoint (step 9350, epoch 49.87), it achieves the following results on the evaluation set:

  • Loss: 2.3327
  • Accuracy: 0.5979
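
For readers who prefer perplexity to raw loss, the reported evaluation loss can be converted with a one-line calculation. This is a minimal sketch that assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in this card. The function name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Evaluation loss reported in this card (assumed to be in nats).
eval_loss = 2.3327
eval_ppl = perplexity(eval_loss)
print(f"perplexity ~ {eval_ppl:.2f}")
```

Under that assumption the final evaluation loss corresponds to a perplexity of roughly 10.3.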

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
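
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also estimates the number of optimizer steps. It assumes 6000 training examples (suggested by the dataset name, not confirmed in this card) and assumes that steps per epoch are floored after gradient accumulation. The helper names are hypothetical.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step (or per eval step when grad_accum is 1)."""
    return per_device * num_devices * grad_accum


def optimizer_steps(num_examples: int, per_device: int, num_devices: int,
                    grad_accum: int, num_epochs: float) -> int:
    """Estimate total optimizer steps, flooring the per-epoch count (assumption)."""
    batches_per_epoch = math.ceil(num_examples / (per_device * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(num_epochs * steps_per_epoch)


train_total = effective_batch_size(per_device=2, num_devices=4, grad_accum=4)
eval_total = effective_batch_size(per_device=2, num_devices=4)

# 6000 is an assumed training-set size inferred from the dataset name.
total_steps = optimizer_steps(6000, per_device=2, num_devices=4,
                              grad_accum=4, num_epochs=50.0)

print(train_total, eval_total, total_steps)
```

With these assumptions the estimate comes to 9350 optimizer steps, which agrees with the final step reported in the training results.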

Training results

Training Loss  Epoch  Step  Accuracy  Validation Loss
1.7923         1.0    187   0.6128    1.2805
1.2488         2.0    375   0.6168    1.2677
1.1097         3.0    562   0.6162    1.2943
0.9244         4.0    750   0.6126    1.3598
0.7924         5.0    937   0.6089    1.4714
0.6864         6.0    1125  0.6045    1.5761
0.6101         7.0    1312  0.6029    1.6554
0.562          8.0    1500  0.6011    1.7485
0.5015         9.0    1687  0.5998    1.8067
0.4855         10.0   1875  0.5996    1.8643
0.4736         11.0   2062  0.5966    1.9771
0.465          12.0   2250  0.5989    1.9610
0.4603         13.0   2437  0.5982    1.9498
0.4537         14.0   2625  0.5979    2.0510
0.4489         15.0   2812  0.5996    2.0862
0.4488         16.0   3000  0.5995    2.0370
0.4238         17.0   3187  0.5990    2.0638
0.4245         18.0   3375  0.6001    2.0635
0.4241         19.0   3562  0.5988    2.1451
0.4236         20.0   3750  0.6003    2.1509
0.4241         21.0   3937  0.5987    2.1745
0.4239         22.0   4125  0.5991    2.1752
0.4245         23.0   4312  0.5983    2.1659
0.4229         24.0   4500  0.5981    2.2126
0.4059         25.0   4687  0.5997    2.1568
0.4064         26.0   4875  0.5979    2.1777
0.4089         27.0   5062  0.5979    2.2200
0.4099         28.0   5250  0.5976    2.2412
0.4103         29.0   5437  0.5983    2.2093
0.4112         30.0   5625  0.6002    2.2145
0.4113         31.0   5812  0.5990    2.2514
0.4124         32.0   6000  0.5979    2.3170
0.3961         33.0   6187  0.5978    2.2557
0.4002         34.0   6375  0.5979    2.2739
0.3998         35.0   6562  0.5976    2.2498
0.4022         36.0   6750  0.5972    2.3118
0.4038         37.0   6937  0.5970    2.3259
0.404          38.0   7125  0.5973    2.3276
0.4072         39.0   7312  0.5994    2.2854
0.4077         40.0   7500  0.5982    2.3036
0.3943         41.0   7687  0.5987    2.3361
0.3939         42.0   7875  0.5995    2.2148
0.3977         43.0   8062  0.5985    2.3393
0.3988         44.0   8250  0.5983    2.2875
0.402          45.0   8437  0.5995    2.2981
0.4002         46.0   8625  0.5981    2.3163
0.4004         47.0   8812  0.5987    2.3085
0.402          48.0   9000  0.5977    2.3341
0.3895         49.0   9187  0.5984    2.2953
0.3927         49.87  9350  0.5979    2.3327
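
In the results above, validation loss is lowest at epoch 2 and rises afterwards while training loss keeps falling, a pattern usually read as overfitting. The sketch below shows one way to pick the checkpoint with the lowest validation loss from such a log. It uses only the first five rows and the final row of the table for brevity. The `EvalRow` type and variable names are hypothetical and not part of the training code.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    accuracy: float
    val_loss: float


# Subset of the training results table (first five epochs plus the final row).
rows = [
    EvalRow(1.0, 187, 0.6128, 1.2805),
    EvalRow(2.0, 375, 0.6168, 1.2677),
    EvalRow(3.0, 562, 0.6162, 1.2943),
    EvalRow(4.0, 750, 0.6126, 1.3598),
    EvalRow(5.0, 937, 0.6089, 1.4714),
    EvalRow(49.87, 9350, 0.5979, 2.3327),
]

# Select by validation loss; accuracy could be used instead with max().
best = min(rows, key=lambda r: r.val_loss)
final = rows[-1]

print(f"best: epoch {best.epoch}, step {best.step}, val_loss {best.val_loss}")
print(f"final val_loss is {final.val_loss - best.val_loss:.4f} higher than best")
```

On this subset, selecting by highest accuracy picks the same epoch-2 row, so an earlier checkpoint may be preferable to the final one if intermediate checkpoints were saved, which this card does not state.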

Framework versions

  • Transformers 4.34.0
  • PyTorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.14.1