xxxxxccc committed (verified)
Commit 784d61e · 1 Parent(s): bbd306e

Model save

README.md ADDED
@@ -0,0 +1,87 @@
---
base_model: unsloth/Mistral-Nemo-Base-2407-bnb-4bit
datasets:
- generator
library_name: peft
license: apache-2.0
tags:
- trl
- sft
- unsloth
- generated_from_trainer
model-index:
- name: mistrial_nemo_output
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mistrial_nemo_output

This model is a fine-tuned version of [unsloth/Mistral-Nemo-Base-2407-bnb-4bit](https://huggingface.co/unsloth/Mistral-Nemo-Base-2407-bnb-4bit) on the generator dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5118
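
This repository contains a PEFT (LoRA) adapter rather than standalone weights. A minimal loading sketch is shown below; the adapter id `your-username/mistrial_nemo_output` is a hypothetical placeholder, not a published path.

```python
# Minimal sketch: attach the LoRA adapter to the 4-bit base model.
# "your-username/mistrial_nemo_output" is a hypothetical placeholder id.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/Mistral-Nemo-Base-2407-bnb-4bit"
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, "your-username/mistrial_nemo_output")
tokenizer = AutoTokenizer.from_pretrained(base_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```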

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 100
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
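
The total train batch size follows from 4 per device × 2 GPUs × 8 accumulation steps = 64. Below is a `TrainingArguments` sketch mirroring these values; it is an assumption about how the run was configured (the exact script is not published), with model and dataset wiring omitted.

```python
# Sketch of TrainingArguments matching the hyperparameters above; treat this
# as an approximation, not the author's exact configuration.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistrial_nemo_output",
    learning_rate=2e-4,
    per_device_train_batch_size=4,  # x 2 GPUs x 8 accum steps = 64 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,
    seed=100,
    lr_scheduler_type="cosine",
    num_train_epochs=1.0,
)
```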

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.7287 | 0.0516 | 20 | 1.7033 |
| 1.6508 | 0.1033 | 40 | 1.6490 |
| 1.6242 | 0.1549 | 60 | 1.6253 |
| 1.6216 | 0.2066 | 80 | 1.6089 |
| 1.619 | 0.2582 | 100 | 1.5958 |
| 1.5579 | 0.3099 | 120 | 1.5842 |
| 1.5578 | 0.3615 | 140 | 1.5739 |
| 1.5515 | 0.4132 | 160 | 1.5641 |
| 1.5739 | 0.4648 | 180 | 1.5550 |
| 1.5669 | 0.5165 | 200 | 1.5460 |
| 1.5601 | 0.5681 | 220 | 1.5380 |
| 1.5392 | 0.6198 | 240 | 1.5310 |
| 1.5321 | 0.6714 | 260 | 1.5251 |
| 1.5326 | 0.7230 | 280 | 1.5201 |
| 1.5197 | 0.7747 | 300 | 1.5165 |
| 1.5229 | 0.8263 | 320 | 1.5142 |
| 1.4988 | 0.8780 | 340 | 1.5127 |
| 1.5044 | 0.9296 | 360 | 1.5119 |
| 1.5105 | 0.9813 | 380 | 1.5118 |
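
For intuition, if the validation loss is a mean per-token cross-entropy in nats (an assumption; the auto-generated card does not state this), the final value corresponds to a perplexity of exp(1.5118):

```python
import math

# Assumption: validation loss is mean per-token cross-entropy in nats.
final_loss = 1.5118
print(f"perplexity ~ {math.exp(final_loss):.2f}")  # prints 4.53
```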

### Framework versions

- PEFT 0.12.1.dev0
- Transformers 4.45.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
runs/Sep02_00-06-57_autodl-container-a3d5118ffa-b551dd99/events.out.tfevents.1725206856.autodl-container-a3d5118ffa-b551dd99.1032.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a5e66600b289df2c25e620393086692d9135dee0c48a84611673bf363656bffd
-size 27249
+oid sha256:2cac56eca9e4e0cbeb03b876eeda8a6f438c025834fdb41fd78b8ebda76d6b02
+size 27603