groderg committed on
Commit 7f3c29f · verified · 1 Parent(s): d43b2d7

Evaluation on the test set completed on 2025_02_03.

README.md CHANGED
@@ -1,4 +1,100 @@
1
  ---
2
  tags:
3
- - hf-summary-writer
4
  ---
1
  ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: facebook/dinov2-large
5
  tags:
6
+ - generated_from_trainer
7
+ model-index:
8
+ - name: DinoVdrone-large-2025_02_03_31850-bs32_freeze_probs
9
+ results: []
10
  ---
11
+
12
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
+ should probably proofread and complete it, then remove this comment. -->
14
+
15
+ # DinoVdrone-large-2025_02_03_31850-bs32_freeze_probs
16
+
17
+ This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large) on an unspecified dataset (the Trainer recorded the dataset name as `None`).
18
+ It achieves the following results on the test set:
19
+ - Loss: 0.4512
20
+ - Rmse: 0.1689
21
+ - Mae: 0.1261
22
+ - Kl Divergence: 0.5558
23
+ - Explained Variance: 0.3655
24
+ - Learning Rate: 1e-05 (final value; training started at 0.001)
25
+
26
+ ## Model description
27
+
28
+ More information needed
29
+
30
+ ## Intended uses & limitations
31
+
32
+ More information needed
33
+
34
+ ## Training and evaluation data
35
+
36
+ More information needed
37
+
38
+ ## Training procedure
39
+
40
+ ### Training hyperparameters
41
+
42
+ The following hyperparameters were used during training:
43
+ - learning_rate: 0.001
44
+ - train_batch_size: 32
45
+ - eval_batch_size: 32
46
+ - seed: 42
47
+ - optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
48
+ - lr_scheduler_type: linear
49
+ - num_epochs: 150
50
+ - mixed_precision_training: Native AMP
51
+
52
+ ### Training results
53
+
54
+ | Training Loss | Epoch | Step | Validation Loss | Rmse | Mae | Kl Divergence | Explained Variance | Learning Rate |
55
+ |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:-------------:|:------------------:|:-------------:|
56
+ | No log | 1.0 | 76 | 0.5544 | 0.2605 | 0.2278 | 1.7912 | 0.0699 | 0.001 |
57
+ | No log | 2.0 | 152 | 0.4817 | 0.2007 | 0.1602 | 1.0345 | 0.0726 | 0.001 |
58
+ | No log | 3.0 | 228 | 0.4615 | 0.1801 | 0.1370 | 0.6064 | 0.2707 | 0.001 |
59
+ | No log | 4.0 | 304 | 0.4632 | 0.1837 | 0.1391 | 0.6577 | 0.0892 | 0.001 |
60
+ | No log | 5.0 | 380 | 0.4579 | 0.1769 | 0.1363 | 0.6678 | 0.2999 | 0.001 |
61
+ | No log | 6.0 | 456 | 0.4571 | 0.1766 | 0.1330 | 0.7896 | 0.3050 | 0.001 |
62
+ | 0.4966 | 7.0 | 532 | 0.4586 | 0.1773 | 0.1307 | 0.6493 | 0.3103 | 0.001 |
63
+ | 0.4966 | 8.0 | 608 | 0.4579 | 0.1772 | 0.1319 | 0.9475 | 0.3085 | 0.001 |
64
+ | 0.4966 | 9.0 | 684 | 0.4551 | 0.1746 | 0.1306 | 0.7271 | 0.3136 | 0.001 |
65
+ | 0.4966 | 10.0 | 760 | 0.4582 | 0.1774 | 0.1316 | 0.6882 | 0.2918 | 0.001 |
66
+ | 0.4966 | 11.0 | 836 | 0.4683 | 0.1842 | 0.1372 | 0.3715 | 0.2586 | 0.001 |
67
+ | 0.4966 | 12.0 | 912 | 0.4579 | 0.1764 | 0.1316 | 0.5271 | 0.3086 | 0.001 |
68
+ | 0.4966 | 13.0 | 988 | 0.4559 | 0.1756 | 0.1301 | 0.9168 | 0.3100 | 0.001 |
69
+ | 0.4448 | 14.0 | 1064 | 0.4556 | 0.1749 | 0.1292 | 0.8827 | 0.3229 | 0.001 |
70
+ | 0.4448 | 15.0 | 1140 | 0.4522 | 0.1717 | 0.1262 | 0.7009 | 0.3416 | 0.001 |
71
+ | 0.4448 | 16.0 | 1216 | 0.4556 | 0.1753 | 0.1286 | 1.0038 | 0.3163 | 0.001 |
72
+ | 0.4448 | 17.0 | 1292 | 0.4586 | 0.1775 | 0.1343 | 0.2600 | 0.3205 | 0.001 |
73
+ | 0.4448 | 18.0 | 1368 | 0.5672 | 0.2369 | 0.1638 | 2.0548 | -4.7788 | 0.001 |
74
+ | 0.4448 | 19.0 | 1444 | 0.4529 | 0.1727 | 0.1287 | 0.7115 | 0.3279 | 0.001 |
75
+ | 0.4406 | 20.0 | 1520 | 0.4552 | 0.1746 | 0.1285 | 0.9694 | 0.3204 | 0.001 |
76
+ | 0.4406 | 21.0 | 1596 | 0.4530 | 0.1724 | 0.1282 | 0.7789 | 0.3300 | 0.001 |
77
+ | 0.4406 | 22.0 | 1672 | 0.4503 | 0.1700 | 0.1261 | 0.7369 | 0.3473 | 0.0001 |
78
+ | 0.4406 | 23.0 | 1748 | 0.4535 | 0.1716 | 0.1280 | 0.5027 | 0.3403 | 0.0001 |
79
+ | 0.4406 | 24.0 | 1824 | 0.4502 | 0.1697 | 0.1264 | 0.5968 | 0.3511 | 0.0001 |
80
+ | 0.4406 | 25.0 | 1900 | 0.4504 | 0.1699 | 0.1267 | 0.6215 | 0.3504 | 0.0001 |
81
+ | 0.4406 | 26.0 | 1976 | 0.4510 | 0.1704 | 0.1260 | 0.6568 | 0.3460 | 0.0001 |
82
+ | 0.4334 | 27.0 | 2052 | 0.4498 | 0.1693 | 0.1262 | 0.5748 | 0.3546 | 0.0001 |
83
+ | 0.4334 | 28.0 | 2128 | 0.4506 | 0.1701 | 0.1256 | 0.7001 | 0.3467 | 0.0001 |
84
+ | 0.4334 | 29.0 | 2204 | 0.4505 | 0.1699 | 0.1263 | 0.5840 | 0.3531 | 0.0001 |
85
+ | 0.4334 | 30.0 | 2280 | 0.4506 | 0.1703 | 0.1252 | 0.8101 | 0.3486 | 0.0001 |
86
+ | 0.4334 | 31.0 | 2356 | 0.4508 | 0.1701 | 0.1249 | 0.7416 | 0.3489 | 0.0001 |
87
+ | 0.4334 | 32.0 | 2432 | 0.4502 | 0.1697 | 0.1254 | 0.6402 | 0.3524 | 0.0001 |
88
+ | 0.4289 | 33.0 | 2508 | 0.4511 | 0.1710 | 0.1250 | 0.8411 | 0.3392 | 0.0001 |
89
+ | 0.4289 | 34.0 | 2584 | 0.4515 | 0.1711 | 0.1259 | 0.7204 | 0.3383 | 1e-05 |
90
+ | 0.4289 | 35.0 | 2660 | 0.4502 | 0.1698 | 0.1247 | 0.7355 | 0.3498 | 1e-05 |
91
+ | 0.4289 | 36.0 | 2736 | 0.4509 | 0.1703 | 0.1261 | 0.4990 | 0.3486 | 1e-05 |
92
+ | 0.4289 | 37.0 | 2812 | 0.4500 | 0.1696 | 0.1260 | 0.5451 | 0.3536 | 1e-05 |
93
+
94
+
95
+ ### Framework versions
96
+
97
+ - Transformers 4.48.0
98
+ - Pytorch 2.5.1+cu124
99
+ - Datasets 3.0.2
100
+ - Tokenizers 0.21.0
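
The model card above reports RMSE, MAE, and explained variance without defining them. The following is a minimal sketch of the standard definitions in plain Python, not the training script's own code: the function names and the toy data are hypothetical, and how the original run aggregates these metrics across output dimensions is not stated in the commit. It also shows why explained variance can go negative, as it does at epoch 18 in the table (-4.7788): the residuals then vary more than the targets themselves.

```python
import math
from statistics import fmean, pvariance


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(fmean(squared_errors))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return fmean(absolute_errors)


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(targets); negative when residuals vary more than targets."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


# Hypothetical toy data, chosen only to illustrate the behaviour.
y_true = [0.0, 0.5, 1.0]
close_predictions = [0.1, 0.5, 0.9]      # small errors: score near 1
reversed_predictions = [1.0, 0.5, 0.0]   # anti-correlated: negative score

print(rmse(y_true, close_predictions))
print(mae(y_true, close_predictions))
print(explained_variance(y_true, close_predictions))
print(explained_variance(y_true, reversed_predictions))
```

The KL divergence column is left out on purpose, because the commit does not say which distributions are being compared or in which direction.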
all_results.json ADDED
@@ -0,0 +1,17 @@
1
+ {
2
+ "epoch": 37.0,
3
+ "eval_explained_variance": 0.3655208349227905,
4
+ "eval_kl_divergence": 0.5558046698570251,
5
+ "eval_loss": 0.451246976852417,
6
+ "eval_mae": 0.12609094381332397,
7
+ "eval_rmse": 0.1689337193965912,
8
+ "eval_runtime": 33.8846,
9
+ "eval_samples_per_second": 23.669,
10
+ "eval_steps_per_second": 0.767,
11
+ "learning_rate": 1e-05,
12
+ "total_flos": 1.318896404308369e+20,
13
+ "train_loss": 0.4464974437295797,
14
+ "train_runtime": 4164.1679,
15
+ "train_samples_per_second": 86.596,
16
+ "train_steps_per_second": 2.738
17
+ }
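
The runtime and throughput fields in this file can be multiplied to recover approximate sample and step counts, which is a quick sanity check on the run. This is a sketch under assumptions: the helper name `implied_count` is hypothetical, the throughput values are rounded to three decimals in the JSON so the products are only approximate, and the reading of the training figures given after the block is an inference from the numbers rather than something the commit states.

```python
# Values copied from all_results.json in this commit.
results = {
    "eval_runtime": 33.8846,
    "eval_samples_per_second": 23.669,
    "eval_steps_per_second": 0.767,
    "train_runtime": 4164.1679,
    "train_samples_per_second": 86.596,
    "train_steps_per_second": 2.738,
}


def implied_count(runtime_s, rate_per_s):
    """Approximate item count implied by a runtime and a per-second rate."""
    return round(runtime_s * rate_per_s)


eval_samples = implied_count(results["eval_runtime"], results["eval_samples_per_second"])
eval_steps = implied_count(results["eval_runtime"], results["eval_steps_per_second"])
train_samples = implied_count(results["train_runtime"], results["train_samples_per_second"])
train_steps = implied_count(results["train_runtime"], results["train_steps_per_second"])

print(eval_samples, eval_steps)
print(train_samples, train_steps)
```

The evaluation pass works out to roughly 802 samples in 26 batches, which is consistent with a batch size of 32. The training product comes out close to the `max_steps` value of 11400 in trainer_state.json rather than the 2812 steps actually run, which suggests (but does not prove) that the training throughput is reported against the planned 150-epoch schedule.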
logs/events.out.tfevents.1738569064.datavisu4 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:741c27adfccb4bbe5c9f4cad0864315b10681a45325b1468dff700248e1929ed
3
- size 26625
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e35403578608b95100241e4a10fca05474c8b45953e6c056aa6aa497d12c9d73
3
+ size 28123
logs/events.out.tfevents.1738573356.datavisu4 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:760aeafae17ddec88dc492d7630cf29b3783355af6c6fc4b56d18fb05e395fb8
3
+ size 40
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b96ff726926066a9e71403cadee40b9d1b6668fb09dd397c8e47dbed4ccef1fc
3
  size 1228076728
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:71bbb7679db9b4309c6556dbad7dddb03732bcacba355dedc979d8d082cda9bd
3
  size 1228076728
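
The three-line entries above are Git LFS pointer files: the repository stores only a spec version, a content hash, and a byte size, while the weights themselves live in LFS storage. As an illustration, a pointer can be parsed with a few lines of Python. The function name `parse_lfs_pointer` is hypothetical, and the parameter estimate assumes 4-byte float32 weights and ignores the safetensors header, neither of which the commit confirms.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New-side pointer for model.safetensors, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:71bbb7679db9b4309c6556dbad7dddb03732bcacba355dedc979d8d082cda9bd
size 1228076728
"""

info = parse_lfs_pointer(pointer)

# Rough parameter count, ASSUMING float32 storage (4 bytes per value).
approx_params = info["size"] / 4

print(info["hash_algo"], info["size"])
print(f"{approx_params / 1e6:.0f}M values (assuming float32)")
```

Only the hash changes between the old and new pointers while the size stays the same, which is what one would expect when the same architecture is saved with updated weights.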
test_results.json ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "epoch": 37.0,
3
+ "eval_explained_variance": 0.3655208349227905,
4
+ "eval_kl_divergence": 0.5558046698570251,
5
+ "eval_loss": 0.451246976852417,
6
+ "eval_mae": 0.12609094381332397,
7
+ "eval_rmse": 0.1689337193965912,
8
+ "eval_runtime": 33.8846,
9
+ "eval_samples_per_second": 23.669,
10
+ "eval_steps_per_second": 0.767,
11
+ "learning_rate": 1e-05
12
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "epoch": 37.0,
3
+ "learning_rate": 1e-05,
4
+ "total_flos": 1.318896404308369e+20,
5
+ "train_loss": 0.4464974437295797,
6
+ "train_runtime": 4164.1679,
7
+ "train_samples_per_second": 86.596,
8
+ "train_steps_per_second": 2.738
9
+ }
trainer_state.json ADDED
@@ -0,0 +1,568 @@
1
+ {
2
+ "best_metric": 0.4497845768928528,
3
+ "best_model_checkpoint": "/home1/datahome/villien/project_hub/DinoVdeau/models/DinoVdrone-large-2025_02_03_31850-bs32_freeze_probs/checkpoint-2052",
4
+ "epoch": 37.0,
5
+ "eval_steps": 500,
6
+ "global_step": 2812,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 1.0,
13
+ "eval_explained_variance": 0.0698663592338562,
14
+ "eval_kl_divergence": 1.791170597076416,
15
+ "eval_loss": 0.5543646216392517,
16
+ "eval_mae": 0.22781437635421753,
17
+ "eval_rmse": 0.26052138209342957,
18
+ "eval_runtime": 36.0517,
19
+ "eval_samples_per_second": 22.218,
20
+ "eval_steps_per_second": 0.721,
21
+ "learning_rate": 0.001,
22
+ "step": 76
23
+ },
24
+ {
25
+ "epoch": 2.0,
26
+ "eval_explained_variance": 0.07257537543773651,
27
+ "eval_kl_divergence": 1.0344740152359009,
28
+ "eval_loss": 0.48171326518058777,
29
+ "eval_mae": 0.16019925475120544,
30
+ "eval_rmse": 0.20066045224666595,
31
+ "eval_runtime": 32.1858,
32
+ "eval_samples_per_second": 24.887,
33
+ "eval_steps_per_second": 0.808,
34
+ "learning_rate": 0.001,
35
+ "step": 152
36
+ },
37
+ {
38
+ "epoch": 3.0,
39
+ "eval_explained_variance": 0.27069091796875,
40
+ "eval_kl_divergence": 0.6063743233680725,
41
+ "eval_loss": 0.4615386724472046,
42
+ "eval_mae": 0.1370290368795395,
43
+ "eval_rmse": 0.18013149499893188,
44
+ "eval_runtime": 22.3828,
45
+ "eval_samples_per_second": 35.786,
46
+ "eval_steps_per_second": 1.162,
47
+ "learning_rate": 0.001,
48
+ "step": 228
49
+ },
50
+ {
51
+ "epoch": 4.0,
52
+ "eval_explained_variance": 0.08923790603876114,
53
+ "eval_kl_divergence": 0.6576936841011047,
54
+ "eval_loss": 0.4632064700126648,
55
+ "eval_mae": 0.1390942633152008,
56
+ "eval_rmse": 0.18370357155799866,
57
+ "eval_runtime": 22.4997,
58
+ "eval_samples_per_second": 35.601,
59
+ "eval_steps_per_second": 1.156,
60
+ "learning_rate": 0.001,
61
+ "step": 304
62
+ },
63
+ {
64
+ "epoch": 5.0,
65
+ "eval_explained_variance": 0.2999000549316406,
66
+ "eval_kl_divergence": 0.6677561402320862,
67
+ "eval_loss": 0.4579373300075531,
68
+ "eval_mae": 0.13632091879844666,
69
+ "eval_rmse": 0.17685972154140472,
70
+ "eval_runtime": 22.4993,
71
+ "eval_samples_per_second": 35.601,
72
+ "eval_steps_per_second": 1.156,
73
+ "learning_rate": 0.001,
74
+ "step": 380
75
+ },
76
+ {
77
+ "epoch": 6.0,
78
+ "eval_explained_variance": 0.30495360493659973,
79
+ "eval_kl_divergence": 0.7895806431770325,
80
+ "eval_loss": 0.4571229815483093,
81
+ "eval_mae": 0.13304883241653442,
82
+ "eval_rmse": 0.176561176776886,
83
+ "eval_runtime": 22.7,
84
+ "eval_samples_per_second": 35.286,
85
+ "eval_steps_per_second": 1.145,
86
+ "learning_rate": 0.001,
87
+ "step": 456
88
+ },
89
+ {
90
+ "epoch": 6.578947368421053,
91
+ "grad_norm": 0.27282145619392395,
92
+ "learning_rate": 0.001,
93
+ "loss": 0.4966,
94
+ "step": 500
95
+ },
96
+ {
97
+ "epoch": 7.0,
98
+ "eval_explained_variance": 0.3103345036506653,
99
+ "eval_kl_divergence": 0.6492787599563599,
100
+ "eval_loss": 0.4586440622806549,
101
+ "eval_mae": 0.13074660301208496,
102
+ "eval_rmse": 0.17730861902236938,
103
+ "eval_runtime": 22.4946,
104
+ "eval_samples_per_second": 35.608,
105
+ "eval_steps_per_second": 1.156,
106
+ "learning_rate": 0.001,
107
+ "step": 532
108
+ },
109
+ {
110
+ "epoch": 8.0,
111
+ "eval_explained_variance": 0.30848240852355957,
112
+ "eval_kl_divergence": 0.9475247263908386,
113
+ "eval_loss": 0.45786425471305847,
114
+ "eval_mae": 0.13187319040298462,
115
+ "eval_rmse": 0.17719128727912903,
116
+ "eval_runtime": 22.513,
117
+ "eval_samples_per_second": 35.579,
118
+ "eval_steps_per_second": 1.155,
119
+ "learning_rate": 0.001,
120
+ "step": 608
121
+ },
122
+ {
123
+ "epoch": 9.0,
124
+ "eval_explained_variance": 0.3135569095611572,
125
+ "eval_kl_divergence": 0.7270792126655579,
126
+ "eval_loss": 0.4551210105419159,
127
+ "eval_mae": 0.13063107430934906,
128
+ "eval_rmse": 0.17461702227592468,
129
+ "eval_runtime": 22.4949,
130
+ "eval_samples_per_second": 35.608,
131
+ "eval_steps_per_second": 1.156,
132
+ "learning_rate": 0.001,
133
+ "step": 684
134
+ },
135
+ {
136
+ "epoch": 10.0,
137
+ "eval_explained_variance": 0.29183268547058105,
138
+ "eval_kl_divergence": 0.6882225275039673,
139
+ "eval_loss": 0.45817646384239197,
140
+ "eval_mae": 0.13160471618175507,
141
+ "eval_rmse": 0.17741131782531738,
142
+ "eval_runtime": 22.508,
143
+ "eval_samples_per_second": 35.587,
144
+ "eval_steps_per_second": 1.155,
145
+ "learning_rate": 0.001,
146
+ "step": 760
147
+ },
148
+ {
149
+ "epoch": 11.0,
150
+ "eval_explained_variance": 0.25859755277633667,
151
+ "eval_kl_divergence": 0.371459424495697,
152
+ "eval_loss": 0.46834859251976013,
153
+ "eval_mae": 0.13717466592788696,
154
+ "eval_rmse": 0.18415462970733643,
155
+ "eval_runtime": 22.4535,
156
+ "eval_samples_per_second": 35.674,
157
+ "eval_steps_per_second": 1.158,
158
+ "learning_rate": 0.001,
159
+ "step": 836
160
+ },
161
+ {
162
+ "epoch": 12.0,
163
+ "eval_explained_variance": 0.308597594499588,
164
+ "eval_kl_divergence": 0.5270651578903198,
165
+ "eval_loss": 0.4578668475151062,
166
+ "eval_mae": 0.13155478239059448,
167
+ "eval_rmse": 0.17636829614639282,
168
+ "eval_runtime": 22.5868,
169
+ "eval_samples_per_second": 35.463,
170
+ "eval_steps_per_second": 1.151,
171
+ "learning_rate": 0.001,
172
+ "step": 912
173
+ },
174
+ {
175
+ "epoch": 13.0,
176
+ "eval_explained_variance": 0.3100161552429199,
177
+ "eval_kl_divergence": 0.9167731404304504,
178
+ "eval_loss": 0.4558842182159424,
179
+ "eval_mae": 0.1300584226846695,
180
+ "eval_rmse": 0.1756095588207245,
181
+ "eval_runtime": 22.7991,
182
+ "eval_samples_per_second": 35.133,
183
+ "eval_steps_per_second": 1.14,
184
+ "learning_rate": 0.001,
185
+ "step": 988
186
+ },
187
+ {
188
+ "epoch": 13.157894736842104,
189
+ "grad_norm": 0.26468560099601746,
190
+ "learning_rate": 0.001,
191
+ "loss": 0.4448,
192
+ "step": 1000
193
+ },
194
+ {
195
+ "epoch": 14.0,
196
+ "eval_explained_variance": 0.3228832185268402,
197
+ "eval_kl_divergence": 0.8826888799667358,
198
+ "eval_loss": 0.4555540680885315,
199
+ "eval_mae": 0.12918463349342346,
200
+ "eval_rmse": 0.17491525411605835,
201
+ "eval_runtime": 22.7093,
202
+ "eval_samples_per_second": 35.272,
203
+ "eval_steps_per_second": 1.145,
204
+ "learning_rate": 0.001,
205
+ "step": 1064
206
+ },
207
+ {
208
+ "epoch": 15.0,
209
+ "eval_explained_variance": 0.34160685539245605,
210
+ "eval_kl_divergence": 0.7008672952651978,
211
+ "eval_loss": 0.45217740535736084,
212
+ "eval_mae": 0.1262015700340271,
213
+ "eval_rmse": 0.17165420949459076,
214
+ "eval_runtime": 22.6104,
215
+ "eval_samples_per_second": 35.426,
216
+ "eval_steps_per_second": 1.15,
217
+ "learning_rate": 0.001,
218
+ "step": 1140
219
+ },
220
+ {
221
+ "epoch": 16.0,
222
+ "eval_explained_variance": 0.31633102893829346,
223
+ "eval_kl_divergence": 1.0038130283355713,
224
+ "eval_loss": 0.45556434988975525,
225
+ "eval_mae": 0.12863175570964813,
226
+ "eval_rmse": 0.1752910166978836,
227
+ "eval_runtime": 22.5805,
228
+ "eval_samples_per_second": 35.473,
229
+ "eval_steps_per_second": 1.151,
230
+ "learning_rate": 0.001,
231
+ "step": 1216
232
+ },
233
+ {
234
+ "epoch": 17.0,
235
+ "eval_explained_variance": 0.3205307126045227,
236
+ "eval_kl_divergence": 0.2600082457065582,
237
+ "eval_loss": 0.458648681640625,
238
+ "eval_mae": 0.13426683843135834,
239
+ "eval_rmse": 0.17750133574008942,
240
+ "eval_runtime": 22.3997,
241
+ "eval_samples_per_second": 35.759,
242
+ "eval_steps_per_second": 1.161,
243
+ "learning_rate": 0.001,
244
+ "step": 1292
245
+ },
246
+ {
247
+ "epoch": 18.0,
248
+ "eval_explained_variance": -4.778772354125977,
249
+ "eval_kl_divergence": 2.054769277572632,
250
+ "eval_loss": 0.567169725894928,
251
+ "eval_mae": 0.16376179456710815,
252
+ "eval_rmse": 0.23688165843486786,
253
+ "eval_runtime": 22.2481,
254
+ "eval_samples_per_second": 36.003,
255
+ "eval_steps_per_second": 1.169,
256
+ "learning_rate": 0.001,
257
+ "step": 1368
258
+ },
259
+ {
260
+ "epoch": 19.0,
261
+ "eval_explained_variance": 0.32792216539382935,
262
+ "eval_kl_divergence": 0.7114961743354797,
263
+ "eval_loss": 0.45287612080574036,
264
+ "eval_mae": 0.12865176796913147,
265
+ "eval_rmse": 0.17274516820907593,
266
+ "eval_runtime": 22.3304,
267
+ "eval_samples_per_second": 35.87,
268
+ "eval_steps_per_second": 1.164,
269
+ "learning_rate": 0.001,
270
+ "step": 1444
271
+ },
272
+ {
273
+ "epoch": 19.736842105263158,
274
+ "grad_norm": 0.1475011706352234,
275
+ "learning_rate": 0.001,
276
+ "loss": 0.4406,
277
+ "step": 1500
278
+ },
279
+ {
280
+ "epoch": 20.0,
281
+ "eval_explained_variance": 0.3204135596752167,
282
+ "eval_kl_divergence": 0.9694227576255798,
283
+ "eval_loss": 0.45518893003463745,
284
+ "eval_mae": 0.12852200865745544,
285
+ "eval_rmse": 0.17462262511253357,
286
+ "eval_runtime": 22.5552,
287
+ "eval_samples_per_second": 35.513,
288
+ "eval_steps_per_second": 1.153,
289
+ "learning_rate": 0.001,
290
+ "step": 1520
291
+ },
292
+ {
293
+ "epoch": 21.0,
294
+ "eval_explained_variance": 0.32996666431427,
295
+ "eval_kl_divergence": 0.778915524482727,
296
+ "eval_loss": 0.45299893617630005,
297
+ "eval_mae": 0.12820784747600555,
298
+ "eval_rmse": 0.17243456840515137,
299
+ "eval_runtime": 22.5861,
300
+ "eval_samples_per_second": 35.464,
301
+ "eval_steps_per_second": 1.151,
302
+ "learning_rate": 0.001,
303
+ "step": 1596
304
+ },
305
+ {
306
+ "epoch": 22.0,
307
+ "eval_explained_variance": 0.34726351499557495,
308
+ "eval_kl_divergence": 0.7368760704994202,
309
+ "eval_loss": 0.4502638280391693,
310
+ "eval_mae": 0.12613575160503387,
311
+ "eval_rmse": 0.17001575231552124,
312
+ "eval_runtime": 22.7271,
313
+ "eval_samples_per_second": 35.244,
314
+ "eval_steps_per_second": 1.144,
315
+ "learning_rate": 0.0001,
316
+ "step": 1672
317
+ },
318
+ {
319
+ "epoch": 23.0,
320
+ "eval_explained_variance": 0.34029728174209595,
321
+ "eval_kl_divergence": 0.5027008056640625,
322
+ "eval_loss": 0.453466534614563,
323
+ "eval_mae": 0.12802833318710327,
324
+ "eval_rmse": 0.17160943150520325,
325
+ "eval_runtime": 22.4563,
326
+ "eval_samples_per_second": 35.669,
327
+ "eval_steps_per_second": 1.158,
328
+ "learning_rate": 0.0001,
329
+ "step": 1748
330
+ },
331
+ {
332
+ "epoch": 24.0,
333
+ "eval_explained_variance": 0.3511368930339813,
334
+ "eval_kl_divergence": 0.5968054533004761,
335
+ "eval_loss": 0.4502425491809845,
336
+ "eval_mae": 0.12641073763370514,
337
+ "eval_rmse": 0.16971111297607422,
338
+ "eval_runtime": 22.592,
339
+ "eval_samples_per_second": 35.455,
340
+ "eval_steps_per_second": 1.151,
341
+ "learning_rate": 0.0001,
342
+ "step": 1824
343
+ },
344
+ {
345
+ "epoch": 25.0,
346
+ "eval_explained_variance": 0.3504308760166168,
347
+ "eval_kl_divergence": 0.621475100517273,
348
+ "eval_loss": 0.45040303468704224,
349
+ "eval_mae": 0.12673497200012207,
350
+ "eval_rmse": 0.1699284017086029,
351
+ "eval_runtime": 22.3202,
352
+ "eval_samples_per_second": 35.887,
353
+ "eval_steps_per_second": 1.165,
354
+ "learning_rate": 0.0001,
355
+ "step": 1900
356
+ },
357
+ {
358
+ "epoch": 26.0,
359
+ "eval_explained_variance": 0.34598857164382935,
360
+ "eval_kl_divergence": 0.6567814350128174,
361
+ "eval_loss": 0.4509589374065399,
362
+ "eval_mae": 0.12596669793128967,
363
+ "eval_rmse": 0.17043226957321167,
364
+ "eval_runtime": 22.4313,
365
+ "eval_samples_per_second": 35.709,
366
+ "eval_steps_per_second": 1.159,
367
+ "learning_rate": 0.0001,
368
+ "step": 1976
369
+ },
370
+ {
371
+ "epoch": 26.31578947368421,
372
+ "grad_norm": 0.16591614484786987,
373
+ "learning_rate": 0.0001,
374
+ "loss": 0.4334,
375
+ "step": 2000
376
+ },
377
+ {
378
+ "epoch": 27.0,
379
+ "eval_explained_variance": 0.35463404655456543,
380
+ "eval_kl_divergence": 0.5748001337051392,
381
+ "eval_loss": 0.4497845768928528,
382
+ "eval_mae": 0.1262420266866684,
383
+ "eval_rmse": 0.1693224012851715,
384
+ "eval_runtime": 22.3409,
385
+ "eval_samples_per_second": 35.854,
386
+ "eval_steps_per_second": 1.164,
387
+ "learning_rate": 0.0001,
388
+ "step": 2052
389
+ },
390
+ {
391
+ "epoch": 28.0,
392
+ "eval_explained_variance": 0.34665071964263916,
393
+ "eval_kl_divergence": 0.7001035809516907,
394
+ "eval_loss": 0.45060041546821594,
395
+ "eval_mae": 0.12559720873832703,
396
+ "eval_rmse": 0.17011338472366333,
397
+ "eval_runtime": 22.4894,
398
+ "eval_samples_per_second": 35.617,
399
+ "eval_steps_per_second": 1.156,
400
+ "learning_rate": 0.0001,
401
+ "step": 2128
402
+ },
403
+ {
404
+ "epoch": 29.0,
405
+ "eval_explained_variance": 0.3531297743320465,
406
+ "eval_kl_divergence": 0.5840001702308655,
407
+ "eval_loss": 0.4504892826080322,
408
+ "eval_mae": 0.12626922130584717,
409
+ "eval_rmse": 0.16992022097110748,
410
+ "eval_runtime": 22.4286,
411
+ "eval_samples_per_second": 35.713,
412
+ "eval_steps_per_second": 1.159,
413
+ "learning_rate": 0.0001,
414
+ "step": 2204
415
+ },
416
+ {
417
+ "epoch": 30.0,
418
+ "eval_explained_variance": 0.34863966703414917,
419
+ "eval_kl_divergence": 0.8101097345352173,
420
+ "eval_loss": 0.45060065388679504,
421
+ "eval_mae": 0.12516793608665466,
422
+ "eval_rmse": 0.1702672839164734,
423
+ "eval_runtime": 22.3007,
424
+ "eval_samples_per_second": 35.918,
425
+ "eval_steps_per_second": 1.166,
426
+ "learning_rate": 0.0001,
427
+ "step": 2280
428
+ },
429
+ {
430
+ "epoch": 31.0,
431
+ "eval_explained_variance": 0.3488965630531311,
432
+ "eval_kl_divergence": 0.7415657043457031,
433
+ "eval_loss": 0.45080825686454773,
434
+ "eval_mae": 0.12486829608678818,
435
+ "eval_rmse": 0.1701475977897644,
436
+ "eval_runtime": 22.3895,
437
+ "eval_samples_per_second": 35.776,
438
+ "eval_steps_per_second": 1.161,
439
+ "learning_rate": 0.0001,
440
+ "step": 2356
441
+ },
442
+ {
443
+ "epoch": 32.0,
444
+ "eval_explained_variance": 0.3523526191711426,
445
+ "eval_kl_divergence": 0.6401851177215576,
446
+ "eval_loss": 0.4501984417438507,
447
+ "eval_mae": 0.12540514767169952,
448
+ "eval_rmse": 0.16971096396446228,
449
+ "eval_runtime": 22.3438,
450
+ "eval_samples_per_second": 35.849,
451
+ "eval_steps_per_second": 1.164,
452
+ "learning_rate": 0.0001,
453
+ "step": 2432
454
+ },
455
+ {
456
+ "epoch": 32.89473684210526,
457
+ "grad_norm": 0.1912919282913208,
458
+ "learning_rate": 0.0001,
459
+ "loss": 0.4289,
460
+ "step": 2500
461
+ },
462
+ {
463
+ "epoch": 33.0,
464
+ "eval_explained_variance": 0.33917075395584106,
465
+ "eval_kl_divergence": 0.8411455154418945,
466
+ "eval_loss": 0.4510658085346222,
467
+ "eval_mae": 0.12500226497650146,
468
+ "eval_rmse": 0.1709534227848053,
469
+ "eval_runtime": 22.56,
470
+ "eval_samples_per_second": 35.505,
471
+ "eval_steps_per_second": 1.152,
472
+ "learning_rate": 0.0001,
473
+ "step": 2508
474
+ },
475
+ {
476
+ "epoch": 34.0,
477
+ "eval_explained_variance": 0.33831754326820374,
478
+ "eval_kl_divergence": 0.7203648686408997,
479
+ "eval_loss": 0.45148056745529175,
480
+ "eval_mae": 0.12593072652816772,
481
+ "eval_rmse": 0.17108403146266937,
482
+ "eval_runtime": 22.505,
483
+ "eval_samples_per_second": 35.592,
484
+ "eval_steps_per_second": 1.155,
485
+ "learning_rate": 1e-05,
486
+ "step": 2584
487
+ },
488
+ {
489
+ "epoch": 35.0,
490
+ "eval_explained_variance": 0.34977057576179504,
491
+ "eval_kl_divergence": 0.7354820966720581,
492
+ "eval_loss": 0.4502483904361725,
493
+ "eval_mae": 0.12473371624946594,
494
+ "eval_rmse": 0.16982755064964294,
495
+ "eval_runtime": 22.3912,
496
+ "eval_samples_per_second": 35.773,
497
+ "eval_steps_per_second": 1.161,
498
+ "learning_rate": 1e-05,
499
+ "step": 2660
500
+ },
501
+ {
502
+ "epoch": 36.0,
503
+ "eval_explained_variance": 0.3486325144767761,
504
+ "eval_kl_divergence": 0.49899470806121826,
505
+ "eval_loss": 0.4508889615535736,
506
+ "eval_mae": 0.1260843575000763,
507
+ "eval_rmse": 0.1702878773212433,
508
+ "eval_runtime": 22.8537,
509
+ "eval_samples_per_second": 35.049,
510
+ "eval_steps_per_second": 1.138,
511
+ "learning_rate": 1e-05,
512
+ "step": 2736
513
+ },
514
+ {
515
+ "epoch": 37.0,
516
+ "eval_explained_variance": 0.3535779118537903,
517
+ "eval_kl_divergence": 0.5451197624206543,
518
+ "eval_loss": 0.44998663663864136,
519
+ "eval_mae": 0.12602195143699646,
520
+ "eval_rmse": 0.16962522268295288,
521
+ "eval_runtime": 22.2216,
522
+ "eval_samples_per_second": 36.046,
523
+ "eval_steps_per_second": 1.17,
524
+ "learning_rate": 1e-05,
525
+ "step": 2812
526
+ },
527
+ {
528
+ "epoch": 37.0,
529
+ "learning_rate": 1e-05,
530
+ "step": 2812,
531
+ "total_flos": 1.318896404308369e+20,
532
+ "train_loss": 0.4464974437295797,
533
+ "train_runtime": 4164.1679,
534
+ "train_samples_per_second": 86.596,
535
+ "train_steps_per_second": 2.738
536
+ }
537
+ ],
538
+ "logging_steps": 500,
539
+ "max_steps": 11400,
540
+ "num_input_tokens_seen": 0,
541
+ "num_train_epochs": 150,
542
+ "save_steps": 500,
543
+ "stateful_callbacks": {
544
+ "EarlyStoppingCallback": {
545
+ "args": {
546
+ "early_stopping_patience": 10,
547
+ "early_stopping_threshold": 0.0
548
+ },
549
+ "attributes": {
550
+ "early_stopping_patience_counter": 10
551
+ }
552
+ },
553
+ "TrainerControl": {
554
+ "args": {
555
+ "should_epoch_stop": false,
556
+ "should_evaluate": false,
557
+ "should_log": false,
558
+ "should_save": true,
559
+ "should_training_stop": true
560
+ },
561
+ "attributes": {}
562
+ }
563
+ },
564
+ "total_flos": 1.318896404308369e+20,
565
+ "train_batch_size": 32,
566
+ "trial_name": null,
567
+ "trial_params": null
568
+ }
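
The state above explains why a run planned for 150 epochs ended at epoch 37: the best validation loss was reached at step 2052, and the early-stopping patience of 10 was exhausted ten evaluations later. The sketch below reproduces that arithmetic. It is a simplified model of patience-based early stopping written for illustration, not the library's `EarlyStoppingCallback` code. The function `epoch_of_stop` is a hypothetical name, and the loss values are the rounded validation losses for epochs 27 to 37 from the README table.

```python
# Values copied from trainer_state.json and the README table in this commit.
steps_per_epoch = 76
num_train_epochs = 150
best_step = 2052          # from "best_model_checkpoint": checkpoint-2052
patience = 10             # "early_stopping_patience"

planned_max_steps = steps_per_epoch * num_train_epochs
best_epoch = best_step // steps_per_epoch


def epoch_of_stop(eval_losses, patience, first_epoch=1):
    """Return the epoch at which a simple patience counter would stop training.

    The counter resets whenever the loss improves on the best value seen so
    far and increments otherwise. Returns None if patience is never exhausted.
    """
    best = float("inf")
    counter = 0
    for epoch, loss in enumerate(eval_losses, start=first_epoch):
        if loss < best:
            best, counter = loss, 0
        else:
            counter += 1
            if counter >= patience:
                return epoch
    return None


# Rounded validation losses for epochs 27..37 (README table).
losses_from_best = [0.4498, 0.4506, 0.4505, 0.4506, 0.4508, 0.4502,
                    0.4511, 0.4515, 0.4502, 0.4509, 0.4500]

stop_epoch = epoch_of_stop(losses_from_best, patience, first_epoch=best_epoch)
stopped_step = stop_epoch * steps_per_epoch

print(planned_max_steps, best_epoch, stop_epoch, stopped_step)
```

The results line up with the recorded state: `max_steps` of 11400, a best checkpoint at epoch 27, and a final `global_step` of 2812 at epoch 37.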