Commit 6259946 by lombardata
1 Parent(s): 040bb60

Evaluation on the test set completed on 2024_11_15.

README.md ADDED
@@ -0,0 +1,145 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/dinov2-large
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: drone-DinoVdeau-from-probs-large-2024_11_15-batch-size32_freeze_probs
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # drone-DinoVdeau-from-probs-large-2024_11_15-batch-size32_freeze_probs
+
+ This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4668
+ - Rmse: 0.1546
+ - Mae: 0.1143
+ - Kl Divergence: 0.3931
+ - Explained Variance: 0.4690
+ - Learning Rate: 1e-06
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 150
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rmse | Mae | Kl Divergence | Explained Variance | Learning Rate |
+ |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:-------------:|:------------------:|:-------------:|
+ | No log | 1.0 | 219 | 0.4855 | 0.1771 | 0.1364 | 0.3101 | 0.3433 | 0.001 |
+ | No log | 2.0 | 438 | 0.4760 | 0.1688 | 0.1247 | 0.5077 | 0.3891 | 0.001 |
+ | 0.5195 | 3.0 | 657 | 0.4777 | 0.1707 | 0.1230 | 0.7896 | 0.3848 | 0.001 |
+ | 0.5195 | 4.0 | 876 | 0.4743 | 0.1672 | 0.1238 | 0.4932 | 0.4037 | 0.001 |
+ | 0.4742 | 5.0 | 1095 | 0.4746 | 0.1669 | 0.1277 | 0.2901 | 0.4132 | 0.001 |
+ | 0.4742 | 6.0 | 1314 | 0.4750 | 0.1674 | 0.1253 | 0.4399 | 0.4022 | 0.001 |
+ | 0.4706 | 7.0 | 1533 | 0.4745 | 0.1671 | 0.1259 | 0.4868 | 0.4020 | 0.001 |
+ | 0.4706 | 8.0 | 1752 | 0.4742 | 0.1672 | 0.1257 | 0.3241 | 0.4111 | 0.001 |
+ | 0.4706 | 9.0 | 1971 | 0.4730 | 0.1658 | 0.1236 | 0.4560 | 0.4107 | 0.001 |
+ | 0.4678 | 10.0 | 2190 | 0.4751 | 0.1679 | 0.1269 | 0.2141 | 0.4190 | 0.001 |
+ | 0.4678 | 11.0 | 2409 | 0.4733 | 0.1663 | 0.1265 | 0.2530 | 0.4189 | 0.001 |
+ | 0.4674 | 12.0 | 2628 | 0.4758 | 0.1684 | 0.1264 | 0.3966 | 0.4074 | 0.001 |
+ | 0.4674 | 13.0 | 2847 | 0.4722 | 0.1650 | 0.1223 | 0.6055 | 0.4142 | 0.001 |
+ | 0.4676 | 14.0 | 3066 | 0.4747 | 0.1666 | 0.1250 | 0.4203 | 0.4071 | 0.001 |
+ | 0.4676 | 15.0 | 3285 | 0.4733 | 0.1662 | 0.1227 | 0.6553 | 0.4153 | 0.001 |
+ | 0.4663 | 16.0 | 3504 | 0.4735 | 0.1656 | 0.1241 | 0.3576 | 0.4176 | 0.001 |
+ | 0.4663 | 17.0 | 3723 | 0.4722 | 0.1643 | 0.1221 | 0.4545 | 0.4231 | 0.001 |
+ | 0.4663 | 18.0 | 3942 | 0.4724 | 0.1647 | 0.1225 | 0.4902 | 0.4209 | 0.001 |
+ | 0.4655 | 19.0 | 4161 | 0.4729 | 0.1650 | 0.1261 | 0.3158 | 0.4224 | 0.001 |
+ | 0.4655 | 20.0 | 4380 | 0.4697 | 0.1623 | 0.1203 | 0.4574 | 0.4342 | 0.0001 |
+ | 0.4635 | 21.0 | 4599 | 0.4689 | 0.1613 | 0.1197 | 0.4569 | 0.4383 | 0.0001 |
+ | 0.4635 | 22.0 | 4818 | 0.4691 | 0.1617 | 0.1202 | 0.4535 | 0.4374 | 0.0001 |
+ | 0.4615 | 23.0 | 5037 | 0.4691 | 0.1614 | 0.1210 | 0.2971 | 0.4442 | 0.0001 |
+ | 0.4615 | 24.0 | 5256 | 0.4692 | 0.1616 | 0.1196 | 0.3916 | 0.4406 | 0.0001 |
+ | 0.4615 | 25.0 | 5475 | 0.4677 | 0.1601 | 0.1181 | 0.4516 | 0.4465 | 0.0001 |
+ | 0.4601 | 26.0 | 5694 | 0.4680 | 0.1605 | 0.1171 | 0.6089 | 0.4434 | 0.0001 |
+ | 0.4601 | 27.0 | 5913 | 0.4675 | 0.1600 | 0.1182 | 0.4741 | 0.4461 | 0.0001 |
+ | 0.4585 | 28.0 | 6132 | 0.4681 | 0.1606 | 0.1200 | 0.3356 | 0.4489 | 0.0001 |
+ | 0.4585 | 29.0 | 6351 | 0.4678 | 0.1603 | 0.1181 | 0.4330 | 0.4460 | 0.0001 |
+ | 0.4578 | 30.0 | 6570 | 0.4680 | 0.1602 | 0.1194 | 0.3160 | 0.4504 | 0.0001 |
+ | 0.4578 | 31.0 | 6789 | 0.4677 | 0.1600 | 0.1179 | 0.4190 | 0.4468 | 0.0001 |
+ | 0.4579 | 32.0 | 7008 | 0.4675 | 0.1598 | 0.1188 | 0.3706 | 0.4504 | 0.0001 |
+ | 0.4579 | 33.0 | 7227 | 0.4671 | 0.1593 | 0.1181 | 0.3504 | 0.4546 | 0.0001 |
+ | 0.4579 | 34.0 | 7446 | 0.4670 | 0.1594 | 0.1180 | 0.3881 | 0.4533 | 0.0001 |
+ | 0.4569 | 35.0 | 7665 | 0.4663 | 0.1587 | 0.1166 | 0.4398 | 0.4556 | 0.0001 |
+ | 0.4569 | 36.0 | 7884 | 0.4666 | 0.1587 | 0.1170 | 0.4382 | 0.4544 | 0.0001 |
+ | 0.4572 | 37.0 | 8103 | 0.4658 | 0.1581 | 0.1163 | 0.4330 | 0.4594 | 0.0001 |
+ | 0.4572 | 38.0 | 8322 | 0.4659 | 0.1583 | 0.1162 | 0.4878 | 0.4567 | 0.0001 |
+ | 0.4572 | 39.0 | 8541 | 0.4670 | 0.1595 | 0.1178 | 0.3791 | 0.4552 | 0.0001 |
+ | 0.4572 | 40.0 | 8760 | 0.4665 | 0.1588 | 0.1178 | 0.3889 | 0.4568 | 0.0001 |
+ | 0.4572 | 41.0 | 8979 | 0.4666 | 0.1589 | 0.1184 | 0.3222 | 0.4591 | 0.0001 |
+ | 0.4559 | 42.0 | 9198 | 0.4655 | 0.1579 | 0.1164 | 0.4262 | 0.4607 | 0.0001 |
+ | 0.4559 | 43.0 | 9417 | 0.4656 | 0.1579 | 0.1162 | 0.4611 | 0.4603 | 0.0001 |
+ | 0.4554 | 44.0 | 9636 | 0.4656 | 0.1580 | 0.1164 | 0.4586 | 0.4616 | 0.0001 |
+ | 0.4554 | 45.0 | 9855 | 0.4660 | 0.1583 | 0.1158 | 0.4368 | 0.4597 | 0.0001 |
+ | 0.4557 | 46.0 | 10074 | 0.4660 | 0.1582 | 0.1164 | 0.4118 | 0.4604 | 0.0001 |
+ | 0.4557 | 47.0 | 10293 | 0.4652 | 0.1577 | 0.1154 | 0.5424 | 0.4614 | 0.0001 |
+ | 0.4551 | 48.0 | 10512 | 0.4660 | 0.1586 | 0.1160 | 0.5251 | 0.4596 | 0.0001 |
+ | 0.4551 | 49.0 | 10731 | 0.4660 | 0.1585 | 0.1161 | 0.5007 | 0.4572 | 0.0001 |
+ | 0.4551 | 50.0 | 10950 | 0.4666 | 0.1586 | 0.1185 | 0.2424 | 0.4659 | 0.0001 |
+ | 0.4545 | 51.0 | 11169 | 0.4661 | 0.1584 | 0.1162 | 0.4171 | 0.4589 | 0.0001 |
+ | 0.4545 | 52.0 | 11388 | 0.4650 | 0.1575 | 0.1155 | 0.4912 | 0.4630 | 0.0001 |
+ | 0.4548 | 53.0 | 11607 | 0.4654 | 0.1578 | 0.1169 | 0.4030 | 0.4644 | 0.0001 |
+ | 0.4548 | 54.0 | 11826 | 0.4661 | 0.1585 | 0.1153 | 0.4811 | 0.4595 | 0.0001 |
+ | 0.455 | 55.0 | 12045 | 0.4653 | 0.1576 | 0.1167 | 0.3774 | 0.4638 | 0.0001 |
+ | 0.455 | 56.0 | 12264 | 0.4654 | 0.1575 | 0.1176 | 0.3254 | 0.4670 | 0.0001 |
+ | 0.455 | 57.0 | 12483 | 0.4654 | 0.1575 | 0.1162 | 0.3649 | 0.4662 | 0.0001 |
+ | 0.4531 | 58.0 | 12702 | 0.4665 | 0.1584 | 0.1166 | 0.4075 | 0.4607 | 0.0001 |
+ | 0.4531 | 59.0 | 12921 | 0.4652 | 0.1575 | 0.1157 | 0.4202 | 0.4654 | 1e-05 |
+ | 0.4538 | 60.0 | 13140 | 0.4653 | 0.1571 | 0.1157 | 0.4084 | 0.4669 | 1e-05 |
+ | 0.4538 | 61.0 | 13359 | 0.4654 | 0.1573 | 0.1153 | 0.4497 | 0.4661 | 1e-05 |
+ | 0.4529 | 62.0 | 13578 | 0.4648 | 0.1568 | 0.1153 | 0.4112 | 0.4682 | 1e-05 |
+ | 0.4529 | 63.0 | 13797 | 0.4648 | 0.1567 | 0.1152 | 0.3748 | 0.4702 | 1e-05 |
+ | 0.4527 | 64.0 | 14016 | 0.4652 | 0.1571 | 0.1162 | 0.3044 | 0.4721 | 1e-05 |
+ | 0.4527 | 65.0 | 14235 | 0.4648 | 0.1569 | 0.1153 | 0.4685 | 0.4670 | 1e-05 |
+ | 0.4527 | 66.0 | 14454 | 0.4650 | 0.1573 | 0.1148 | 0.5087 | 0.4671 | 1e-05 |
+ | 0.4531 | 67.0 | 14673 | 0.4646 | 0.1568 | 0.1155 | 0.4274 | 0.4690 | 1e-05 |
+ | 0.4531 | 68.0 | 14892 | 0.4646 | 0.1566 | 0.1144 | 0.4969 | 0.4680 | 1e-05 |
+ | 0.452 | 69.0 | 15111 | 0.4644 | 0.1564 | 0.1145 | 0.4480 | 0.4696 | 1e-05 |
+ | 0.452 | 70.0 | 15330 | 0.4648 | 0.1567 | 0.1150 | 0.4291 | 0.4692 | 1e-05 |
+ | 0.4524 | 71.0 | 15549 | 0.4645 | 0.1565 | 0.1156 | 0.3797 | 0.4711 | 1e-05 |
+ | 0.4524 | 72.0 | 15768 | 0.4647 | 0.1569 | 0.1150 | 0.4280 | 0.4690 | 1e-05 |
+ | 0.4524 | 73.0 | 15987 | 0.4641 | 0.1563 | 0.1142 | 0.4592 | 0.4707 | 1e-05 |
+ | 0.4515 | 74.0 | 16206 | 0.4642 | 0.1564 | 0.1151 | 0.4321 | 0.4706 | 1e-05 |
+ | 0.4515 | 75.0 | 16425 | 0.4645 | 0.1565 | 0.1152 | 0.3843 | 0.4708 | 1e-05 |
+ | 0.4521 | 76.0 | 16644 | 0.4646 | 0.1569 | 0.1147 | 0.5216 | 0.4675 | 1e-05 |
+ | 0.4521 | 77.0 | 16863 | 0.4648 | 0.1569 | 0.1152 | 0.4094 | 0.4691 | 1e-05 |
+ | 0.4519 | 78.0 | 17082 | 0.4643 | 0.1564 | 0.1149 | 0.4399 | 0.4709 | 1e-05 |
+ | 0.4519 | 79.0 | 17301 | 0.4646 | 0.1567 | 0.1147 | 0.4178 | 0.4697 | 1e-05 |
+ | 0.4517 | 80.0 | 17520 | 0.4644 | 0.1564 | 0.1150 | 0.4373 | 0.4700 | 1e-06 |
+ | 0.4517 | 81.0 | 17739 | 0.4645 | 0.1567 | 0.1151 | 0.4701 | 0.4688 | 1e-06 |
+ | 0.4517 | 82.0 | 17958 | 0.4644 | 0.1565 | 0.1146 | 0.4601 | 0.4703 | 1e-06 |
+ | 0.4514 | 83.0 | 18177 | 0.4646 | 0.1567 | 0.1147 | 0.4511 | 0.4684 | 1e-06 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.0
+ - Pytorch 2.5.0+cu124
+ - Datasets 3.0.2
+ - Tokenizers 0.19.1
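Note on the training-results table: the checkpoint worth keeping is the row with the lowest validation loss. The following is a minimal Python sketch of that selection, using four rows copied from the table (training loss, epoch, step, validation loss). The helper names `parse_rows` and `best_row` are illustrative assumptions and are not part of the training code in this commit.

```python
# Excerpt of the README table around the minimum:
# training loss | epoch | step | validation loss
ROWS = """
| 0.4524 | 72.0 | 15768 | 0.4647 |
| 0.4524 | 73.0 | 15987 | 0.4641 |
| 0.4515 | 74.0 | 16206 | 0.4642 |
| 0.4515 | 75.0 | 16425 | 0.4645 |
"""


def parse_rows(text):
    """Turn markdown table rows into dicts with epoch, step and validation loss."""
    parsed = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        parsed.append(
            {"epoch": float(cells[1]), "step": int(cells[2]), "val_loss": float(cells[3])}
        )
    return parsed


def best_row(rows):
    """Hypothetical helper: return the row with the smallest validation loss."""
    return min(rows, key=lambda r: r["val_loss"])


best = best_row(parse_rows(ROWS))
print(best)
```

Over the full table the minimum is also epoch 73 (step 15987), which agrees with the `best_model_checkpoint` path ending in `checkpoint-15987` in trainer_state.json.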
all_results.json ADDED
@@ -0,0 +1,17 @@
+ {
+ "epoch": 83.0,
+ "eval_explained_variance": 0.4690297544002533,
+ "eval_kl_divergence": 0.39311662316322327,
+ "eval_loss": 0.46676135063171387,
+ "eval_mae": 0.11432621628046036,
+ "eval_rmse": 0.1546144038438797,
+ "eval_runtime": 61.4305,
+ "eval_samples_per_second": 38.369,
+ "eval_steps_per_second": 1.205,
+ "learning_rate": 1.0000000000000002e-06,
+ "total_flos": 8.603009036605255e+19,
+ "train_loss": 0.45949580130708517,
+ "train_runtime": 19431.3015,
+ "train_samples_per_second": 54.06,
+ "train_steps_per_second": 1.691
+ }
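As a sanity check on these figures, the size of the evaluated split can be estimated from runtime and throughput. This is a rough sketch only: the batch size of 32 is taken from `eval_batch_size` in README.md, the variable names are illustrative, and the result is an estimate because the logged rates are rounded.

```python
import math

eval_runtime = 61.4305             # seconds, from all_results.json
eval_samples_per_second = 38.369   # from all_results.json
eval_steps_per_second = 1.205      # from all_results.json
eval_batch_size = 32               # from the README.md hyperparameters

# Samples and steps implied by runtime x rate.
est_samples = round(eval_runtime * eval_samples_per_second)
est_steps = round(eval_runtime * eval_steps_per_second)

# Steps needed to cover the estimated samples at the stated batch size.
steps_from_samples = math.ceil(est_samples / eval_batch_size)

print(est_samples, est_steps, steps_from_samples)
```

The two step counts agree, which suggests the runtime, throughput, and batch-size fields are mutually consistent.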
logs/events.out.tfevents.1731645940.datavisu2 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a9b63af80d9e97d7a262878d2da6ff7210b0e820ecf256b9a9ac279fed901c34
- size 58429
+ oid sha256:82cb032cea5b9414fdccda28f1b8e649b0e7dae9bdb952de73917a1e064c806b
+ size 59401
logs/events.out.tfevents.1731665544.datavisu2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a0515028d84acb2fc61e70ee5780613b9ccbfec4c83830626dd5e3847fc4eecb
+ size 40
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:752358c0f20c99c258046bc099a7ccfe36da98a71a50eb3a73378b501cd46774
+ oid sha256:45fc50bef10dc323eaf2356023658e6a6f37ae07d855a6afd234c5255a0ff422
  size 1222956704
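The three-line entries above are Git LFS pointer files: the repository stores only a spec version, a SHA-256 object id, and a byte size, while the binary itself lives in LFS storage. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this note, not a Git LFS API.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New pointer for model.safetensors, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:45fc50bef10dc323eaf2356023658e6a6f37ae07d855a6afd234c5255a0ff422
size 1222956704
"""

info = parse_lfs_pointer(pointer)
size_gib = info["size"] / 2**30
print(info["hash_algo"], len(info["digest"]), f"{size_gib:.2f} GiB")
```

For model.safetensors the object id changed while the size stayed at 1222956704 bytes, which is what one would expect when weights are updated without changing the architecture.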
test_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "epoch": 83.0,
+ "eval_explained_variance": 0.4690297544002533,
+ "eval_kl_divergence": 0.39311662316322327,
+ "eval_loss": 0.46676135063171387,
+ "eval_mae": 0.11432621628046036,
+ "eval_rmse": 0.1546144038438797,
+ "eval_runtime": 61.4305,
+ "eval_samples_per_second": 38.369,
+ "eval_steps_per_second": 1.205,
+ "learning_rate": 1.0000000000000002e-06
+ }
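The summary metrics in README.md are these values at reduced precision. The sketch below shows one plausible rounding rule that yields the four-decimal figures; the rule and the name `maybe_round` are assumptions made for illustration and are not confirmed by this commit. Under such a rule a learning rate of about 1e-06 prints as `0.0000`, so the rate is best read from the JSON.

```python
def maybe_round(value, decimals=4):
    """Illustrative rule: fix floats to `decimals` places only when their
    repr carries more digits than that after the decimal point."""
    text = str(value)
    parts = text.split(".")
    if isinstance(value, float) and len(parts) > 1 and len(parts[1]) > decimals:
        return f"{value:.{decimals}f}"
    return text


# Values copied from test_results.json.
test_results = {
    "eval_loss": 0.46676135063171387,
    "eval_rmse": 0.1546144038438797,
    "eval_mae": 0.11432621628046036,
    "eval_kl_divergence": 0.39311662316322327,
    "eval_explained_variance": 0.4690297544002533,
    "learning_rate": 1.0000000000000002e-06,
}

rounded = {name: maybe_round(value) for name, value in test_results.items()}
for name, value in rounded.items():
    print(f"{name}: {value}")
```

Short values such as `0.001` or `1e-05` pass through this rule unchanged, while the longer `1.0000000000000002e-06` is the only rate that collapses to zeros.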
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 83.0,
+ "learning_rate": 1.0000000000000002e-06,
+ "total_flos": 8.603009036605255e+19,
+ "train_loss": 0.45949580130708517,
+ "train_runtime": 19431.3015,
+ "train_samples_per_second": 54.06,
+ "train_steps_per_second": 1.691
+ }
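The training figures can be cross-checked with a little arithmetic. The sketch below derives the steps per epoch from the final step count and bounds the training-set size. It assumes a single device with no gradient accumulation (neither is listed in README.md), and it assumes the logged samples-per-second figure is normalized by the planned 150 epochs rather than the 83 actually run; both are assumptions, not facts stated in this commit.

```python
global_step = 18177        # final step in the README table and trainer_state.json
epochs_run = 83            # "epoch": 83.0
planned_epochs = 150       # num_epochs in README.md
train_batch_size = 32      # assumes one device and no gradient accumulation

# Steps per epoch; a zero remainder means every epoch had the same length.
steps_per_epoch, remainder = divmod(global_step, epochs_run)

# The last batch of an epoch may be partial, so the sample count is a range.
max_samples = steps_per_epoch * train_batch_size
min_samples = (steps_per_epoch - 1) * train_batch_size + 1

# Estimate from throughput, under the normalization assumption stated above.
train_runtime = 19431.3015
train_samples_per_second = 54.06
est_samples = train_runtime * train_samples_per_second / planned_epochs

print(steps_per_epoch, remainder, (min_samples, max_samples), round(est_samples))
```

The throughput-based estimate falls inside the range implied by the step count, so the two sets of numbers are at least consistent with each other under these assumptions.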
trainer_state.json ADDED
@@ -0,0 +1,1383 @@
+ {
+ "best_metric": 0.46414923667907715,
+ "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/drone/drone-DinoVdeau-from-probs-large-2024_11_15-batch-size32_freeze_probs/checkpoint-15987",
+ "epoch": 83.0,
+ "eval_steps": 500,
+ "global_step": 18177,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 1.0,
+ "eval_explained_variance": 0.3432542085647583,
+ "eval_kl_divergence": 0.31011611223220825,
+ "eval_loss": 0.4855400025844574,
+ "eval_mae": 0.1364378184080124,
+ "eval_rmse": 0.17712123692035675,
+ "eval_runtime": 55.3387,
+ "eval_samples_per_second": 42.538,
+ "eval_steps_per_second": 1.337,
+ "learning_rate": 0.001,
+ "step": 219
+ },
+ {
+ "epoch": 2.0,
+ "eval_explained_variance": 0.38912513852119446,
+ "eval_kl_divergence": 0.5077245235443115,
+ "eval_loss": 0.47601452469825745,
+ "eval_mae": 0.12465938925743103,
+ "eval_rmse": 0.16875195503234863,
+ "eval_runtime": 54.843,
+ "eval_samples_per_second": 42.923,
+ "eval_steps_per_second": 1.349,
+ "learning_rate": 0.001,
+ "step": 438
+ },
+ {
+ "epoch": 2.2831050228310503,
+ "grad_norm": 0.35450002551078796,
+ "learning_rate": 0.001,
+ "loss": 0.5195,
+ "step": 500
+ },
+ {
+ "epoch": 3.0,
+ "eval_explained_variance": 0.3848476707935333,
+ "eval_kl_divergence": 0.7895973324775696,
+ "eval_loss": 0.4776814579963684,
+ "eval_mae": 0.12300346046686172,
+ "eval_rmse": 0.17065072059631348,
+ "eval_runtime": 56.2195,
+ "eval_samples_per_second": 41.872,
+ "eval_steps_per_second": 1.316,
+ "learning_rate": 0.001,
+ "step": 657
+ },
+ {
+ "epoch": 4.0,
+ "eval_explained_variance": 0.403704434633255,
+ "eval_kl_divergence": 0.49319207668304443,
+ "eval_loss": 0.47429159283638,
+ "eval_mae": 0.12376764416694641,
+ "eval_rmse": 0.1672389954328537,
+ "eval_runtime": 54.7793,
+ "eval_samples_per_second": 42.972,
+ "eval_steps_per_second": 1.351,
+ "learning_rate": 0.001,
+ "step": 876
+ },
+ {
+ "epoch": 4.566210045662101,
+ "grad_norm": 0.2313629388809204,
+ "learning_rate": 0.001,
+ "loss": 0.4742,
+ "step": 1000
+ },
+ {
+ "epoch": 5.0,
+ "eval_explained_variance": 0.41316938400268555,
+ "eval_kl_divergence": 0.2900688648223877,
+ "eval_loss": 0.47457176446914673,
+ "eval_mae": 0.12771284580230713,
+ "eval_rmse": 0.16687722504138947,
+ "eval_runtime": 55.1273,
+ "eval_samples_per_second": 42.701,
+ "eval_steps_per_second": 1.342,
+ "learning_rate": 0.001,
+ "step": 1095
+ },
+ {
+ "epoch": 6.0,
+ "eval_explained_variance": 0.40222811698913574,
+ "eval_kl_divergence": 0.43988940119743347,
+ "eval_loss": 0.4749792814254761,
+ "eval_mae": 0.1252531260251999,
+ "eval_rmse": 0.16735166311264038,
+ "eval_runtime": 53.136,
+ "eval_samples_per_second": 44.301,
+ "eval_steps_per_second": 1.393,
+ "learning_rate": 0.001,
+ "step": 1314
+ },
+ {
+ "epoch": 6.8493150684931505,
+ "grad_norm": 0.18959695100784302,
+ "learning_rate": 0.001,
+ "loss": 0.4706,
+ "step": 1500
+ },
+ {
+ "epoch": 7.0,
+ "eval_explained_variance": 0.4019981324672699,
+ "eval_kl_divergence": 0.48684099316596985,
+ "eval_loss": 0.4744807779788971,
+ "eval_mae": 0.12594138085842133,
+ "eval_rmse": 0.16705705225467682,
+ "eval_runtime": 53.367,
+ "eval_samples_per_second": 44.11,
+ "eval_steps_per_second": 1.387,
+ "learning_rate": 0.001,
+ "step": 1533
+ },
+ {
+ "epoch": 8.0,
+ "eval_explained_variance": 0.41111621260643005,
+ "eval_kl_divergence": 0.324148029088974,
+ "eval_loss": 0.47424906492233276,
+ "eval_mae": 0.12568950653076172,
+ "eval_rmse": 0.16722555458545685,
+ "eval_runtime": 55.5084,
+ "eval_samples_per_second": 42.408,
+ "eval_steps_per_second": 1.333,
+ "learning_rate": 0.001,
+ "step": 1752
+ },
+ {
+ "epoch": 9.0,
+ "eval_explained_variance": 0.4107116162776947,
+ "eval_kl_divergence": 0.4560392200946808,
+ "eval_loss": 0.4729686379432678,
+ "eval_mae": 0.12355945259332657,
+ "eval_rmse": 0.16584673523902893,
+ "eval_runtime": 55.1596,
+ "eval_samples_per_second": 42.676,
+ "eval_steps_per_second": 1.342,
+ "learning_rate": 0.001,
+ "step": 1971
+ },
+ {
+ "epoch": 9.132420091324201,
+ "grad_norm": 0.18577350676059723,
+ "learning_rate": 0.001,
+ "loss": 0.4678,
+ "step": 2000
+ },
+ {
+ "epoch": 10.0,
+ "eval_explained_variance": 0.4190339744091034,
+ "eval_kl_divergence": 0.2140849530696869,
+ "eval_loss": 0.4750550389289856,
+ "eval_mae": 0.12685616314411163,
+ "eval_rmse": 0.1679263859987259,
+ "eval_runtime": 56.0284,
+ "eval_samples_per_second": 42.014,
+ "eval_steps_per_second": 1.321,
+ "learning_rate": 0.001,
+ "step": 2190
+ },
+ {
+ "epoch": 11.0,
+ "eval_explained_variance": 0.41887199878692627,
+ "eval_kl_divergence": 0.2529982030391693,
+ "eval_loss": 0.4733181595802307,
+ "eval_mae": 0.12647458910942078,
+ "eval_rmse": 0.16627688705921173,
+ "eval_runtime": 55.5532,
+ "eval_samples_per_second": 42.374,
+ "eval_steps_per_second": 1.332,
+ "learning_rate": 0.001,
+ "step": 2409
+ },
+ {
+ "epoch": 11.415525114155251,
+ "grad_norm": 0.14618106186389923,
+ "learning_rate": 0.001,
+ "loss": 0.4674,
+ "step": 2500
+ },
+ {
+ "epoch": 12.0,
+ "eval_explained_variance": 0.4073503315448761,
+ "eval_kl_divergence": 0.3965540826320648,
+ "eval_loss": 0.4758349061012268,
+ "eval_mae": 0.1263781040906906,
+ "eval_rmse": 0.1683548092842102,
+ "eval_runtime": 53.8367,
+ "eval_samples_per_second": 43.725,
+ "eval_steps_per_second": 1.375,
+ "learning_rate": 0.001,
+ "step": 2628
+ },
+ {
+ "epoch": 13.0,
+ "eval_explained_variance": 0.41419240832328796,
+ "eval_kl_divergence": 0.6054547429084778,
+ "eval_loss": 0.4722050428390503,
+ "eval_mae": 0.12233959883451462,
+ "eval_rmse": 0.16495703160762787,
+ "eval_runtime": 54.7322,
+ "eval_samples_per_second": 43.009,
+ "eval_steps_per_second": 1.352,
+ "learning_rate": 0.001,
+ "step": 2847
+ },
+ {
+ "epoch": 13.698630136986301,
+ "grad_norm": 0.15461835265159607,
+ "learning_rate": 0.001,
+ "loss": 0.4676,
+ "step": 3000
+ },
+ {
+ "epoch": 14.0,
+ "eval_explained_variance": 0.40708938241004944,
+ "eval_kl_divergence": 0.4203389585018158,
+ "eval_loss": 0.4747372567653656,
+ "eval_mae": 0.12501581013202667,
+ "eval_rmse": 0.16655980050563812,
+ "eval_runtime": 55.2289,
+ "eval_samples_per_second": 42.623,
+ "eval_steps_per_second": 1.34,
+ "learning_rate": 0.001,
+ "step": 3066
+ },
+ {
+ "epoch": 15.0,
+ "eval_explained_variance": 0.41527059674263,
+ "eval_kl_divergence": 0.6553499102592468,
+ "eval_loss": 0.47325292229652405,
+ "eval_mae": 0.12266030162572861,
+ "eval_rmse": 0.16621644794940948,
+ "eval_runtime": 54.2502,
+ "eval_samples_per_second": 43.392,
+ "eval_steps_per_second": 1.364,
+ "learning_rate": 0.001,
+ "step": 3285
+ },
+ {
+ "epoch": 15.981735159817351,
+ "grad_norm": 0.10063416510820389,
+ "learning_rate": 0.001,
+ "loss": 0.4663,
+ "step": 3500
+ },
+ {
+ "epoch": 16.0,
+ "eval_explained_variance": 0.4175969660282135,
+ "eval_kl_divergence": 0.35757607221603394,
+ "eval_loss": 0.4734710156917572,
+ "eval_mae": 0.12411689758300781,
+ "eval_rmse": 0.16558559238910675,
+ "eval_runtime": 53.6921,
+ "eval_samples_per_second": 43.843,
+ "eval_steps_per_second": 1.378,
+ "learning_rate": 0.001,
+ "step": 3504
+ },
+ {
+ "epoch": 17.0,
+ "eval_explained_variance": 0.4231180250644684,
+ "eval_kl_divergence": 0.4545155465602875,
+ "eval_loss": 0.4721581041812897,
+ "eval_mae": 0.12205825001001358,
+ "eval_rmse": 0.16431300342082977,
+ "eval_runtime": 54.0719,
+ "eval_samples_per_second": 43.535,
+ "eval_steps_per_second": 1.369,
+ "learning_rate": 0.001,
+ "step": 3723
+ },
+ {
+ "epoch": 18.0,
+ "eval_explained_variance": 0.42092254757881165,
+ "eval_kl_divergence": 0.49019381403923035,
+ "eval_loss": 0.4723944365978241,
+ "eval_mae": 0.12245010584592819,
+ "eval_rmse": 0.16473934054374695,
+ "eval_runtime": 53.2446,
+ "eval_samples_per_second": 44.211,
+ "eval_steps_per_second": 1.39,
+ "learning_rate": 0.001,
+ "step": 3942
+ },
+ {
+ "epoch": 18.264840182648403,
+ "grad_norm": 0.11052733659744263,
+ "learning_rate": 0.001,
+ "loss": 0.4655,
+ "step": 4000
+ },
+ {
+ "epoch": 19.0,
+ "eval_explained_variance": 0.42237523198127747,
+ "eval_kl_divergence": 0.3157788813114166,
+ "eval_loss": 0.47289156913757324,
+ "eval_mae": 0.12610264122486115,
+ "eval_rmse": 0.164999321103096,
+ "eval_runtime": 54.353,
+ "eval_samples_per_second": 43.309,
+ "eval_steps_per_second": 1.361,
+ "learning_rate": 0.001,
+ "step": 4161
+ },
+ {
+ "epoch": 20.0,
+ "eval_explained_variance": 0.43422555923461914,
+ "eval_kl_divergence": 0.45738106966018677,
+ "eval_loss": 0.4697262644767761,
+ "eval_mae": 0.12028751522302628,
+ "eval_rmse": 0.16227416694164276,
+ "eval_runtime": 52.1033,
+ "eval_samples_per_second": 45.179,
+ "eval_steps_per_second": 1.42,
+ "learning_rate": 0.0001,
+ "step": 4380
+ },
+ {
+ "epoch": 20.54794520547945,
+ "grad_norm": 0.10903308540582657,
+ "learning_rate": 0.0001,
+ "loss": 0.4635,
+ "step": 4500
+ },
+ {
+ "epoch": 21.0,
+ "eval_explained_variance": 0.43825283646583557,
+ "eval_kl_divergence": 0.45688703656196594,
+ "eval_loss": 0.46890661120414734,
+ "eval_mae": 0.11968808621168137,
+ "eval_rmse": 0.16127373278141022,
+ "eval_runtime": 52.3325,
+ "eval_samples_per_second": 44.982,
+ "eval_steps_per_second": 1.414,
+ "learning_rate": 0.0001,
+ "step": 4599
+ },
+ {
+ "epoch": 22.0,
+ "eval_explained_variance": 0.4373685419559479,
+ "eval_kl_divergence": 0.45346954464912415,
+ "eval_loss": 0.46905258297920227,
+ "eval_mae": 0.12017489224672318,
+ "eval_rmse": 0.16165030002593994,
+ "eval_runtime": 51.1815,
+ "eval_samples_per_second": 45.993,
+ "eval_steps_per_second": 1.446,
+ "learning_rate": 0.0001,
+ "step": 4818
+ },
+ {
+ "epoch": 22.831050228310502,
+ "grad_norm": 0.09725002944469452,
+ "learning_rate": 0.0001,
+ "loss": 0.4615,
+ "step": 5000
+ },
+ {
+ "epoch": 23.0,
+ "eval_explained_variance": 0.4442131519317627,
+ "eval_kl_divergence": 0.2970678508281708,
+ "eval_loss": 0.4691086411476135,
+ "eval_mae": 0.1210075318813324,
+ "eval_rmse": 0.1613779515028,
+ "eval_runtime": 50.785,
+ "eval_samples_per_second": 46.352,
+ "eval_steps_per_second": 1.457,
+ "learning_rate": 0.0001,
+ "step": 5037
+ },
+ {
+ "epoch": 24.0,
+ "eval_explained_variance": 0.4405536353588104,
+ "eval_kl_divergence": 0.39161574840545654,
+ "eval_loss": 0.46915334463119507,
+ "eval_mae": 0.11959254741668701,
+ "eval_rmse": 0.16161170601844788,
+ "eval_runtime": 50.8712,
+ "eval_samples_per_second": 46.274,
+ "eval_steps_per_second": 1.455,
+ "learning_rate": 0.0001,
+ "step": 5256
+ },
+ {
+ "epoch": 25.0,
+ "eval_explained_variance": 0.4465361535549164,
+ "eval_kl_divergence": 0.4515945613384247,
+ "eval_loss": 0.4676876664161682,
+ "eval_mae": 0.11813607066869736,
+ "eval_rmse": 0.16005758941173553,
+ "eval_runtime": 50.537,
+ "eval_samples_per_second": 46.58,
+ "eval_steps_per_second": 1.464,
+ "learning_rate": 0.0001,
+ "step": 5475
+ },
+ {
+ "epoch": 25.114155251141554,
+ "grad_norm": 0.10921537131071091,
+ "learning_rate": 0.0001,
+ "loss": 0.4601,
+ "step": 5500
+ },
+ {
+ "epoch": 26.0,
+ "eval_explained_variance": 0.4434172809123993,
+ "eval_kl_divergence": 0.6089490652084351,
+ "eval_loss": 0.4679708480834961,
+ "eval_mae": 0.11711684614419937,
+ "eval_rmse": 0.1605486422777176,
+ "eval_runtime": 49.8832,
+ "eval_samples_per_second": 47.19,
+ "eval_steps_per_second": 1.483,
+ "learning_rate": 0.0001,
+ "step": 5694
+ },
+ {
+ "epoch": 27.0,
+ "eval_explained_variance": 0.4460805654525757,
+ "eval_kl_divergence": 0.4741028845310211,
+ "eval_loss": 0.4674595892429352,
+ "eval_mae": 0.11824781447649002,
+ "eval_rmse": 0.16004686057567596,
+ "eval_runtime": 49.7793,
+ "eval_samples_per_second": 47.289,
+ "eval_steps_per_second": 1.487,
+ "learning_rate": 0.0001,
+ "step": 5913
+ },
+ {
+ "epoch": 27.397260273972602,
+ "grad_norm": 0.11422494053840637,
+ "learning_rate": 0.0001,
+ "loss": 0.4585,
+ "step": 6000
+ },
+ {
+ "epoch": 28.0,
+ "eval_explained_variance": 0.4489245116710663,
+ "eval_kl_divergence": 0.3355759084224701,
+ "eval_loss": 0.46810340881347656,
+ "eval_mae": 0.11996418237686157,
+ "eval_rmse": 0.16060088574886322,
+ "eval_runtime": 52.9491,
+ "eval_samples_per_second": 44.458,
+ "eval_steps_per_second": 1.398,
+ "learning_rate": 0.0001,
+ "step": 6132
+ },
+ {
+ "epoch": 29.0,
+ "eval_explained_variance": 0.4459850490093231,
+ "eval_kl_divergence": 0.43302619457244873,
+ "eval_loss": 0.4678303897380829,
+ "eval_mae": 0.11808297038078308,
+ "eval_rmse": 0.16026519238948822,
+ "eval_runtime": 50.5506,
+ "eval_samples_per_second": 46.567,
+ "eval_steps_per_second": 1.464,
+ "learning_rate": 0.0001,
+ "step": 6351
+ },
+ {
+ "epoch": 29.680365296803654,
+ "grad_norm": 0.11833047866821289,
+ "learning_rate": 0.0001,
+ "loss": 0.4578,
+ "step": 6500
+ },
479
+ {
480
+ "epoch": 30.0,
481
+ "eval_explained_variance": 0.4503695070743561,
482
+ "eval_kl_divergence": 0.3159695267677307,
483
+ "eval_loss": 0.46800243854522705,
484
+ "eval_mae": 0.11937135457992554,
485
+ "eval_rmse": 0.160204216837883,
486
+ "eval_runtime": 50.0689,
487
+ "eval_samples_per_second": 47.015,
488
+ "eval_steps_per_second": 1.478,
489
+ "learning_rate": 0.0001,
490
+ "step": 6570
491
+ },
492
+ {
493
+ "epoch": 31.0,
494
+ "eval_explained_variance": 0.4467611014842987,
495
+ "eval_kl_divergence": 0.419010728597641,
496
+ "eval_loss": 0.4676785469055176,
497
+ "eval_mae": 0.11789224296808243,
498
+ "eval_rmse": 0.1599912792444229,
499
+ "eval_runtime": 50.2573,
500
+ "eval_samples_per_second": 46.839,
501
+ "eval_steps_per_second": 1.472,
502
+ "learning_rate": 0.0001,
503
+ "step": 6789
504
+ },
505
+ {
506
+ "epoch": 31.963470319634702,
507
+ "grad_norm": 0.1234586164355278,
508
+ "learning_rate": 0.0001,
509
+ "loss": 0.4579,
510
+ "step": 7000
511
+ },
+ {
+ "epoch": 32.0,
+ "eval_explained_variance": 0.4503757953643799,
+ "eval_kl_divergence": 0.3705631494522095,
+ "eval_loss": 0.46752873063087463,
+ "eval_mae": 0.11878199130296707,
+ "eval_rmse": 0.159804567694664,
+ "eval_runtime": 50.3085,
+ "eval_samples_per_second": 46.791,
+ "eval_steps_per_second": 1.471,
+ "learning_rate": 0.0001,
+ "step": 7008
+ },
+ {
+ "epoch": 33.0,
+ "eval_explained_variance": 0.4545632600784302,
+ "eval_kl_divergence": 0.35043853521347046,
+ "eval_loss": 0.46710190176963806,
+ "eval_mae": 0.1181415393948555,
+ "eval_rmse": 0.1593446284532547,
+ "eval_runtime": 50.4199,
+ "eval_samples_per_second": 46.688,
+ "eval_steps_per_second": 1.468,
+ "learning_rate": 0.0001,
+ "step": 7227
+ },
+ {
+ "epoch": 34.0,
+ "eval_explained_variance": 0.4532606303691864,
+ "eval_kl_divergence": 0.3881392180919647,
+ "eval_loss": 0.4670344293117523,
+ "eval_mae": 0.11804797500371933,
+ "eval_rmse": 0.15942266583442688,
+ "eval_runtime": 50.088,
+ "eval_samples_per_second": 46.997,
+ "eval_steps_per_second": 1.477,
+ "learning_rate": 0.0001,
+ "step": 7446
+ },
+ {
+ "epoch": 34.24657534246575,
+ "grad_norm": 0.14323526620864868,
+ "learning_rate": 0.0001,
+ "loss": 0.4569,
+ "step": 7500
+ },
+ {
+ "epoch": 35.0,
+ "eval_explained_variance": 0.4555685818195343,
+ "eval_kl_divergence": 0.43976902961730957,
+ "eval_loss": 0.4662601053714752,
+ "eval_mae": 0.11664538830518723,
+ "eval_rmse": 0.1586536318063736,
+ "eval_runtime": 49.8708,
+ "eval_samples_per_second": 47.202,
+ "eval_steps_per_second": 1.484,
+ "learning_rate": 0.0001,
+ "step": 7665
+ },
+ {
+ "epoch": 36.0,
+ "eval_explained_variance": 0.4544428884983063,
+ "eval_kl_divergence": 0.4382496476173401,
+ "eval_loss": 0.46657058596611023,
+ "eval_mae": 0.11700741201639175,
+ "eval_rmse": 0.15874631702899933,
+ "eval_runtime": 49.7975,
+ "eval_samples_per_second": 47.271,
+ "eval_steps_per_second": 1.486,
+ "learning_rate": 0.0001,
+ "step": 7884
+ },
+ {
+ "epoch": 36.529680365296805,
+ "grad_norm": 0.17629703879356384,
+ "learning_rate": 0.0001,
+ "loss": 0.4572,
+ "step": 8000
+ },
+ {
+ "epoch": 37.0,
+ "eval_explained_variance": 0.45941635966300964,
+ "eval_kl_divergence": 0.4330490827560425,
+ "eval_loss": 0.4657588005065918,
+ "eval_mae": 0.11633748561143875,
+ "eval_rmse": 0.15810036659240723,
+ "eval_runtime": 51.4251,
+ "eval_samples_per_second": 45.775,
+ "eval_steps_per_second": 1.439,
+ "learning_rate": 0.0001,
+ "step": 8103
+ },
+ {
+ "epoch": 38.0,
+ "eval_explained_variance": 0.4566784203052521,
+ "eval_kl_divergence": 0.4877949357032776,
+ "eval_loss": 0.4659184217453003,
+ "eval_mae": 0.11623784899711609,
+ "eval_rmse": 0.15832678973674774,
+ "eval_runtime": 49.7333,
+ "eval_samples_per_second": 47.332,
+ "eval_steps_per_second": 1.488,
+ "learning_rate": 0.0001,
+ "step": 8322
+ },
+ {
+ "epoch": 38.81278538812786,
+ "grad_norm": 0.1781003624200821,
+ "learning_rate": 0.0001,
+ "loss": 0.4572,
+ "step": 8500
+ },
+ {
+ "epoch": 39.0,
+ "eval_explained_variance": 0.45519956946372986,
+ "eval_kl_divergence": 0.3790707290172577,
+ "eval_loss": 0.46703553199768066,
+ "eval_mae": 0.11782807856798172,
+ "eval_rmse": 0.15946339070796967,
+ "eval_runtime": 52.4,
+ "eval_samples_per_second": 44.924,
+ "eval_steps_per_second": 1.412,
+ "learning_rate": 0.0001,
+ "step": 8541
+ },
+ {
+ "epoch": 40.0,
+ "eval_explained_variance": 0.45683178305625916,
+ "eval_kl_divergence": 0.38892972469329834,
+ "eval_loss": 0.4664987027645111,
+ "eval_mae": 0.11783644556999207,
+ "eval_rmse": 0.15876977145671844,
+ "eval_runtime": 50.7398,
+ "eval_samples_per_second": 46.394,
+ "eval_steps_per_second": 1.458,
+ "learning_rate": 0.0001,
+ "step": 8760
+ },
+ {
+ "epoch": 41.0,
+ "eval_explained_variance": 0.4591364860534668,
+ "eval_kl_divergence": 0.3222128450870514,
+ "eval_loss": 0.46659526228904724,
+ "eval_mae": 0.11838778108358383,
+ "eval_rmse": 0.15888933837413788,
+ "eval_runtime": 50.0159,
+ "eval_samples_per_second": 47.065,
+ "eval_steps_per_second": 1.48,
+ "learning_rate": 0.0001,
+ "step": 8979
+ },
+ {
+ "epoch": 41.0958904109589,
+ "grad_norm": 0.13085126876831055,
+ "learning_rate": 0.0001,
+ "loss": 0.4559,
+ "step": 9000
+ },
+ {
+ "epoch": 42.0,
+ "eval_explained_variance": 0.4606964886188507,
+ "eval_kl_divergence": 0.426244854927063,
+ "eval_loss": 0.4655005633831024,
+ "eval_mae": 0.11635158210992813,
+ "eval_rmse": 0.15787668526172638,
+ "eval_runtime": 49.9099,
+ "eval_samples_per_second": 47.165,
+ "eval_steps_per_second": 1.483,
+ "learning_rate": 0.0001,
+ "step": 9198
+ },
+ {
+ "epoch": 43.0,
+ "eval_explained_variance": 0.46034756302833557,
+ "eval_kl_divergence": 0.4611224830150604,
+ "eval_loss": 0.4656265676021576,
+ "eval_mae": 0.11616652458906174,
+ "eval_rmse": 0.1579464077949524,
+ "eval_runtime": 50.0123,
+ "eval_samples_per_second": 47.068,
+ "eval_steps_per_second": 1.48,
+ "learning_rate": 0.0001,
+ "step": 9417
+ },
+ {
+ "epoch": 43.37899543378995,
+ "grad_norm": 0.17523790895938873,
+ "learning_rate": 0.0001,
+ "loss": 0.4554,
+ "step": 9500
+ },
+ {
+ "epoch": 44.0,
+ "eval_explained_variance": 0.4616149961948395,
+ "eval_kl_divergence": 0.45858410000801086,
+ "eval_loss": 0.4655725955963135,
+ "eval_mae": 0.11644264310598373,
+ "eval_rmse": 0.15800905227661133,
+ "eval_runtime": 50.6284,
+ "eval_samples_per_second": 46.496,
+ "eval_steps_per_second": 1.462,
+ "learning_rate": 0.0001,
+ "step": 9636
+ },
+ {
+ "epoch": 45.0,
+ "eval_explained_variance": 0.45969870686531067,
+ "eval_kl_divergence": 0.4367772340774536,
+ "eval_loss": 0.46600833535194397,
+ "eval_mae": 0.11579249054193497,
+ "eval_rmse": 0.15833592414855957,
+ "eval_runtime": 50.629,
+ "eval_samples_per_second": 46.495,
+ "eval_steps_per_second": 1.462,
+ "learning_rate": 0.0001,
+ "step": 9855
+ },
+ {
+ "epoch": 45.662100456621005,
+ "grad_norm": 0.1231347844004631,
+ "learning_rate": 0.0001,
+ "loss": 0.4557,
+ "step": 10000
+ },
+ {
+ "epoch": 46.0,
+ "eval_explained_variance": 0.4603704512119293,
+ "eval_kl_divergence": 0.41175922751426697,
+ "eval_loss": 0.4660418927669525,
+ "eval_mae": 0.11639311909675598,
+ "eval_rmse": 0.1581837385892868,
+ "eval_runtime": 50.1537,
+ "eval_samples_per_second": 46.936,
+ "eval_steps_per_second": 1.475,
+ "learning_rate": 0.0001,
+ "step": 10074
+ },
+ {
+ "epoch": 47.0,
+ "eval_explained_variance": 0.4613979756832123,
+ "eval_kl_divergence": 0.5424114465713501,
+ "eval_loss": 0.46521857380867004,
+ "eval_mae": 0.11542114615440369,
+ "eval_rmse": 0.15771377086639404,
+ "eval_runtime": 49.6928,
+ "eval_samples_per_second": 47.371,
+ "eval_steps_per_second": 1.489,
+ "learning_rate": 0.0001,
+ "step": 10293
+ },
+ {
+ "epoch": 47.945205479452056,
+ "grad_norm": 0.46352267265319824,
+ "learning_rate": 0.0001,
+ "loss": 0.4551,
+ "step": 10500
+ },
+ {
+ "epoch": 48.0,
+ "eval_explained_variance": 0.45960724353790283,
+ "eval_kl_divergence": 0.525124728679657,
+ "eval_loss": 0.46598610281944275,
+ "eval_mae": 0.1159835234284401,
+ "eval_rmse": 0.15856431424617767,
+ "eval_runtime": 49.9974,
+ "eval_samples_per_second": 47.082,
+ "eval_steps_per_second": 1.48,
+ "learning_rate": 0.0001,
+ "step": 10512
+ },
+ {
+ "epoch": 49.0,
+ "eval_explained_variance": 0.4572352468967438,
+ "eval_kl_divergence": 0.5006867051124573,
+ "eval_loss": 0.46604350209236145,
+ "eval_mae": 0.11609696596860886,
+ "eval_rmse": 0.15853044390678406,
+ "eval_runtime": 50.2446,
+ "eval_samples_per_second": 46.851,
+ "eval_steps_per_second": 1.473,
+ "learning_rate": 0.0001,
+ "step": 10731
+ },
+ {
+ "epoch": 50.0,
+ "eval_explained_variance": 0.4658548831939697,
+ "eval_kl_divergence": 0.24239596724510193,
+ "eval_loss": 0.46660009026527405,
+ "eval_mae": 0.11854288727045059,
+ "eval_rmse": 0.15863054990768433,
+ "eval_runtime": 50.1897,
+ "eval_samples_per_second": 46.902,
+ "eval_steps_per_second": 1.474,
+ "learning_rate": 0.0001,
+ "step": 10950
+ },
+ {
+ "epoch": 50.22831050228311,
+ "grad_norm": 0.1688494235277176,
+ "learning_rate": 0.0001,
+ "loss": 0.4545,
+ "step": 11000
+ },
+ {
+ "epoch": 51.0,
+ "eval_explained_variance": 0.45888975262641907,
+ "eval_kl_divergence": 0.4170607030391693,
+ "eval_loss": 0.4660661220550537,
+ "eval_mae": 0.11618483066558838,
+ "eval_rmse": 0.15835459530353546,
+ "eval_runtime": 49.5535,
+ "eval_samples_per_second": 47.504,
+ "eval_steps_per_second": 1.493,
+ "learning_rate": 0.0001,
+ "step": 11169
+ },
+ {
+ "epoch": 52.0,
+ "eval_explained_variance": 0.46297597885131836,
+ "eval_kl_divergence": 0.49118655920028687,
+ "eval_loss": 0.4649689793586731,
+ "eval_mae": 0.11549883335828781,
+ "eval_rmse": 0.1575259119272232,
+ "eval_runtime": 50.3774,
+ "eval_samples_per_second": 46.727,
+ "eval_steps_per_second": 1.469,
+ "learning_rate": 0.0001,
+ "step": 11388
+ },
+ {
+ "epoch": 52.51141552511415,
+ "grad_norm": 0.2805333137512207,
+ "learning_rate": 0.0001,
+ "loss": 0.4548,
+ "step": 11500
+ },
+ {
+ "epoch": 53.0,
+ "eval_explained_variance": 0.46440085768699646,
+ "eval_kl_divergence": 0.4030352830886841,
+ "eval_loss": 0.4653578996658325,
+ "eval_mae": 0.11687562614679337,
+ "eval_rmse": 0.15780305862426758,
+ "eval_runtime": 51.1877,
+ "eval_samples_per_second": 45.988,
+ "eval_steps_per_second": 1.446,
+ "learning_rate": 0.0001,
+ "step": 11607
+ },
+ {
+ "epoch": 54.0,
+ "eval_explained_variance": 0.4594965875148773,
+ "eval_kl_divergence": 0.4810858964920044,
+ "eval_loss": 0.4660585820674896,
+ "eval_mae": 0.11529505252838135,
+ "eval_rmse": 0.15853293240070343,
+ "eval_runtime": 51.2952,
+ "eval_samples_per_second": 45.891,
+ "eval_steps_per_second": 1.443,
+ "learning_rate": 0.0001,
+ "step": 11826
+ },
+ {
+ "epoch": 54.794520547945204,
+ "grad_norm": 0.22778521478176117,
+ "learning_rate": 0.0001,
+ "loss": 0.455,
+ "step": 12000
+ },
+ {
+ "epoch": 55.0,
+ "eval_explained_variance": 0.46380600333213806,
+ "eval_kl_divergence": 0.3773800730705261,
+ "eval_loss": 0.46527624130249023,
+ "eval_mae": 0.11668615788221359,
+ "eval_rmse": 0.1576414853334427,
+ "eval_runtime": 50.6825,
+ "eval_samples_per_second": 46.446,
+ "eval_steps_per_second": 1.46,
+ "learning_rate": 0.0001,
+ "step": 12045
+ },
+ {
+ "epoch": 56.0,
+ "eval_explained_variance": 0.4669934809207916,
+ "eval_kl_divergence": 0.32541513442993164,
+ "eval_loss": 0.4654240906238556,
+ "eval_mae": 0.11757931858301163,
+ "eval_rmse": 0.1575363427400589,
+ "eval_runtime": 50.538,
+ "eval_samples_per_second": 46.579,
+ "eval_steps_per_second": 1.464,
+ "learning_rate": 0.0001,
+ "step": 12264
+ },
+ {
+ "epoch": 57.0,
+ "eval_explained_variance": 0.4661710560321808,
+ "eval_kl_divergence": 0.3648814857006073,
+ "eval_loss": 0.4654492139816284,
+ "eval_mae": 0.11615876108407974,
+ "eval_rmse": 0.15751774609088898,
+ "eval_runtime": 51.1673,
+ "eval_samples_per_second": 46.006,
+ "eval_steps_per_second": 1.446,
+ "learning_rate": 0.0001,
+ "step": 12483
+ },
+ {
+ "epoch": 57.077625570776256,
+ "grad_norm": 0.16715611517429352,
+ "learning_rate": 0.0001,
+ "loss": 0.4531,
+ "step": 12500
+ },
+ {
+ "epoch": 58.0,
+ "eval_explained_variance": 0.4606919586658478,
+ "eval_kl_divergence": 0.40749335289001465,
+ "eval_loss": 0.46654412150382996,
+ "eval_mae": 0.1166309341788292,
+ "eval_rmse": 0.15835203230381012,
+ "eval_runtime": 50.603,
+ "eval_samples_per_second": 46.519,
+ "eval_steps_per_second": 1.462,
+ "learning_rate": 0.0001,
+ "step": 12702
+ },
+ {
+ "epoch": 59.0,
+ "eval_explained_variance": 0.4653950035572052,
+ "eval_kl_divergence": 0.42019784450531006,
+ "eval_loss": 0.465238481760025,
+ "eval_mae": 0.11570876836776733,
+ "eval_rmse": 0.15746039152145386,
+ "eval_runtime": 50.3267,
+ "eval_samples_per_second": 46.774,
+ "eval_steps_per_second": 1.47,
+ "learning_rate": 1e-05,
+ "step": 12921
+ },
+ {
+ "epoch": 59.36073059360731,
+ "grad_norm": 0.19701753556728363,
+ "learning_rate": 1e-05,
+ "loss": 0.4538,
+ "step": 13000
+ },
+ {
+ "epoch": 60.0,
+ "eval_explained_variance": 0.4668855369091034,
+ "eval_kl_divergence": 0.4084234833717346,
+ "eval_loss": 0.46530231833457947,
+ "eval_mae": 0.11569295078516006,
+ "eval_rmse": 0.15709955990314484,
+ "eval_runtime": 51.1174,
+ "eval_samples_per_second": 46.051,
+ "eval_steps_per_second": 1.448,
+ "learning_rate": 1e-05,
+ "step": 13140
+ },
+ {
+ "epoch": 61.0,
+ "eval_explained_variance": 0.4661245346069336,
+ "eval_kl_divergence": 0.4496937096118927,
+ "eval_loss": 0.4653523564338684,
+ "eval_mae": 0.11528477817773819,
+ "eval_rmse": 0.15729330480098724,
+ "eval_runtime": 50.8416,
+ "eval_samples_per_second": 46.301,
+ "eval_steps_per_second": 1.456,
+ "learning_rate": 1e-05,
+ "step": 13359
+ },
+ {
+ "epoch": 61.64383561643836,
+ "grad_norm": 0.1874207705259323,
+ "learning_rate": 1e-05,
+ "loss": 0.4529,
+ "step": 13500
+ },
+ {
+ "epoch": 62.0,
+ "eval_explained_variance": 0.4681651294231415,
+ "eval_kl_divergence": 0.411173015832901,
+ "eval_loss": 0.46477487683296204,
+ "eval_mae": 0.11529665440320969,
+ "eval_rmse": 0.15684308111667633,
+ "eval_runtime": 52.6214,
+ "eval_samples_per_second": 44.735,
+ "eval_steps_per_second": 1.406,
+ "learning_rate": 1e-05,
+ "step": 13578
+ },
+ {
+ "epoch": 63.0,
+ "eval_explained_variance": 0.47016242146492004,
+ "eval_kl_divergence": 0.3748082220554352,
+ "eval_loss": 0.46481335163116455,
+ "eval_mae": 0.11518841236829758,
+ "eval_rmse": 0.15671293437480927,
+ "eval_runtime": 53.2469,
+ "eval_samples_per_second": 44.209,
+ "eval_steps_per_second": 1.39,
+ "learning_rate": 1e-05,
+ "step": 13797
+ },
+ {
+ "epoch": 63.926940639269404,
+ "grad_norm": 0.22562281787395477,
+ "learning_rate": 1e-05,
+ "loss": 0.4527,
+ "step": 14000
+ },
+ {
+ "epoch": 64.0,
+ "eval_explained_variance": 0.4721170663833618,
+ "eval_kl_divergence": 0.3044198155403137,
+ "eval_loss": 0.46523070335388184,
+ "eval_mae": 0.11618036776781082,
+ "eval_rmse": 0.15709933638572693,
+ "eval_runtime": 53.3051,
+ "eval_samples_per_second": 44.161,
+ "eval_steps_per_second": 1.388,
+ "learning_rate": 1e-05,
+ "step": 14016
+ },
+ {
+ "epoch": 65.0,
+ "eval_explained_variance": 0.46695852279663086,
+ "eval_kl_divergence": 0.46853822469711304,
+ "eval_loss": 0.46484872698783875,
+ "eval_mae": 0.11532068997621536,
+ "eval_rmse": 0.1568661779165268,
+ "eval_runtime": 52.7599,
+ "eval_samples_per_second": 44.617,
+ "eval_steps_per_second": 1.403,
+ "learning_rate": 1e-05,
+ "step": 14235
+ },
+ {
+ "epoch": 66.0,
+ "eval_explained_variance": 0.46712610125541687,
+ "eval_kl_divergence": 0.508738100528717,
+ "eval_loss": 0.46500927209854126,
+ "eval_mae": 0.11475471407175064,
+ "eval_rmse": 0.15729309618473053,
+ "eval_runtime": 54.0149,
+ "eval_samples_per_second": 43.581,
+ "eval_steps_per_second": 1.37,
+ "learning_rate": 1e-05,
+ "step": 14454
+ },
+ {
+ "epoch": 66.21004566210046,
+ "grad_norm": 0.18448679149150848,
+ "learning_rate": 1e-05,
+ "loss": 0.4531,
+ "step": 14500
+ },
+ {
+ "epoch": 67.0,
+ "eval_explained_variance": 0.4690088927745819,
+ "eval_kl_divergence": 0.42743220925331116,
+ "eval_loss": 0.4645930230617523,
+ "eval_mae": 0.1155417189002037,
+ "eval_rmse": 0.1567572057247162,
+ "eval_runtime": 52.5655,
+ "eval_samples_per_second": 44.782,
+ "eval_steps_per_second": 1.408,
+ "learning_rate": 1e-05,
+ "step": 14673
+ },
+ {
+ "epoch": 68.0,
+ "eval_explained_variance": 0.4680323302745819,
+ "eval_kl_divergence": 0.49686378240585327,
+ "eval_loss": 0.46456360816955566,
+ "eval_mae": 0.11437365412712097,
+ "eval_rmse": 0.1566230058670044,
+ "eval_runtime": 50.8799,
+ "eval_samples_per_second": 46.266,
+ "eval_steps_per_second": 1.454,
+ "learning_rate": 1e-05,
+ "step": 14892
+ },
+ {
+ "epoch": 68.4931506849315,
+ "grad_norm": 0.21752646565437317,
+ "learning_rate": 1e-05,
+ "loss": 0.452,
+ "step": 15000
+ },
+ {
+ "epoch": 69.0,
+ "eval_explained_variance": 0.4696376323699951,
+ "eval_kl_divergence": 0.44800856709480286,
+ "eval_loss": 0.464430034160614,
+ "eval_mae": 0.11452987045049667,
+ "eval_rmse": 0.15642575919628143,
+ "eval_runtime": 61.8405,
+ "eval_samples_per_second": 38.066,
+ "eval_steps_per_second": 1.197,
+ "learning_rate": 1e-05,
+ "step": 15111
+ },
+ {
+ "epoch": 70.0,
+ "eval_explained_variance": 0.4692017734050751,
+ "eval_kl_divergence": 0.42908576130867004,
+ "eval_loss": 0.4648461937904358,
+ "eval_mae": 0.11500384658575058,
+ "eval_rmse": 0.15674862265586853,
+ "eval_runtime": 60.5787,
+ "eval_samples_per_second": 38.859,
+ "eval_steps_per_second": 1.222,
+ "learning_rate": 1e-05,
+ "step": 15330
+ },
+ {
+ "epoch": 70.77625570776256,
+ "grad_norm": 0.23285503685474396,
+ "learning_rate": 1e-05,
+ "loss": 0.4524,
+ "step": 15500
+ },
+ {
+ "epoch": 71.0,
+ "eval_explained_variance": 0.4711233675479889,
+ "eval_kl_divergence": 0.37966692447662354,
+ "eval_loss": 0.4645022749900818,
+ "eval_mae": 0.11555531620979309,
+ "eval_rmse": 0.15646833181381226,
+ "eval_runtime": 61.2584,
+ "eval_samples_per_second": 38.427,
+ "eval_steps_per_second": 1.208,
+ "learning_rate": 1e-05,
+ "step": 15549
+ },
+ {
+ "epoch": 72.0,
+ "eval_explained_variance": 0.4690466821193695,
+ "eval_kl_divergence": 0.42796915769577026,
+ "eval_loss": 0.46473589539527893,
+ "eval_mae": 0.11497951298952103,
+ "eval_rmse": 0.15693025290966034,
+ "eval_runtime": 61.782,
+ "eval_samples_per_second": 38.102,
+ "eval_steps_per_second": 1.198,
+ "learning_rate": 1e-05,
+ "step": 15768
+ },
+ {
+ "epoch": 73.0,
+ "eval_explained_variance": 0.4707035720348358,
+ "eval_kl_divergence": 0.4591566324234009,
+ "eval_loss": 0.46414923667907715,
+ "eval_mae": 0.11423368006944656,
+ "eval_rmse": 0.15631103515625,
+ "eval_runtime": 62.9115,
+ "eval_samples_per_second": 37.418,
+ "eval_steps_per_second": 1.176,
+ "learning_rate": 1e-05,
+ "step": 15987
+ },
+ {
+ "epoch": 73.05936073059361,
+ "grad_norm": 0.1904192417860031,
+ "learning_rate": 1e-05,
+ "loss": 0.4515,
+ "step": 16000
+ },
+ {
+ "epoch": 74.0,
+ "eval_explained_variance": 0.4705829620361328,
+ "eval_kl_divergence": 0.43208685517311096,
+ "eval_loss": 0.4641610085964203,
+ "eval_mae": 0.11505597829818726,
+ "eval_rmse": 0.1563975065946579,
+ "eval_runtime": 61.932,
+ "eval_samples_per_second": 38.009,
+ "eval_steps_per_second": 1.195,
+ "learning_rate": 1e-05,
+ "step": 16206
+ },
+ {
+ "epoch": 75.0,
+ "eval_explained_variance": 0.47077181935310364,
+ "eval_kl_divergence": 0.3843104839324951,
+ "eval_loss": 0.4644509255886078,
+ "eval_mae": 0.11519055813550949,
+ "eval_rmse": 0.15653057396411896,
+ "eval_runtime": 62.3182,
+ "eval_samples_per_second": 37.774,
+ "eval_steps_per_second": 1.187,
+ "learning_rate": 1e-05,
+ "step": 16425
+ },
+ {
+ "epoch": 75.34246575342466,
+ "grad_norm": 0.2563965618610382,
+ "learning_rate": 1e-05,
+ "loss": 0.4521,
+ "step": 16500
+ },
+ {
+ "epoch": 76.0,
+ "eval_explained_variance": 0.4675123989582062,
+ "eval_kl_divergence": 0.5215911269187927,
+ "eval_loss": 0.4646488130092621,
+ "eval_mae": 0.1146780475974083,
+ "eval_rmse": 0.1569206565618515,
+ "eval_runtime": 66.0488,
+ "eval_samples_per_second": 35.64,
+ "eval_steps_per_second": 1.12,
+ "learning_rate": 1e-05,
+ "step": 16644
+ },
+ {
+ "epoch": 77.0,
+ "eval_explained_variance": 0.46909868717193604,
+ "eval_kl_divergence": 0.4094104468822479,
+ "eval_loss": 0.46475714445114136,
+ "eval_mae": 0.11523856967687607,
+ "eval_rmse": 0.15687990188598633,
+ "eval_runtime": 62.1685,
+ "eval_samples_per_second": 37.865,
+ "eval_steps_per_second": 1.19,
+ "learning_rate": 1e-05,
+ "step": 16863
+ },
+ {
+ "epoch": 77.62557077625571,
+ "grad_norm": 0.16491472721099854,
+ "learning_rate": 1e-05,
+ "loss": 0.4519,
+ "step": 17000
+ },
+ {
+ "epoch": 78.0,
+ "eval_explained_variance": 0.47086599469184875,
+ "eval_kl_divergence": 0.43988528847694397,
+ "eval_loss": 0.46428272128105164,
+ "eval_mae": 0.11493176966905594,
+ "eval_rmse": 0.15638257563114166,
+ "eval_runtime": 61.9923,
+ "eval_samples_per_second": 37.972,
+ "eval_steps_per_second": 1.194,
+ "learning_rate": 1e-05,
+ "step": 17082
+ },
+ {
+ "epoch": 79.0,
+ "eval_explained_variance": 0.4697439670562744,
+ "eval_kl_divergence": 0.4178011417388916,
+ "eval_loss": 0.4645934998989105,
+ "eval_mae": 0.11465150117874146,
+ "eval_rmse": 0.15666015446186066,
+ "eval_runtime": 63.0404,
+ "eval_samples_per_second": 37.341,
+ "eval_steps_per_second": 1.174,
+ "learning_rate": 1e-05,
+ "step": 17301
+ },
+ {
+ "epoch": 79.90867579908675,
+ "grad_norm": 0.1647184044122696,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.4517,
+ "step": 17500
+ },
+ {
+ "epoch": 80.0,
+ "eval_explained_variance": 0.4699563980102539,
+ "eval_kl_divergence": 0.43727052211761475,
+ "eval_loss": 0.46436014771461487,
+ "eval_mae": 0.11501001566648483,
+ "eval_rmse": 0.15643416345119476,
+ "eval_runtime": 61.5606,
+ "eval_samples_per_second": 38.239,
+ "eval_steps_per_second": 1.202,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 17520
+ },
+ {
+ "epoch": 81.0,
+ "eval_explained_variance": 0.468768835067749,
+ "eval_kl_divergence": 0.47009941935539246,
+ "eval_loss": 0.46448636054992676,
+ "eval_mae": 0.11508657783269882,
+ "eval_rmse": 0.15673168003559113,
+ "eval_runtime": 62.9178,
+ "eval_samples_per_second": 37.414,
+ "eval_steps_per_second": 1.176,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 17739
+ },
+ {
+ "epoch": 82.0,
+ "eval_explained_variance": 0.470253586769104,
+ "eval_kl_divergence": 0.4601159989833832,
+ "eval_loss": 0.4644375145435333,
+ "eval_mae": 0.11455937474966049,
+ "eval_rmse": 0.15652652084827423,
+ "eval_runtime": 62.6023,
+ "eval_samples_per_second": 37.602,
+ "eval_steps_per_second": 1.182,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 17958
+ },
+ {
+ "epoch": 82.1917808219178,
+ "grad_norm": 0.2432813197374344,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.4514,
+ "step": 18000
+ },
+ {
+ "epoch": 83.0,
+ "eval_explained_variance": 0.468420147895813,
+ "eval_kl_divergence": 0.4510715901851654,
+ "eval_loss": 0.46457409858703613,
+ "eval_mae": 0.11468392610549927,
+ "eval_rmse": 0.15669189393520355,
+ "eval_runtime": 62.7877,
+ "eval_samples_per_second": 37.491,
+ "eval_steps_per_second": 1.179,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 18177
+ },
+ {
+ "epoch": 83.0,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 18177,
+ "total_flos": 8.603009036605255e+19,
+ "train_loss": 0.45949580130708517,
+ "train_runtime": 19431.3015,
+ "train_samples_per_second": 54.06,
+ "train_steps_per_second": 1.691
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 32850,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 150,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 10,
+ "early_stopping_threshold": 0.0
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 8.603009036605255e+19,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }