pszemraj committed · verified
Commit b93537e · 1 parent: 4dfcb10

Upload folder using huggingface_hub
checkpoints/checkpoint-pt-17500/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:973c5df01a7e4998cac279fa8d88512468c87607d6d404d2ce7a7e82ae975b5f
+ size 3550041880
checkpoints/checkpoint-pt-17500/random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3b2ce7e08081f282216ae6d35b3e3e08ec13a6874529efce924280e2d044c09
+ size 14344
checkpoints/checkpoint-pt-20000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31dc4320c0d516e13d139a4cd4bffc116827b038b463f8423675c3d60c450a38
+ size 3550041880
checkpoints/checkpoint-pt-20000/random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3b2ce7e08081f282216ae6d35b3e3e08ec13a6874529efce924280e2d044c09
+ size 14344
checkpoints/checkpoint-pt-20001/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31dc4320c0d516e13d139a4cd4bffc116827b038b463f8423675c3d60c450a38
+ size 3550041880
checkpoints/checkpoint-pt-20001/random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2add1dde78776714849e4d0f42e83551a9123fb6ef7d38a53150e7fc6c0d4124
+ size 14344
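Each `ADDED` entry above is not the binary weights themselves but a three-line Git LFS pointer file (`version` / `oid` / `size` key-value lines). As an illustrative sketch (the helper name `parse_lfs_pointer` is hypothetical, not part of this commit), the pointer format can be parsed with nothing but the standard library:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents taken from the checkpoint-pt-17500/model.safetensors diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:973c5df01a7e4998cac279fa8d88512468c87607d6d404d2ce7a7e82ae975b5f
size 3550041880
"""
info = parse_lfs_pointer(pointer)
print(info["oid"])        # sha256:973c5df0...
print(int(info["size"]))  # 3550041880
```

The `size` field is the byte count of the real object, so each `model.safetensors` pointer here references a roughly 3.3 GiB file stored in LFS, while the `random_states_0.pkl` pointers reference 14 KB pickles.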
checkpoints/main.log CHANGED
@@ -677,3 +677,207 @@ Mixed precision type: bf16
 [2024-08-31 17:07:58,425][Main][INFO] - [train] Step 15500 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.378 | Lr --> 0.002 | Seconds_per_step --> 4.876 |
 [2024-08-31 17:10:00,358][Main][INFO] - [train] Step 15525 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.364 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
 [2024-08-31 17:12:03,531][Main][INFO] - [train] Step 15550 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.349 | Lr --> 0.002 | Seconds_per_step --> 4.927 |
+ [2024-08-31 17:14:05,478][Main][INFO] - [train] Step 15575 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.340 | Lr --> 0.002 | Seconds_per_step --> 4.878 |
+ [2024-08-31 17:16:07,219][Main][INFO] - [train] Step 15600 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.323 | Lr --> 0.002 | Seconds_per_step --> 4.870 |
+ [2024-08-31 17:18:10,558][Main][INFO] - [train] Step 15625 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.311 | Lr --> 0.002 | Seconds_per_step --> 4.933 |
+ [2024-08-31 17:20:12,172][Main][INFO] - [train] Step 15650 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.303 | Lr --> 0.002 | Seconds_per_step --> 4.864 |
+ [2024-08-31 17:22:13,878][Main][INFO] - [train] Step 15675 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.290 | Lr --> 0.002 | Seconds_per_step --> 4.868 |
+ [2024-08-31 17:24:17,279][Main][INFO] - [train] Step 15700 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.273 | Lr --> 0.002 | Seconds_per_step --> 4.936 |
+ [2024-08-31 17:26:19,126][Main][INFO] - [train] Step 15725 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.263 | Lr --> 0.002 | Seconds_per_step --> 4.874 |
+ [2024-08-31 17:28:20,908][Main][INFO] - [train] Step 15750 out of 20000 | Loss --> 1.782 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.246 | Lr --> 0.002 | Seconds_per_step --> 4.871 |
+ [2024-08-31 17:30:24,238][Main][INFO] - [train] Step 15775 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.233 | Lr --> 0.002 | Seconds_per_step --> 4.933 |
+ [2024-08-31 17:32:26,169][Main][INFO] - [train] Step 15800 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.220 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
+ [2024-08-31 17:34:28,677][Main][INFO] - [train] Step 15825 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.207 | Lr --> 0.002 | Seconds_per_step --> 4.900 |
+ [2024-08-31 17:36:32,092][Main][INFO] - [train] Step 15850 out of 20000 | Loss --> 1.781 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.196 | Lr --> 0.002 | Seconds_per_step --> 4.936 |
+ [2024-08-31 17:38:33,746][Main][INFO] - [train] Step 15875 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.177 | Lr --> 0.002 | Seconds_per_step --> 4.866 |
+ [2024-08-31 17:40:35,415][Main][INFO] - [train] Step 15900 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.165 | Lr --> 0.002 | Seconds_per_step --> 4.867 |
+ [2024-08-31 17:42:36,852][Main][INFO] - [train] Step 15925 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.154 | Lr --> 0.002 | Seconds_per_step --> 4.857 |
+ [2024-08-31 17:44:40,125][Main][INFO] - [train] Step 15950 out of 20000 | Loss --> 1.774 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.141 | Lr --> 0.002 | Seconds_per_step --> 4.931 |
+ [2024-08-31 17:46:42,051][Main][INFO] - [train] Step 15975 out of 20000 | Loss --> 1.761 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.121 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
+ [2024-08-31 17:48:43,831][Main][INFO] - [train] Step 16000 out of 20000 | Loss --> 1.775 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.110 | Lr --> 0.002 | Seconds_per_step --> 4.871 |
+ [2024-08-31 17:50:47,287][Main][INFO] - [train] Step 16025 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.098 | Lr --> 0.002 | Seconds_per_step --> 4.938 |
+ [2024-08-31 17:52:48,992][Main][INFO] - [train] Step 16050 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.202 | Weights_l2 --> 11272.084 | Lr --> 0.002 | Seconds_per_step --> 4.868 |
+ [2024-08-31 17:54:50,724][Main][INFO] - [train] Step 16075 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.068 | Lr --> 0.002 | Seconds_per_step --> 4.869 |
+ [2024-08-31 17:56:54,016][Main][INFO] - [train] Step 16100 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.054 | Lr --> 0.002 | Seconds_per_step --> 4.932 |
+ [2024-08-31 17:58:55,942][Main][INFO] - [train] Step 16125 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.037 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
+ [2024-08-31 18:00:57,478][Main][INFO] - [train] Step 16150 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.020 | Lr --> 0.002 | Seconds_per_step --> 4.861 |
+ [2024-08-31 18:03:00,736][Main][INFO] - [train] Step 16175 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.000 | Lr --> 0.002 | Seconds_per_step --> 4.930 |
+ [2024-08-31 18:05:02,586][Main][INFO] - [train] Step 16200 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.988 | Lr --> 0.002 | Seconds_per_step --> 4.874 |
+ [2024-08-31 18:07:04,438][Main][INFO] - [train] Step 16225 out of 20000 | Loss --> 1.758 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.975 | Lr --> 0.002 | Seconds_per_step --> 4.874 |
+ [2024-08-31 18:09:07,901][Main][INFO] - [train] Step 16250 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.962 | Lr --> 0.001 | Seconds_per_step --> 4.938 |
+ [2024-08-31 18:11:09,649][Main][INFO] - [train] Step 16275 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.952 | Lr --> 0.001 | Seconds_per_step --> 4.870 |
+ [2024-08-31 18:13:11,767][Main][INFO] - [train] Step 16300 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.935 | Lr --> 0.001 | Seconds_per_step --> 4.885 |
+ [2024-08-31 18:15:13,819][Main][INFO] - [train] Step 16325 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.928 | Lr --> 0.001 | Seconds_per_step --> 4.882 |
+ [2024-08-31 18:17:17,096][Main][INFO] - [train] Step 16350 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.916 | Lr --> 0.001 | Seconds_per_step --> 4.931 |
+ [2024-08-31 18:19:18,879][Main][INFO] - [train] Step 16375 out of 20000 | Loss --> 1.757 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.902 | Lr --> 0.001 | Seconds_per_step --> 4.871 |
+ [2024-08-31 18:21:20,566][Main][INFO] - [train] Step 16400 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.886 | Lr --> 0.001 | Seconds_per_step --> 4.867 |
+ [2024-08-31 18:23:24,550][Main][INFO] - [train] Step 16425 out of 20000 | Loss --> 1.752 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.874 | Lr --> 0.001 | Seconds_per_step --> 4.959 |
+ [2024-08-31 18:25:26,461][Main][INFO] - [train] Step 16450 out of 20000 | Loss --> 1.756 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.860 | Lr --> 0.001 | Seconds_per_step --> 4.876 |
+ [2024-08-31 18:27:28,278][Main][INFO] - [train] Step 16475 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.846 | Lr --> 0.001 | Seconds_per_step --> 4.873 |
+ [2024-08-31 18:29:32,921][Main][INFO] - [train] Step 16500 out of 20000 | Loss --> 1.753 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.834 | Lr --> 0.001 | Seconds_per_step --> 4.986 |
+ [2024-08-31 18:31:35,149][Main][INFO] - [train] Step 16525 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.819 | Lr --> 0.001 | Seconds_per_step --> 4.889 |
+ [2024-08-31 18:33:37,364][Main][INFO] - [train] Step 16550 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.799 | Lr --> 0.001 | Seconds_per_step --> 4.889 |
+ [2024-08-31 18:35:40,915][Main][INFO] - [train] Step 16575 out of 20000 | Loss --> 1.751 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.786 | Lr --> 0.001 | Seconds_per_step --> 4.942 |
+ [2024-08-31 18:37:43,466][Main][INFO] - [train] Step 16600 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.770 | Lr --> 0.001 | Seconds_per_step --> 4.902 |
+ [2024-08-31 18:39:45,637][Main][INFO] - [train] Step 16625 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.754 | Lr --> 0.001 | Seconds_per_step --> 4.887 |
+ [2024-08-31 18:41:49,405][Main][INFO] - [train] Step 16650 out of 20000 | Loss --> 1.748 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.739 | Lr --> 0.001 | Seconds_per_step --> 4.951 |
+ [2024-08-31 18:43:51,102][Main][INFO] - [train] Step 16675 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.723 | Lr --> 0.001 | Seconds_per_step --> 4.868 |
+ [2024-08-31 18:45:52,944][Main][INFO] - [train] Step 16700 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.708 | Lr --> 0.001 | Seconds_per_step --> 4.874 |
+ [2024-08-31 18:47:56,099][Main][INFO] - [train] Step 16725 out of 20000 | Loss --> 1.757 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.698 | Lr --> 0.001 | Seconds_per_step --> 4.926 |
+ [2024-08-31 18:49:57,801][Main][INFO] - [train] Step 16750 out of 20000 | Loss --> 1.742 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.684 | Lr --> 0.001 | Seconds_per_step --> 4.868 |
+ [2024-08-31 18:51:59,437][Main][INFO] - [train] Step 16775 out of 20000 | Loss --> 1.755 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.665 | Lr --> 0.001 | Seconds_per_step --> 4.865 |
+ [2024-08-31 18:54:00,855][Main][INFO] - [train] Step 16800 out of 20000 | Loss --> 1.747 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.651 | Lr --> 0.001 | Seconds_per_step --> 4.857 |
+ [2024-08-31 18:56:03,853][Main][INFO] - [train] Step 16825 out of 20000 | Loss --> 1.743 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.632 | Lr --> 0.001 | Seconds_per_step --> 4.920 |
+ [2024-08-31 18:58:05,201][Main][INFO] - [train] Step 16850 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.619 | Lr --> 0.001 | Seconds_per_step --> 4.854 |
+ [2024-08-31 19:00:06,563][Main][INFO] - [train] Step 16875 out of 20000 | Loss --> 1.753 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.606 | Lr --> 0.001 | Seconds_per_step --> 4.854 |
+ [2024-08-31 19:02:09,317][Main][INFO] - [train] Step 16900 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.591 | Lr --> 0.001 | Seconds_per_step --> 4.910 |
+ [2024-08-31 19:04:10,472][Main][INFO] - [train] Step 16925 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.575 | Lr --> 0.001 | Seconds_per_step --> 4.846 |
+ [2024-08-31 19:06:11,583][Main][INFO] - [train] Step 16950 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.559 | Lr --> 0.001 | Seconds_per_step --> 4.844 |
+ [2024-08-31 19:08:14,363][Main][INFO] - [train] Step 16975 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.545 | Lr --> 0.001 | Seconds_per_step --> 4.911 |
+ [2024-08-31 19:10:15,178][Main][INFO] - [train] Step 17000 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.533 | Lr --> 0.001 | Seconds_per_step --> 4.832 |
+ [2024-08-31 19:12:16,110][Main][INFO] - [train] Step 17025 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.200 | Weights_l2 --> 11271.515 | Lr --> 0.001 | Seconds_per_step --> 4.837 |
+ [2024-08-31 19:14:18,572][Main][INFO] - [train] Step 17050 out of 20000 | Loss --> 1.740 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.500 | Lr --> 0.001 | Seconds_per_step --> 4.898 |
+ [2024-08-31 19:16:19,446][Main][INFO] - [train] Step 17075 out of 20000 | Loss --> 1.735 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.485 | Lr --> 0.001 | Seconds_per_step --> 4.835 |
+ [2024-08-31 19:18:20,449][Main][INFO] - [train] Step 17100 out of 20000 | Loss --> 1.734 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.471 | Lr --> 0.001 | Seconds_per_step --> 4.840 |
+ [2024-08-31 19:20:23,025][Main][INFO] - [train] Step 17125 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.453 | Lr --> 0.001 | Seconds_per_step --> 4.903 |
+ [2024-08-31 19:22:23,946][Main][INFO] - [train] Step 17150 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.439 | Lr --> 0.001 | Seconds_per_step --> 4.837 |
+ [2024-08-31 19:24:25,361][Main][INFO] - [train] Step 17175 out of 20000 | Loss --> 1.732 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.422 | Lr --> 0.001 | Seconds_per_step --> 4.857 |
+ [2024-08-31 19:26:26,446][Main][INFO] - [train] Step 17200 out of 20000 | Loss --> 1.734 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.408 | Lr --> 0.001 | Seconds_per_step --> 4.843 |
+ [2024-08-31 19:28:29,313][Main][INFO] - [train] Step 17225 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.395 | Lr --> 0.001 | Seconds_per_step --> 4.915 |
+ [2024-08-31 19:30:30,859][Main][INFO] - [train] Step 17250 out of 20000 | Loss --> 1.737 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.382 | Lr --> 0.001 | Seconds_per_step --> 4.862 |
+ [2024-08-31 19:32:32,558][Main][INFO] - [train] Step 17275 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.367 | Lr --> 0.001 | Seconds_per_step --> 4.868 |
+ [2024-08-31 19:34:35,555][Main][INFO] - [train] Step 17300 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.349 | Lr --> 0.001 | Seconds_per_step --> 4.920 |
+ [2024-08-31 19:36:37,051][Main][INFO] - [train] Step 17325 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.333 | Lr --> 0.001 | Seconds_per_step --> 4.860 |
+ [2024-08-31 19:38:38,163][Main][INFO] - [train] Step 17350 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.315 | Lr --> 0.001 | Seconds_per_step --> 4.844 |
+ [2024-08-31 19:40:41,121][Main][INFO] - [train] Step 17375 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.299 | Lr --> 0.001 | Seconds_per_step --> 4.918 |
+ [2024-08-31 19:42:42,654][Main][INFO] - [train] Step 17400 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.281 | Lr --> 0.001 | Seconds_per_step --> 4.861 |
+ [2024-08-31 19:44:44,126][Main][INFO] - [train] Step 17425 out of 20000 | Loss --> 1.713 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.264 | Lr --> 0.001 | Seconds_per_step --> 4.859 |
+ [2024-08-31 19:46:47,266][Main][INFO] - [train] Step 17450 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.200 | Weights_l2 --> 11271.245 | Lr --> 0.001 | Seconds_per_step --> 4.925 |
+ [2024-08-31 19:48:48,692][Main][INFO] - [train] Step 17475 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.227 | Lr --> 0.001 | Seconds_per_step --> 4.857 |
+ [2024-08-31 19:50:50,523][Main][INFO] - [train] Step 17500 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.212 | Lr --> 0.001 | Seconds_per_step --> 4.873 |
+ [2024-08-31 19:50:50,523][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-17500
+ [2024-08-31 19:50:50,530][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
+ [2024-08-31 19:50:57,172][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-17500/model.safetensors
+ [2024-08-31 19:51:06,254][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-17500/optimizer.bin
+ [2024-08-31 19:51:06,257][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-17500/scheduler.bin
+ [2024-08-31 19:51:06,258][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-17500/sampler.bin
+ [2024-08-31 19:51:06,260][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-17500/sampler_1.bin
+ [2024-08-31 19:51:06,261][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-17500/random_states_0.pkl
+ [2024-08-31 19:53:08,757][Main][INFO] - [train] Step 17525 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.193 | Lr --> 0.001 | Seconds_per_step --> 5.529 |
+ [2024-08-31 19:55:09,923][Main][INFO] - [train] Step 17550 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.173 | Lr --> 0.001 | Seconds_per_step --> 4.847 |
+ [2024-08-31 19:57:11,243][Main][INFO] - [train] Step 17575 out of 20000 | Loss --> 1.700 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.155 | Lr --> 0.001 | Seconds_per_step --> 4.853 |
+ [2024-08-31 19:59:14,099][Main][INFO] - [train] Step 17600 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.137 | Lr --> 0.001 | Seconds_per_step --> 4.914 |
+ [2024-08-31 20:01:15,562][Main][INFO] - [train] Step 17625 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.118 | Lr --> 0.001 | Seconds_per_step --> 4.858 |
+ [2024-08-31 20:03:16,470][Main][INFO] - [train] Step 17650 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.097 | Lr --> 0.001 | Seconds_per_step --> 4.836 |
+ [2024-08-31 20:05:17,916][Main][INFO] - [train] Step 17675 out of 20000 | Loss --> 1.733 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.078 | Lr --> 0.001 | Seconds_per_step --> 4.858 |
+ [2024-08-31 20:07:20,683][Main][INFO] - [train] Step 17700 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.062 | Lr --> 0.001 | Seconds_per_step --> 4.911 |
+ [2024-08-31 20:09:22,414][Main][INFO] - [train] Step 17725 out of 20000 | Loss --> 1.707 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.041 | Lr --> 0.001 | Seconds_per_step --> 4.869 |
+ [2024-08-31 20:11:24,033][Main][INFO] - [train] Step 17750 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.024 | Lr --> 0.001 | Seconds_per_step --> 4.865 |
+ [2024-08-31 20:13:26,602][Main][INFO] - [train] Step 17775 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.007 | Lr --> 0.001 | Seconds_per_step --> 4.903 |
+ [2024-08-31 20:15:27,607][Main][INFO] - [train] Step 17800 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.991 | Lr --> 0.001 | Seconds_per_step --> 4.840 |
+ [2024-08-31 20:17:28,616][Main][INFO] - [train] Step 17825 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.976 | Lr --> 0.001 | Seconds_per_step --> 4.840 |
+ [2024-08-31 20:19:31,033][Main][INFO] - [train] Step 17850 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.961 | Lr --> 0.001 | Seconds_per_step --> 4.897 |
+ [2024-08-31 20:21:32,133][Main][INFO] - [train] Step 17875 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.946 | Lr --> 0.001 | Seconds_per_step --> 4.844 |
+ [2024-08-31 20:23:33,151][Main][INFO] - [train] Step 17900 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.933 | Lr --> 0.000 | Seconds_per_step --> 4.841 |
+ [2024-08-31 20:25:35,828][Main][INFO] - [train] Step 17925 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.921 | Lr --> 0.000 | Seconds_per_step --> 4.907 |
+ [2024-08-31 20:27:36,892][Main][INFO] - [train] Step 17950 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.911 | Lr --> 0.000 | Seconds_per_step --> 4.842 |
+ [2024-08-31 20:29:38,066][Main][INFO] - [train] Step 17975 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.900 | Lr --> 0.000 | Seconds_per_step --> 4.847 |
+ [2024-08-31 20:31:40,569][Main][INFO] - [train] Step 18000 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.891 | Lr --> 0.000 | Seconds_per_step --> 4.900 |
+ [2024-08-31 20:33:41,408][Main][INFO] - [train] Step 18025 out of 20000 | Loss --> 1.739 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.884 | Lr --> 0.000 | Seconds_per_step --> 4.833 |
+ [2024-08-31 20:35:42,352][Main][INFO] - [train] Step 18050 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.876 | Lr --> 0.000 | Seconds_per_step --> 4.838 |
+ [2024-08-31 20:37:45,322][Main][INFO] - [train] Step 18075 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.869 | Lr --> 0.000 | Seconds_per_step --> 4.919 |
+ [2024-08-31 20:39:46,981][Main][INFO] - [train] Step 18100 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.862 | Lr --> 0.000 | Seconds_per_step --> 4.866 |
+ [2024-08-31 20:41:48,584][Main][INFO] - [train] Step 18125 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.855 | Lr --> 0.000 | Seconds_per_step --> 4.864 |
+ [2024-08-31 20:43:49,907][Main][INFO] - [train] Step 18150 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.853 |
+ [2024-08-31 20:45:52,968][Main][INFO] - [train] Step 18175 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.844 | Lr --> 0.000 | Seconds_per_step --> 4.922 |
+ [2024-08-31 20:47:54,325][Main][INFO] - [train] Step 18200 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.854 |
+ [2024-08-31 20:49:55,663][Main][INFO] - [train] Step 18225 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.195 | Weights_l2 --> 11270.840 | Lr --> 0.000 | Seconds_per_step --> 4.853 |
+ [2024-08-31 20:51:58,657][Main][INFO] - [train] Step 18250 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.836 | Lr --> 0.000 | Seconds_per_step --> 4.920 |
+ [2024-08-31 20:54:00,083][Main][INFO] - [train] Step 18275 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.834 | Lr --> 0.000 | Seconds_per_step --> 4.857 |
+ [2024-08-31 20:56:01,850][Main][INFO] - [train] Step 18300 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.871 |
+ [2024-08-31 20:58:04,690][Main][INFO] - [train] Step 18325 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.832 | Lr --> 0.000 | Seconds_per_step --> 4.913 |
+ [2024-08-31 21:00:06,226][Main][INFO] - [train] Step 18350 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.832 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 21:02:07,970][Main][INFO] - [train] Step 18375 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.870 |
+ [2024-08-31 21:04:11,035][Main][INFO] - [train] Step 18400 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.922 |
+ [2024-08-31 21:06:12,731][Main][INFO] - [train] Step 18425 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.834 | Lr --> 0.000 | Seconds_per_step --> 4.868 |
+ [2024-08-31 21:08:14,292][Main][INFO] - [train] Step 18450 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.836 | Lr --> 0.000 | Seconds_per_step --> 4.862 |
+ [2024-08-31 21:10:17,481][Main][INFO] - [train] Step 18475 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.837 | Lr --> 0.000 | Seconds_per_step --> 4.927 |
+ [2024-08-31 21:12:19,115][Main][INFO] - [train] Step 18500 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.838 | Lr --> 0.000 | Seconds_per_step --> 4.865 |
+ [2024-08-31 21:14:20,604][Main][INFO] - [train] Step 18525 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.839 | Lr --> 0.000 | Seconds_per_step --> 4.859 |
+ [2024-08-31 21:16:24,832][Main][INFO] - [train] Step 18550 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.969 |
+ [2024-08-31 21:18:26,217][Main][INFO] - [train] Step 18575 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.855 |
+ [2024-08-31 21:20:27,684][Main][INFO] - [train] Step 18600 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.840 | Lr --> 0.000 | Seconds_per_step --> 4.859 |
+ [2024-08-31 21:22:29,530][Main][INFO] - [train] Step 18625 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
+ [2024-08-31 21:24:33,115][Main][INFO] - [train] Step 18650 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.195 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.943 |
+ [2024-08-31 21:26:35,121][Main][INFO] - [train] Step 18675 out of 20000 | Loss --> 1.732 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.842 | Lr --> 0.000 | Seconds_per_step --> 4.880 |
+ [2024-08-31 21:28:37,128][Main][INFO] - [train] Step 18700 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.843 | Lr --> 0.000 | Seconds_per_step --> 4.880 |
+ [2024-08-31 21:30:40,396][Main][INFO] - [train] Step 18725 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.843 | Lr --> 0.000 | Seconds_per_step --> 4.931 |
+ [2024-08-31 21:33:03,172][Main][INFO] - [train] Step 18750 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.844 | Lr --> 0.000 | Seconds_per_step --> 5.711 |
+ [2024-08-31 21:35:04,721][Main][INFO] - [train] Step 18775 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.862 |
+ [2024-08-31 21:37:08,128][Main][INFO] - [train] Step 18800 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.936 |
+ [2024-08-31 21:39:09,857][Main][INFO] - [train] Step 18825 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.201 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.869 |
+ [2024-08-31 21:41:11,700][Main][INFO] - [train] Step 18850 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
+ [2024-08-31 21:43:15,117][Main][INFO] - [train] Step 18875 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.846 | Lr --> 0.000 | Seconds_per_step --> 4.937 |
+ [2024-08-31 21:45:19,433][Main][INFO] - [train] Step 18900 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 4.973 |
+ [2024-08-31 21:47:29,032][Main][INFO] - [train] Step 18925 out of 20000 | Loss --> 1.709 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 5.184 |
+ [2024-08-31 21:49:33,512][Main][INFO] - [train] Step 18950 out of 20000 | Loss --> 1.731 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.848 | Lr --> 0.000 | Seconds_per_step --> 4.979 |
+ [2024-08-31 21:51:35,196][Main][INFO] - [train] Step 18975 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.867 |
+ [2024-08-31 21:53:36,788][Main][INFO] - [train] Step 19000 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.864 |
+ [2024-08-31 21:55:38,313][Main][INFO] - [train] Step 19025 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 21:57:41,329][Main][INFO] - [train] Step 19050 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.921 |
+ [2024-08-31 21:59:42,853][Main][INFO] - [train] Step 19075 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 22:01:44,492][Main][INFO] - [train] Step 19100 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.865 |
+ [2024-08-31 22:03:47,660][Main][INFO] - [train] Step 19125 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.927 |
+ [2024-08-31 22:05:49,133][Main][INFO] - [train] Step 19150 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.859 |
+ [2024-08-31 22:07:50,623][Main][INFO] - [train] Step 19175 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.860 |
+ [2024-08-31 22:09:53,873][Main][INFO] - [train] Step 19200 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.930 |
+ [2024-08-31 22:11:55,529][Main][INFO] - [train] Step 19225 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.866 |
+ [2024-08-31 22:13:57,272][Main][INFO] - [train] Step 19250 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.870 |
+ [2024-08-31 22:16:01,229][Main][INFO] - [train] Step 19275 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.958 |
+ [2024-08-31 22:18:03,766][Main][INFO] - [train] Step 19300 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.901 |
+ [2024-08-31 22:20:06,053][Main][INFO] - [train] Step 19325 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.891 |
+ [2024-08-31 22:22:09,832][Main][INFO] - [train] Step 19350 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.200 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.951 |
+ [2024-08-31 22:24:11,539][Main][INFO] - [train] Step 19375 out of 20000 | Loss --> 1.707 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.868 |
+ [2024-08-31 22:26:13,254][Main][INFO] - [train] Step 19400 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.869 |
+ [2024-08-31 22:28:15,089][Main][INFO] - [train] Step 19425 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.873 |
+ [2024-08-31 22:30:18,617][Main][INFO] - [train] Step 19450 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.941 |
+ [2024-08-31 22:32:20,460][Main][INFO] - [train] Step 19475 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
+ [2024-08-31 22:34:21,991][Main][INFO] - [train] Step 19500 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 22:36:24,959][Main][INFO] - [train] Step 19525 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.919 |
847
+ [2024-08-31 22:38:26,479][Main][INFO] - [train] Step 19550 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
848
+ [2024-08-31 22:40:28,432][Main][INFO] - [train] Step 19575 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.878 |
849
+ [2024-08-31 22:42:31,962][Main][INFO] - [train] Step 19600 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.941 |
850
+ [2024-08-31 22:44:33,970][Main][INFO] - [train] Step 19625 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.880 |
851
+ [2024-08-31 22:46:38,990][Main][INFO] - [train] Step 19650 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.001 |
852
+ [2024-08-31 22:48:42,541][Main][INFO] - [train] Step 19675 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.942 |
853
+ [2024-08-31 22:50:48,499][Main][INFO] - [train] Step 19700 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.038 |
854
+ [2024-08-31 22:52:50,152][Main][INFO] - [train] Step 19725 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.866 |
855
+ [2024-08-31 22:54:53,252][Main][INFO] - [train] Step 19750 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.924 |
856
+ [2024-08-31 22:56:55,118][Main][INFO] - [train] Step 19775 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.875 |
857
+ [2024-08-31 22:58:56,863][Main][INFO] - [train] Step 19800 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.870 |
858
+ [2024-08-31 23:01:01,950][Main][INFO] - [train] Step 19825 out of 20000 | Loss --> 1.710 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.003 |
859
+ [2024-08-31 23:03:04,913][Main][INFO] - [train] Step 19850 out of 20000 | Loss --> 1.713 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.918 |
860
+ [2024-08-31 23:05:06,946][Main][INFO] - [train] Step 19875 out of 20000 | Loss --> 1.710 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.881 |
861
+ [2024-08-31 23:07:08,902][Main][INFO] - [train] Step 19900 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.878 |
862
+ [2024-08-31 23:09:12,065][Main][INFO] - [train] Step 19925 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.926 |
863
+ [2024-08-31 23:11:13,586][Main][INFO] - [train] Step 19950 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
864
+ [2024-08-31 23:13:15,435][Main][INFO] - [train] Step 19975 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
865
+ [2024-08-31 23:15:18,590][Main][INFO] - [train] Step 20000 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.926 |
866
+ [2024-08-31 23:15:18,591][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000
867
+ [2024-08-31 23:15:18,599][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
868
+ [2024-08-31 23:15:26,324][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors
869
+ [2024-08-31 23:15:35,439][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin
870
+ [2024-08-31 23:15:35,440][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin
871
+ [2024-08-31 23:15:35,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin
872
+ [2024-08-31 23:15:35,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin
873
+ [2024-08-31 23:15:35,442][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl
874
+ [2024-08-31 23:15:39,524][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 16 (max is dataset.n_shards=8). Stopping 8 dataloader workers.
875
+ [2024-08-31 23:31:42,282][Main][INFO] - [eval] Step 20001 out of 20000 | Loss --> 2.073 | Accuracy --> 0.604 | Time --> 964.275 |
876
+ [2024-08-31 23:31:42,287][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20001
877
+ [2024-08-31 23:31:42,295][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
878
+ [2024-08-31 23:31:50,975][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20001/model.safetensors
879
+ [2024-08-31 23:32:00,717][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20001/optimizer.bin
880
+ [2024-08-31 23:32:00,719][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20001/scheduler.bin
881
+ [2024-08-31 23:32:00,720][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20001/sampler.bin
882
+ [2024-08-31 23:32:00,720][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20001/sampler_1.bin
883
+ [2024-08-31 23:32:00,721][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20001/random_states_0.pkl
checkpoints/tokenizer/added_tokens.json ADDED
@@ -0,0 +1,102 @@
+ {
+ "<extra_id_0>": 48128,
+ "<extra_id_10>": 48138,
+ "<extra_id_11>": 48139,
+ "<extra_id_12>": 48140,
+ "<extra_id_13>": 48141,
+ "<extra_id_14>": 48142,
+ "<extra_id_15>": 48143,
+ "<extra_id_16>": 48144,
+ "<extra_id_17>": 48145,
+ "<extra_id_18>": 48146,
+ "<extra_id_19>": 48147,
+ "<extra_id_1>": 48129,
+ "<extra_id_20>": 48148,
+ "<extra_id_21>": 48149,
+ "<extra_id_22>": 48150,
+ "<extra_id_23>": 48151,
+ "<extra_id_24>": 48152,
+ "<extra_id_25>": 48153,
+ "<extra_id_26>": 48154,
+ "<extra_id_27>": 48155,
+ "<extra_id_28>": 48156,
+ "<extra_id_29>": 48157,
+ "<extra_id_2>": 48130,
+ "<extra_id_30>": 48158,
+ "<extra_id_31>": 48159,
+ "<extra_id_32>": 48160,
+ "<extra_id_33>": 48161,
+ "<extra_id_34>": 48162,
+ "<extra_id_35>": 48163,
+ "<extra_id_36>": 48164,
+ "<extra_id_37>": 48165,
+ "<extra_id_38>": 48166,
+ "<extra_id_39>": 48167,
+ "<extra_id_3>": 48131,
+ "<extra_id_40>": 48168,
+ "<extra_id_41>": 48169,
+ "<extra_id_42>": 48170,
+ "<extra_id_43>": 48171,
+ "<extra_id_44>": 48172,
+ "<extra_id_45>": 48173,
+ "<extra_id_46>": 48174,
+ "<extra_id_47>": 48175,
+ "<extra_id_48>": 48176,
+ "<extra_id_49>": 48177,
+ "<extra_id_4>": 48132,
+ "<extra_id_50>": 48178,
+ "<extra_id_51>": 48179,
+ "<extra_id_52>": 48180,
+ "<extra_id_53>": 48181,
+ "<extra_id_54>": 48182,
+ "<extra_id_55>": 48183,
+ "<extra_id_56>": 48184,
+ "<extra_id_57>": 48185,
+ "<extra_id_58>": 48186,
+ "<extra_id_59>": 48187,
+ "<extra_id_5>": 48133,
+ "<extra_id_60>": 48188,
+ "<extra_id_61>": 48189,
+ "<extra_id_62>": 48190,
+ "<extra_id_63>": 48191,
+ "<extra_id_64>": 48192,
+ "<extra_id_65>": 48193,
+ "<extra_id_66>": 48194,
+ "<extra_id_67>": 48195,
+ "<extra_id_68>": 48196,
+ "<extra_id_69>": 48197,
+ "<extra_id_6>": 48134,
+ "<extra_id_70>": 48198,
+ "<extra_id_71>": 48199,
+ "<extra_id_72>": 48200,
+ "<extra_id_73>": 48201,
+ "<extra_id_74>": 48202,
+ "<extra_id_75>": 48203,
+ "<extra_id_76>": 48204,
+ "<extra_id_77>": 48205,
+ "<extra_id_78>": 48206,
+ "<extra_id_79>": 48207,
+ "<extra_id_7>": 48135,
+ "<extra_id_80>": 48208,
+ "<extra_id_81>": 48209,
+ "<extra_id_82>": 48210,
+ "<extra_id_83>": 48211,
+ "<extra_id_84>": 48212,
+ "<extra_id_85>": 48213,
+ "<extra_id_86>": 48214,
+ "<extra_id_87>": 48215,
+ "<extra_id_88>": 48216,
+ "<extra_id_89>": 48217,
+ "<extra_id_8>": 48136,
+ "<extra_id_90>": 48218,
+ "<extra_id_91>": 48219,
+ "<extra_id_92>": 48220,
+ "<extra_id_93>": 48221,
+ "<extra_id_94>": 48222,
+ "<extra_id_95>": 48223,
+ "<extra_id_96>": 48224,
+ "<extra_id_97>": 48225,
+ "<extra_id_98>": 48226,
+ "<extra_id_99>": 48227,
+ "<extra_id_9>": 48137
+ }
checkpoints/tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,132 @@
+ {
+ "additional_special_tokens": [
+ "<extra_id_0>",
+ "<extra_id_1>",
+ "<extra_id_2>",
+ "<extra_id_3>",
+ "<extra_id_4>",
+ "<extra_id_5>",
+ "<extra_id_6>",
+ "<extra_id_7>",
+ "<extra_id_8>",
+ "<extra_id_9>",
+ "<extra_id_10>",
+ "<extra_id_11>",
+ "<extra_id_12>",
+ "<extra_id_13>",
+ "<extra_id_14>",
+ "<extra_id_15>",
+ "<extra_id_16>",
+ "<extra_id_17>",
+ "<extra_id_18>",
+ "<extra_id_19>",
+ "<extra_id_20>",
+ "<extra_id_21>",
+ "<extra_id_22>",
+ "<extra_id_23>",
+ "<extra_id_24>",
+ "<extra_id_25>",
+ "<extra_id_26>",
+ "<extra_id_27>",
+ "<extra_id_28>",
+ "<extra_id_29>",
+ "<extra_id_30>",
+ "<extra_id_31>",
+ "<extra_id_32>",
+ "<extra_id_33>",
+ "<extra_id_34>",
+ "<extra_id_35>",
+ "<extra_id_36>",
+ "<extra_id_37>",
+ "<extra_id_38>",
+ "<extra_id_39>",
+ "<extra_id_40>",
+ "<extra_id_41>",
+ "<extra_id_42>",
+ "<extra_id_43>",
+ "<extra_id_44>",
+ "<extra_id_45>",
+ "<extra_id_46>",
+ "<extra_id_47>",
+ "<extra_id_48>",
+ "<extra_id_49>",
+ "<extra_id_50>",
+ "<extra_id_51>",
+ "<extra_id_52>",
+ "<extra_id_53>",
+ "<extra_id_54>",
+ "<extra_id_55>",
+ "<extra_id_56>",
+ "<extra_id_57>",
+ "<extra_id_58>",
+ "<extra_id_59>",
+ "<extra_id_60>",
+ "<extra_id_61>",
+ "<extra_id_62>",
+ "<extra_id_63>",
+ "<extra_id_64>",
+ "<extra_id_65>",
+ "<extra_id_66>",
+ "<extra_id_67>",
+ "<extra_id_68>",
+ "<extra_id_69>",
+ "<extra_id_70>",
+ "<extra_id_71>",
+ "<extra_id_72>",
+ "<extra_id_73>",
+ "<extra_id_74>",
+ "<extra_id_75>",
+ "<extra_id_76>",
+ "<extra_id_77>",
+ "<extra_id_78>",
+ "<extra_id_79>",
+ "<extra_id_80>",
+ "<extra_id_81>",
+ "<extra_id_82>",
+ "<extra_id_83>",
+ "<extra_id_84>",
+ "<extra_id_85>",
+ "<extra_id_86>",
+ "<extra_id_87>",
+ "<extra_id_88>",
+ "<extra_id_89>",
+ "<extra_id_90>",
+ "<extra_id_91>",
+ "<extra_id_92>",
+ "<extra_id_93>",
+ "<extra_id_94>",
+ "<extra_id_95>",
+ "<extra_id_96>",
+ "<extra_id_97>",
+ "<extra_id_98>",
+ "<extra_id_99>"
+ ],
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoints/tokenizer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoints/tokenizer/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:165515d67d17852d3e7c30ec22d7a745330f2cba264f40c3ea9380eddf84396f
+ size 1042483
checkpoints/tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,952 @@
+ {
+ "add_bos_token": false,
+ "add_eos_token": true,
+ "add_prefix_space": true,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48128": {
+ "content": "<extra_id_0>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48129": {
+ "content": "<extra_id_1>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48130": {
+ "content": "<extra_id_2>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48131": {
+ "content": "<extra_id_3>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48132": {
+ "content": "<extra_id_4>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48133": {
+ "content": "<extra_id_5>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48134": {
+ "content": "<extra_id_6>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48135": {
+ "content": "<extra_id_7>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48136": {
+ "content": "<extra_id_8>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48137": {
+ "content": "<extra_id_9>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48138": {
+ "content": "<extra_id_10>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48139": {
+ "content": "<extra_id_11>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48140": {
+ "content": "<extra_id_12>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48141": {
+ "content": "<extra_id_13>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48142": {
+ "content": "<extra_id_14>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48143": {
+ "content": "<extra_id_15>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48144": {
+ "content": "<extra_id_16>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48145": {
+ "content": "<extra_id_17>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48146": {
+ "content": "<extra_id_18>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48147": {
+ "content": "<extra_id_19>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48148": {
+ "content": "<extra_id_20>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48149": {
+ "content": "<extra_id_21>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48150": {
+ "content": "<extra_id_22>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48151": {
+ "content": "<extra_id_23>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48152": {
+ "content": "<extra_id_24>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48153": {
+ "content": "<extra_id_25>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48154": {
+ "content": "<extra_id_26>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48155": {
+ "content": "<extra_id_27>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48156": {
+ "content": "<extra_id_28>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48157": {
+ "content": "<extra_id_29>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48158": {
+ "content": "<extra_id_30>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48159": {
+ "content": "<extra_id_31>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48160": {
+ "content": "<extra_id_32>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48161": {
+ "content": "<extra_id_33>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48162": {
+ "content": "<extra_id_34>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48163": {
+ "content": "<extra_id_35>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48164": {
+ "content": "<extra_id_36>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48165": {
+ "content": "<extra_id_37>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48166": {
+ "content": "<extra_id_38>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48167": {
+ "content": "<extra_id_39>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48168": {
+ "content": "<extra_id_40>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48169": {
+ "content": "<extra_id_41>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48170": {
+ "content": "<extra_id_42>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48171": {
+ "content": "<extra_id_43>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48172": {
+ "content": "<extra_id_44>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48173": {
+ "content": "<extra_id_45>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48174": {
+ "content": "<extra_id_46>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48175": {
+ "content": "<extra_id_47>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48176": {
+ "content": "<extra_id_48>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48177": {
+ "content": "<extra_id_49>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48178": {
+ "content": "<extra_id_50>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48179": {
+ "content": "<extra_id_51>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48180": {
+ "content": "<extra_id_52>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48181": {
+ "content": "<extra_id_53>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48182": {
+ "content": "<extra_id_54>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48183": {
+ "content": "<extra_id_55>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48184": {
+ "content": "<extra_id_56>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48185": {
+ "content": "<extra_id_57>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48186": {
+ "content": "<extra_id_58>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48187": {
+ "content": "<extra_id_59>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48188": {
+ "content": "<extra_id_60>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48189": {
+ "content": "<extra_id_61>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48190": {
+ "content": "<extra_id_62>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48191": {
+ "content": "<extra_id_63>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48192": {
+ "content": "<extra_id_64>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48193": {
+ "content": "<extra_id_65>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48194": {
+ "content": "<extra_id_66>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48195": {
+ "content": "<extra_id_67>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48196": {
+ "content": "<extra_id_68>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48197": {
+ "content": "<extra_id_69>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48198": {
+ "content": "<extra_id_70>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48199": {
+ "content": "<extra_id_71>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48200": {
+ "content": "<extra_id_72>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48201": {
+ "content": "<extra_id_73>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48202": {
+ "content": "<extra_id_74>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48203": {
+ "content": "<extra_id_75>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48204": {
+ "content": "<extra_id_76>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48205": {
+ "content": "<extra_id_77>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48206": {
+ "content": "<extra_id_78>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48207": {
+ "content": "<extra_id_79>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48208": {
+ "content": "<extra_id_80>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48209": {
+ "content": "<extra_id_81>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48210": {
+ "content": "<extra_id_82>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48211": {
+ "content": "<extra_id_83>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48212": {
+ "content": "<extra_id_84>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48213": {
+ "content": "<extra_id_85>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48214": {
+ "content": "<extra_id_86>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48215": {
+ "content": "<extra_id_87>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48216": {
+ "content": "<extra_id_88>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48217": {
+ "content": "<extra_id_89>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
756
+ "special": true
757
+ },
758
+ "48218": {
759
+ "content": "<extra_id_90>",
760
+ "lstrip": false,
761
+ "normalized": false,
762
+ "rstrip": false,
763
+ "single_word": false,
764
+ "special": true
765
+ },
766
+ "48219": {
767
+ "content": "<extra_id_91>",
768
+ "lstrip": false,
769
+ "normalized": false,
770
+ "rstrip": false,
771
+ "single_word": false,
772
+ "special": true
773
+ },
774
+ "48220": {
775
+ "content": "<extra_id_92>",
776
+ "lstrip": false,
777
+ "normalized": false,
778
+ "rstrip": false,
779
+ "single_word": false,
780
+ "special": true
781
+ },
782
+ "48221": {
783
+ "content": "<extra_id_93>",
784
+ "lstrip": false,
785
+ "normalized": false,
786
+ "rstrip": false,
787
+ "single_word": false,
788
+ "special": true
789
+ },
790
+ "48222": {
791
+ "content": "<extra_id_94>",
792
+ "lstrip": false,
793
+ "normalized": false,
794
+ "rstrip": false,
795
+ "single_word": false,
796
+ "special": true
797
+ },
798
+ "48223": {
799
+ "content": "<extra_id_95>",
800
+ "lstrip": false,
801
+ "normalized": false,
802
+ "rstrip": false,
803
+ "single_word": false,
804
+ "special": true
805
+ },
806
+ "48224": {
807
+ "content": "<extra_id_96>",
808
+ "lstrip": false,
809
+ "normalized": false,
810
+ "rstrip": false,
811
+ "single_word": false,
812
+ "special": true
813
+ },
814
+ "48225": {
815
+ "content": "<extra_id_97>",
816
+ "lstrip": false,
817
+ "normalized": false,
818
+ "rstrip": false,
819
+ "single_word": false,
820
+ "special": true
821
+ },
822
+ "48226": {
823
+ "content": "<extra_id_98>",
824
+ "lstrip": false,
825
+ "normalized": false,
826
+ "rstrip": false,
827
+ "single_word": false,
828
+ "special": true
829
+ },
830
+ "48227": {
831
+ "content": "<extra_id_99>",
832
+ "lstrip": false,
833
+ "normalized": false,
834
+ "rstrip": false,
835
+ "single_word": false,
836
+ "special": true
837
+ }
838
+ },
839
+ "additional_special_tokens": [
840
+ "<extra_id_0>",
841
+ "<extra_id_1>",
842
+ "<extra_id_2>",
843
+ "<extra_id_3>",
844
+ "<extra_id_4>",
845
+ "<extra_id_5>",
846
+ "<extra_id_6>",
847
+ "<extra_id_7>",
848
+ "<extra_id_8>",
849
+ "<extra_id_9>",
850
+ "<extra_id_10>",
851
+ "<extra_id_11>",
852
+ "<extra_id_12>",
853
+ "<extra_id_13>",
854
+ "<extra_id_14>",
855
+ "<extra_id_15>",
856
+ "<extra_id_16>",
857
+ "<extra_id_17>",
858
+ "<extra_id_18>",
859
+ "<extra_id_19>",
860
+ "<extra_id_20>",
861
+ "<extra_id_21>",
862
+ "<extra_id_22>",
863
+ "<extra_id_23>",
864
+ "<extra_id_24>",
865
+ "<extra_id_25>",
866
+ "<extra_id_26>",
867
+ "<extra_id_27>",
868
+ "<extra_id_28>",
869
+ "<extra_id_29>",
870
+ "<extra_id_30>",
871
+ "<extra_id_31>",
872
+ "<extra_id_32>",
873
+ "<extra_id_33>",
874
+ "<extra_id_34>",
875
+ "<extra_id_35>",
876
+ "<extra_id_36>",
877
+ "<extra_id_37>",
878
+ "<extra_id_38>",
879
+ "<extra_id_39>",
880
+ "<extra_id_40>",
881
+ "<extra_id_41>",
882
+ "<extra_id_42>",
883
+ "<extra_id_43>",
884
+ "<extra_id_44>",
885
+ "<extra_id_45>",
886
+ "<extra_id_46>",
887
+ "<extra_id_47>",
888
+ "<extra_id_48>",
889
+ "<extra_id_49>",
890
+ "<extra_id_50>",
891
+ "<extra_id_51>",
892
+ "<extra_id_52>",
893
+ "<extra_id_53>",
894
+ "<extra_id_54>",
895
+ "<extra_id_55>",
896
+ "<extra_id_56>",
897
+ "<extra_id_57>",
898
+ "<extra_id_58>",
899
+ "<extra_id_59>",
900
+ "<extra_id_60>",
901
+ "<extra_id_61>",
902
+ "<extra_id_62>",
903
+ "<extra_id_63>",
904
+ "<extra_id_64>",
905
+ "<extra_id_65>",
906
+ "<extra_id_66>",
907
+ "<extra_id_67>",
908
+ "<extra_id_68>",
909
+ "<extra_id_69>",
910
+ "<extra_id_70>",
911
+ "<extra_id_71>",
912
+ "<extra_id_72>",
913
+ "<extra_id_73>",
914
+ "<extra_id_74>",
915
+ "<extra_id_75>",
916
+ "<extra_id_76>",
917
+ "<extra_id_77>",
918
+ "<extra_id_78>",
919
+ "<extra_id_79>",
920
+ "<extra_id_80>",
921
+ "<extra_id_81>",
922
+ "<extra_id_82>",
923
+ "<extra_id_83>",
924
+ "<extra_id_84>",
925
+ "<extra_id_85>",
926
+ "<extra_id_86>",
927
+ "<extra_id_87>",
928
+ "<extra_id_88>",
929
+ "<extra_id_89>",
930
+ "<extra_id_90>",
931
+ "<extra_id_91>",
932
+ "<extra_id_92>",
933
+ "<extra_id_93>",
934
+ "<extra_id_94>",
935
+ "<extra_id_95>",
936
+ "<extra_id_96>",
937
+ "<extra_id_97>",
938
+ "<extra_id_98>",
939
+ "<extra_id_99>"
940
+ ],
941
+ "bos_token": "<s>",
942
+ "clean_up_tokenization_spaces": false,
943
+ "eos_token": "</s>",
944
+ "legacy": false,
945
+ "model_max_length": 1000000000,
946
+ "pad_token": "<pad>",
947
+ "sp_model_kwargs": {},
948
+ "spaces_between_special_tokens": false,
949
+ "tokenizer_class": "LlamaTokenizer",
950
+ "unk_token": "<unk>",
951
+ "use_default_system_prompt": false
952
+ }
checkpoints/wandb/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
checkpoints/wandb/debug.log CHANGED
@@ -25,3 +25,12 @@ config: {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False,
25
  2024-08-30 19:59:24,815 INFO MainThread:29052 [wandb_run.py:_redirect():2399] Redirects installed.
26
  2024-08-30 19:59:24,818 INFO MainThread:29052 [wandb_init.py:init():894] run started, returning control to user process
27
  2024-08-30 19:59:44,796 INFO MainThread:29052 [wandb_run.py:_config_callback():1392] config_cb None None {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 34534, 'model': {'klass': 'hf_t5', 'name': 'pszemraj/tFINE-900m-e16-d32', 'overwrite': {'dropout_rate': 0.0}, 'checkpoint_path': '', 'random_init': False, 'compile': True}, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 16, 'before_mask_input_length': 1137, 'target_length': 229}, 'optim': {'name': 'adamwscale', 'base_lr': 0.01, 'batch_size': 128, 'total_steps': 20000, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.0001, 'grad_clip': 1.0, 'grad_acc': 8, 'final_cosine': 2e-05}, 'eval': {'every_steps': 1000000000, 'steps': 500, 'corrected_steps': 500}, 'checkpoint': {'every_steps': 2500}, 'logging': {'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'pszemraj', 'tags': ['900m', '1024'], 'mode': 'online'}, 'every_steps': 25, 'grad_l2': True, 'weights_l2': True}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/outputs/2024-08-30/19-59-22', 'n_all_param': 887492096}
28
+ 2024-08-31 23:32:00,793 INFO MainThread:29052 [wandb_run.py:_finish():2160] finishing run pszemraj/nanoT5/mao0tqjy
29
+ 2024-08-31 23:32:00,796 INFO MainThread:29052 [wandb_run.py:_atexit_cleanup():2424] got exitcode: 0
30
+ 2024-08-31 23:32:00,797 INFO MainThread:29052 [wandb_run.py:_restore():2406] restore
31
+ 2024-08-31 23:32:00,798 INFO MainThread:29052 [wandb_run.py:_restore():2412] restore done
32
+ 2024-08-31 23:32:00,799 INFO MainThread:29052 [wandb_run.py:_on_finish():2677] communicating current version
33
+ 2024-08-31 23:32:00,827 INFO MainThread:29052 [wandb_run.py:_on_finish():2686] got version response
34
+ 2024-08-31 23:32:06,426 INFO MainThread:29052 [wandb_run.py:_footer_history_summary_info():4078] rendering history
35
+ 2024-08-31 23:32:06,427 INFO MainThread:29052 [wandb_run.py:_footer_history_summary_info():4110] rendering summary
36
+ 2024-08-31 23:32:06,433 INFO MainThread:29052 [wandb_run.py:_footer_sync_info():4037] logging synced files
checkpoints/wandb/run-20240830_195924-mao0tqjy/files/config.yaml CHANGED
@@ -117,6 +117,7 @@ _wandb:
117
  - 71
118
  - 100
119
  3:
120
+ - 2
121
  - 15
122
  - 16
123
  - 23
checkpoints/wandb/run-20240830_195924-mao0tqjy/files/output.log CHANGED
@@ -747,3 +747,209 @@ W0830 20:03:56.180000 136239311107648 torch/fx/experimental/symbolic_shapes.py:4
747
  [2024-08-31 17:05:56,534][Main][INFO] - [train] Step 15475 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.386 | Lr --> 0.002 | Seconds_per_step --> 4.936 |
748
  [2024-08-31 17:07:58,425][Main][INFO] - [train] Step 15500 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.378 | Lr --> 0.002 | Seconds_per_step --> 4.876 |
749
  [2024-08-31 17:10:00,358][Main][INFO] - [train] Step 15525 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.364 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
750
+ [2024-08-31 17:12:03,531][Main][INFO] - [train] Step 15550 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.349 | Lr --> 0.002 | Seconds_per_step --> 4.927 |
751
+ [2024-08-31 17:14:05,478][Main][INFO] - [train] Step 15575 out of 20000 | Loss --> 1.778 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.340 | Lr --> 0.002 | Seconds_per_step --> 4.878 |
752
+ [2024-08-31 17:16:07,219][Main][INFO] - [train] Step 15600 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.323 | Lr --> 0.002 | Seconds_per_step --> 4.870 |
753
+ [2024-08-31 17:18:10,558][Main][INFO] - [train] Step 15625 out of 20000 | Loss --> 1.772 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.311 | Lr --> 0.002 | Seconds_per_step --> 4.933 |
754
+ [2024-08-31 17:20:12,172][Main][INFO] - [train] Step 15650 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.303 | Lr --> 0.002 | Seconds_per_step --> 4.864 |
755
+ [2024-08-31 17:22:13,878][Main][INFO] - [train] Step 15675 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.290 | Lr --> 0.002 | Seconds_per_step --> 4.868 |
756
+ [2024-08-31 17:24:17,279][Main][INFO] - [train] Step 15700 out of 20000 | Loss --> 1.773 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.273 | Lr --> 0.002 | Seconds_per_step --> 4.936 |
757
+ [2024-08-31 17:26:19,126][Main][INFO] - [train] Step 15725 out of 20000 | Loss --> 1.780 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.263 | Lr --> 0.002 | Seconds_per_step --> 4.874 |
758
+ [2024-08-31 17:28:20,908][Main][INFO] - [train] Step 15750 out of 20000 | Loss --> 1.782 | Grad_l2 --> 0.200 | Weights_l2 --> 11272.246 | Lr --> 0.002 | Seconds_per_step --> 4.871 |
759
+ [2024-08-31 17:30:24,238][Main][INFO] - [train] Step 15775 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.233 | Lr --> 0.002 | Seconds_per_step --> 4.933 |
760
+ [2024-08-31 17:32:26,169][Main][INFO] - [train] Step 15800 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.220 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
761
+ [2024-08-31 17:34:28,677][Main][INFO] - [train] Step 15825 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.207 | Lr --> 0.002 | Seconds_per_step --> 4.900 |
762
+ [2024-08-31 17:36:32,092][Main][INFO] - [train] Step 15850 out of 20000 | Loss --> 1.781 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.196 | Lr --> 0.002 | Seconds_per_step --> 4.936 |
763
+ [2024-08-31 17:38:33,746][Main][INFO] - [train] Step 15875 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.177 | Lr --> 0.002 | Seconds_per_step --> 4.866 |
764
+ [2024-08-31 17:40:35,415][Main][INFO] - [train] Step 15900 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.165 | Lr --> 0.002 | Seconds_per_step --> 4.867 |
765
+ [2024-08-31 17:42:36,852][Main][INFO] - [train] Step 15925 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.154 | Lr --> 0.002 | Seconds_per_step --> 4.857 |
766
+ [2024-08-31 17:44:40,125][Main][INFO] - [train] Step 15950 out of 20000 | Loss --> 1.774 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.141 | Lr --> 0.002 | Seconds_per_step --> 4.931 |
767
+ [2024-08-31 17:46:42,051][Main][INFO] - [train] Step 15975 out of 20000 | Loss --> 1.761 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.121 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
768
+ [2024-08-31 17:48:43,831][Main][INFO] - [train] Step 16000 out of 20000 | Loss --> 1.775 | Grad_l2 --> 0.196 | Weights_l2 --> 11272.110 | Lr --> 0.002 | Seconds_per_step --> 4.871 |
769
+ [2024-08-31 17:50:47,287][Main][INFO] - [train] Step 16025 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.197 | Weights_l2 --> 11272.098 | Lr --> 0.002 | Seconds_per_step --> 4.938 |
770
+ [2024-08-31 17:52:48,992][Main][INFO] - [train] Step 16050 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.202 | Weights_l2 --> 11272.084 | Lr --> 0.002 | Seconds_per_step --> 4.868 |
771
+ [2024-08-31 17:54:50,724][Main][INFO] - [train] Step 16075 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.199 | Weights_l2 --> 11272.068 | Lr --> 0.002 | Seconds_per_step --> 4.869 |
772
+ [2024-08-31 17:56:54,016][Main][INFO] - [train] Step 16100 out of 20000 | Loss --> 1.777 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.054 | Lr --> 0.002 | Seconds_per_step --> 4.932 |
773
+ [2024-08-31 17:58:55,942][Main][INFO] - [train] Step 16125 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.037 | Lr --> 0.002 | Seconds_per_step --> 4.877 |
774
+ [2024-08-31 18:00:57,478][Main][INFO] - [train] Step 16150 out of 20000 | Loss --> 1.766 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.020 | Lr --> 0.002 | Seconds_per_step --> 4.861 |
775
+ [2024-08-31 18:03:00,736][Main][INFO] - [train] Step 16175 out of 20000 | Loss --> 1.767 | Grad_l2 --> 0.198 | Weights_l2 --> 11272.000 | Lr --> 0.002 | Seconds_per_step --> 4.930 |
776
+ [2024-08-31 18:05:02,586][Main][INFO] - [train] Step 16200 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.988 | Lr --> 0.002 | Seconds_per_step --> 4.874 |
777
+ [2024-08-31 18:07:04,438][Main][INFO] - [train] Step 16225 out of 20000 | Loss --> 1.758 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.975 | Lr --> 0.002 | Seconds_per_step --> 4.874 |
778
+ [2024-08-31 18:09:07,901][Main][INFO] - [train] Step 16250 out of 20000 | Loss --> 1.763 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.962 | Lr --> 0.001 | Seconds_per_step --> 4.938 |
779
+ [2024-08-31 18:11:09,649][Main][INFO] - [train] Step 16275 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.952 | Lr --> 0.001 | Seconds_per_step --> 4.870 |
780
+ [2024-08-31 18:13:11,767][Main][INFO] - [train] Step 16300 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.935 | Lr --> 0.001 | Seconds_per_step --> 4.885 |
781
+ [2024-08-31 18:15:13,819][Main][INFO] - [train] Step 16325 out of 20000 | Loss --> 1.765 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.928 | Lr --> 0.001 | Seconds_per_step --> 4.882 |
782
+ [2024-08-31 18:17:17,096][Main][INFO] - [train] Step 16350 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.916 | Lr --> 0.001 | Seconds_per_step --> 4.931 |
783
+ [2024-08-31 18:19:18,879][Main][INFO] - [train] Step 16375 out of 20000 | Loss --> 1.757 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.902 | Lr --> 0.001 | Seconds_per_step --> 4.871 |
784
+ [2024-08-31 18:21:20,566][Main][INFO] - [train] Step 16400 out of 20000 | Loss --> 1.770 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.886 | Lr --> 0.001 | Seconds_per_step --> 4.867 |
785
+ [2024-08-31 18:23:24,550][Main][INFO] - [train] Step 16425 out of 20000 | Loss --> 1.752 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.874 | Lr --> 0.001 | Seconds_per_step --> 4.959 |
786
+ [2024-08-31 18:25:26,461][Main][INFO] - [train] Step 16450 out of 20000 | Loss --> 1.756 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.860 | Lr --> 0.001 | Seconds_per_step --> 4.876 |
787
+ [2024-08-31 18:27:28,278][Main][INFO] - [train] Step 16475 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.846 | Lr --> 0.001 | Seconds_per_step --> 4.873 |
788
+ [2024-08-31 18:29:32,921][Main][INFO] - [train] Step 16500 out of 20000 | Loss --> 1.753 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.834 | Lr --> 0.001 | Seconds_per_step --> 4.986 |
789
+ [2024-08-31 18:31:35,149][Main][INFO] - [train] Step 16525 out of 20000 | Loss --> 1.760 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.819 | Lr --> 0.001 | Seconds_per_step --> 4.889 |
790
+ [2024-08-31 18:33:37,364][Main][INFO] - [train] Step 16550 out of 20000 | Loss --> 1.754 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.799 | Lr --> 0.001 | Seconds_per_step --> 4.889 |
791
+ [2024-08-31 18:35:40,915][Main][INFO] - [train] Step 16575 out of 20000 | Loss --> 1.751 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.786 | Lr --> 0.001 | Seconds_per_step --> 4.942 |
792
+ [2024-08-31 18:37:43,466][Main][INFO] - [train] Step 16600 out of 20000 | Loss --> 1.768 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.770 | Lr --> 0.001 | Seconds_per_step --> 4.902 |
793
+ [2024-08-31 18:39:45,637][Main][INFO] - [train] Step 16625 out of 20000 | Loss --> 1.749 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.754 | Lr --> 0.001 | Seconds_per_step --> 4.887 |
794
+ [2024-08-31 18:41:49,405][Main][INFO] - [train] Step 16650 out of 20000 | Loss --> 1.748 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.739 | Lr --> 0.001 | Seconds_per_step --> 4.951 |
795
+ [2024-08-31 18:43:51,102][Main][INFO] - [train] Step 16675 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.723 | Lr --> 0.001 | Seconds_per_step --> 4.868 |
796
+ [2024-08-31 18:45:52,944][Main][INFO] - [train] Step 16700 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.708 | Lr --> 0.001 | Seconds_per_step --> 4.874 |
797
+ [2024-08-31 18:47:56,099][Main][INFO] - [train] Step 16725 out of 20000 | Loss --> 1.757 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.698 | Lr --> 0.001 | Seconds_per_step --> 4.926 |
798
+ [2024-08-31 18:49:57,801][Main][INFO] - [train] Step 16750 out of 20000 | Loss --> 1.742 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.684 | Lr --> 0.001 | Seconds_per_step --> 4.868 |
799
+ [2024-08-31 18:51:59,437][Main][INFO] - [train] Step 16775 out of 20000 | Loss --> 1.755 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.665 | Lr --> 0.001 | Seconds_per_step --> 4.865 |
800
+ [2024-08-31 18:54:00,855][Main][INFO] - [train] Step 16800 out of 20000 | Loss --> 1.747 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.651 | Lr --> 0.001 | Seconds_per_step --> 4.857 |
801
+ [2024-08-31 18:56:03,853][Main][INFO] - [train] Step 16825 out of 20000 | Loss --> 1.743 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.632 | Lr --> 0.001 | Seconds_per_step --> 4.920 |
802
+ [2024-08-31 18:58:05,201][Main][INFO] - [train] Step 16850 out of 20000 | Loss --> 1.745 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.619 | Lr --> 0.001 | Seconds_per_step --> 4.854 |
803
+ [2024-08-31 19:00:06,563][Main][INFO] - [train] Step 16875 out of 20000 | Loss --> 1.753 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.606 | Lr --> 0.001 | Seconds_per_step --> 4.854 |
804
+ [2024-08-31 19:02:09,317][Main][INFO] - [train] Step 16900 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.591 | Lr --> 0.001 | Seconds_per_step --> 4.910 |
805
+ [2024-08-31 19:04:10,472][Main][INFO] - [train] Step 16925 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.575 | Lr --> 0.001 | Seconds_per_step --> 4.846 |
806
+ [2024-08-31 19:06:11,583][Main][INFO] - [train] Step 16950 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.559 | Lr --> 0.001 | Seconds_per_step --> 4.844 |
807
+ [2024-08-31 19:08:14,363][Main][INFO] - [train] Step 16975 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.545 | Lr --> 0.001 | Seconds_per_step --> 4.911 |
808
+ [2024-08-31 19:10:15,178][Main][INFO] - [train] Step 17000 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.533 | Lr --> 0.001 | Seconds_per_step --> 4.832 |
809
+ [2024-08-31 19:12:16,110][Main][INFO] - [train] Step 17025 out of 20000 | Loss --> 1.741 | Grad_l2 --> 0.200 | Weights_l2 --> 11271.515 | Lr --> 0.001 | Seconds_per_step --> 4.837 |
810
+ [2024-08-31 19:14:18,572][Main][INFO] - [train] Step 17050 out of 20000 | Loss --> 1.740 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.500 | Lr --> 0.001 | Seconds_per_step --> 4.898 |
811
+ [2024-08-31 19:16:19,446][Main][INFO] - [train] Step 17075 out of 20000 | Loss --> 1.735 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.485 | Lr --> 0.001 | Seconds_per_step --> 4.835 |
812
+ [2024-08-31 19:18:20,449][Main][INFO] - [train] Step 17100 out of 20000 | Loss --> 1.734 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.471 | Lr --> 0.001 | Seconds_per_step --> 4.840 |
813
+ [2024-08-31 19:20:23,025][Main][INFO] - [train] Step 17125 out of 20000 | Loss --> 1.736 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.453 | Lr --> 0.001 | Seconds_per_step --> 4.903 |
814
+ [2024-08-31 19:22:23,946][Main][INFO] - [train] Step 17150 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.439 | Lr --> 0.001 | Seconds_per_step --> 4.837 |
815
+ [2024-08-31 19:24:25,361][Main][INFO] - [train] Step 17175 out of 20000 | Loss --> 1.732 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.422 | Lr --> 0.001 | Seconds_per_step --> 4.857 |
816
+ [2024-08-31 19:26:26,446][Main][INFO] - [train] Step 17200 out of 20000 | Loss --> 1.734 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.408 | Lr --> 0.001 | Seconds_per_step --> 4.843 |
817
+ [2024-08-31 19:28:29,313][Main][INFO] - [train] Step 17225 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.395 | Lr --> 0.001 | Seconds_per_step --> 4.915 |
818
+ [2024-08-31 19:30:30,859][Main][INFO] - [train] Step 17250 out of 20000 | Loss --> 1.737 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.382 | Lr --> 0.001 | Seconds_per_step --> 4.862 |
819
+ [2024-08-31 19:32:32,558][Main][INFO] - [train] Step 17275 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.367 | Lr --> 0.001 | Seconds_per_step --> 4.868 |
820
+ [2024-08-31 19:34:35,555][Main][INFO] - [train] Step 17300 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.349 | Lr --> 0.001 | Seconds_per_step --> 4.920 |
821
+ [2024-08-31 19:36:37,051][Main][INFO] - [train] Step 17325 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.333 | Lr --> 0.001 | Seconds_per_step --> 4.860 |
822
+ [2024-08-31 19:38:38,163][Main][INFO] - [train] Step 17350 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.315 | Lr --> 0.001 | Seconds_per_step --> 4.844 |
823
+ [2024-08-31 19:40:41,121][Main][INFO] - [train] Step 17375 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.299 | Lr --> 0.001 | Seconds_per_step --> 4.918 |
824
+ [2024-08-31 19:42:42,654][Main][INFO] - [train] Step 17400 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.201 | Weights_l2 --> 11271.281 | Lr --> 0.001 | Seconds_per_step --> 4.861 |
825
+ [2024-08-31 19:44:44,126][Main][INFO] - [train] Step 17425 out of 20000 | Loss --> 1.713 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.264 | Lr --> 0.001 | Seconds_per_step --> 4.859 |
826
+ [2024-08-31 19:46:47,266][Main][INFO] - [train] Step 17450 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.200 | Weights_l2 --> 11271.245 | Lr --> 0.001 | Seconds_per_step --> 4.925 |
827
+ [2024-08-31 19:48:48,692][Main][INFO] - [train] Step 17475 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.227 | Lr --> 0.001 | Seconds_per_step --> 4.857 |
828
+ [2024-08-31 19:50:50,523][Main][INFO] - [train] Step 17500 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.212 | Lr --> 0.001 | Seconds_per_step --> 4.873 |
829
+ [2024-08-31 19:50:50,523][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-17500
830
+ [2024-08-31 19:50:50,530][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
831
+ [2024-08-31 19:50:57,172][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-17500/model.safetensors
832
+ [2024-08-31 19:51:06,254][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-17500/optimizer.bin
833
+ [2024-08-31 19:51:06,257][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-17500/scheduler.bin
834
+ [2024-08-31 19:51:06,258][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-17500/sampler.bin
835
+ [2024-08-31 19:51:06,260][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-17500/sampler_1.bin
836
+ [2024-08-31 19:51:06,261][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-17500/random_states_0.pkl
837
+ [2024-08-31 19:53:08,757][Main][INFO] - [train] Step 17525 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.193 | Lr --> 0.001 | Seconds_per_step --> 5.529 |
838
+ [2024-08-31 19:55:09,923][Main][INFO] - [train] Step 17550 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.173 | Lr --> 0.001 | Seconds_per_step --> 4.847 |
839
+ [2024-08-31 19:57:11,243][Main][INFO] - [train] Step 17575 out of 20000 | Loss --> 1.700 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.155 | Lr --> 0.001 | Seconds_per_step --> 4.853 |
840
+ [2024-08-31 19:59:14,099][Main][INFO] - [train] Step 17600 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.137 | Lr --> 0.001 | Seconds_per_step --> 4.914 |
841
+ [2024-08-31 20:01:15,562][Main][INFO] - [train] Step 17625 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11271.118 | Lr --> 0.001 | Seconds_per_step --> 4.858 |
842
+ [2024-08-31 20:03:16,470][Main][INFO] - [train] Step 17650 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.199 | Weights_l2 --> 11271.097 | Lr --> 0.001 | Seconds_per_step --> 4.836 |
843
+ [2024-08-31 20:05:17,916][Main][INFO] - [train] Step 17675 out of 20000 | Loss --> 1.733 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.078 | Lr --> 0.001 | Seconds_per_step --> 4.858 |
844
+ [2024-08-31 20:07:20,683][Main][INFO] - [train] Step 17700 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11271.062 | Lr --> 0.001 | Seconds_per_step --> 4.911 |
845
+ [2024-08-31 20:09:22,414][Main][INFO] - [train] Step 17725 out of 20000 | Loss --> 1.707 | Grad_l2 --> 0.195 | Weights_l2 --> 11271.041 | Lr --> 0.001 | Seconds_per_step --> 4.869 |
846
+ [2024-08-31 20:11:24,033][Main][INFO] - [train] Step 17750 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.024 | Lr --> 0.001 | Seconds_per_step --> 4.865 |
847
+ [2024-08-31 20:13:26,602][Main][INFO] - [train] Step 17775 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.198 | Weights_l2 --> 11271.007 | Lr --> 0.001 | Seconds_per_step --> 4.903 |
848
+ [2024-08-31 20:15:27,607][Main][INFO] - [train] Step 17800 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.991 | Lr --> 0.001 | Seconds_per_step --> 4.840 |
849
+ [2024-08-31 20:17:28,616][Main][INFO] - [train] Step 17825 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.976 | Lr --> 0.001 | Seconds_per_step --> 4.840 |
850
+ [2024-08-31 20:19:31,033][Main][INFO] - [train] Step 17850 out of 20000 | Loss --> 1.729 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.961 | Lr --> 0.001 | Seconds_per_step --> 4.897 |
851
+ [2024-08-31 20:21:32,133][Main][INFO] - [train] Step 17875 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.946 | Lr --> 0.001 | Seconds_per_step --> 4.844 |
852
+ [2024-08-31 20:23:33,151][Main][INFO] - [train] Step 17900 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.933 | Lr --> 0.000 | Seconds_per_step --> 4.841 |
853
+ [2024-08-31 20:25:35,828][Main][INFO] - [train] Step 17925 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.921 | Lr --> 0.000 | Seconds_per_step --> 4.907 |
+ [2024-08-31 20:27:36,892][Main][INFO] - [train] Step 17950 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.911 | Lr --> 0.000 | Seconds_per_step --> 4.842 |
+ [2024-08-31 20:29:38,066][Main][INFO] - [train] Step 17975 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.900 | Lr --> 0.000 | Seconds_per_step --> 4.847 |
+ [2024-08-31 20:31:40,569][Main][INFO] - [train] Step 18000 out of 20000 | Loss --> 1.738 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.891 | Lr --> 0.000 | Seconds_per_step --> 4.900 |
+ [2024-08-31 20:33:41,408][Main][INFO] - [train] Step 18025 out of 20000 | Loss --> 1.739 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.884 | Lr --> 0.000 | Seconds_per_step --> 4.833 |
+ [2024-08-31 20:35:42,352][Main][INFO] - [train] Step 18050 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.876 | Lr --> 0.000 | Seconds_per_step --> 4.838 |
+ [2024-08-31 20:37:45,322][Main][INFO] - [train] Step 18075 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.869 | Lr --> 0.000 | Seconds_per_step --> 4.919 |
+ [2024-08-31 20:39:46,981][Main][INFO] - [train] Step 18100 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.862 | Lr --> 0.000 | Seconds_per_step --> 4.866 |
+ [2024-08-31 20:41:48,584][Main][INFO] - [train] Step 18125 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.855 | Lr --> 0.000 | Seconds_per_step --> 4.864 |
+ [2024-08-31 20:43:49,907][Main][INFO] - [train] Step 18150 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.853 |
+ [2024-08-31 20:45:52,968][Main][INFO] - [train] Step 18175 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.844 | Lr --> 0.000 | Seconds_per_step --> 4.922 |
+ [2024-08-31 20:47:54,325][Main][INFO] - [train] Step 18200 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.854 |
+ [2024-08-31 20:49:55,663][Main][INFO] - [train] Step 18225 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.195 | Weights_l2 --> 11270.840 | Lr --> 0.000 | Seconds_per_step --> 4.853 |
+ [2024-08-31 20:51:58,657][Main][INFO] - [train] Step 18250 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.836 | Lr --> 0.000 | Seconds_per_step --> 4.920 |
+ [2024-08-31 20:54:00,083][Main][INFO] - [train] Step 18275 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.834 | Lr --> 0.000 | Seconds_per_step --> 4.857 |
+ [2024-08-31 20:56:01,850][Main][INFO] - [train] Step 18300 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.871 |
+ [2024-08-31 20:58:04,690][Main][INFO] - [train] Step 18325 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.832 | Lr --> 0.000 | Seconds_per_step --> 4.913 |
+ [2024-08-31 21:00:06,226][Main][INFO] - [train] Step 18350 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.832 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 21:02:07,970][Main][INFO] - [train] Step 18375 out of 20000 | Loss --> 1.728 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.870 |
+ [2024-08-31 21:04:11,035][Main][INFO] - [train] Step 18400 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.831 | Lr --> 0.000 | Seconds_per_step --> 4.922 |
+ [2024-08-31 21:06:12,731][Main][INFO] - [train] Step 18425 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.834 | Lr --> 0.000 | Seconds_per_step --> 4.868 |
+ [2024-08-31 21:08:14,292][Main][INFO] - [train] Step 18450 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.836 | Lr --> 0.000 | Seconds_per_step --> 4.862 |
+ [2024-08-31 21:10:17,481][Main][INFO] - [train] Step 18475 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.837 | Lr --> 0.000 | Seconds_per_step --> 4.927 |
+ [2024-08-31 21:12:19,115][Main][INFO] - [train] Step 18500 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.838 | Lr --> 0.000 | Seconds_per_step --> 4.865 |
+ [2024-08-31 21:14:20,604][Main][INFO] - [train] Step 18525 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.839 | Lr --> 0.000 | Seconds_per_step --> 4.859 |
+ [2024-08-31 21:16:24,832][Main][INFO] - [train] Step 18550 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.969 |
+ [2024-08-31 21:18:26,217][Main][INFO] - [train] Step 18575 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.855 |
+ [2024-08-31 21:20:27,684][Main][INFO] - [train] Step 18600 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.840 | Lr --> 0.000 | Seconds_per_step --> 4.859 |
+ [2024-08-31 21:22:29,530][Main][INFO] - [train] Step 18625 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
+ [2024-08-31 21:24:33,115][Main][INFO] - [train] Step 18650 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.195 | Weights_l2 --> 11270.841 | Lr --> 0.000 | Seconds_per_step --> 4.943 |
+ [2024-08-31 21:26:35,121][Main][INFO] - [train] Step 18675 out of 20000 | Loss --> 1.732 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.842 | Lr --> 0.000 | Seconds_per_step --> 4.880 |
+ [2024-08-31 21:28:37,128][Main][INFO] - [train] Step 18700 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.843 | Lr --> 0.000 | Seconds_per_step --> 4.880 |
+ [2024-08-31 21:30:40,396][Main][INFO] - [train] Step 18725 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.843 | Lr --> 0.000 | Seconds_per_step --> 4.931 |
+ [2024-08-31 21:33:03,172][Main][INFO] - [train] Step 18750 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.844 | Lr --> 0.000 | Seconds_per_step --> 5.711 |
+ [2024-08-31 21:35:04,721][Main][INFO] - [train] Step 18775 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.862 |
+ [2024-08-31 21:37:08,128][Main][INFO] - [train] Step 18800 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.936 |
+ [2024-08-31 21:39:09,857][Main][INFO] - [train] Step 18825 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.201 | Weights_l2 --> 11270.845 | Lr --> 0.000 | Seconds_per_step --> 4.869 |
+ [2024-08-31 21:41:11,700][Main][INFO] - [train] Step 18850 out of 20000 | Loss --> 1.725 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
+ [2024-08-31 21:43:15,117][Main][INFO] - [train] Step 18875 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.846 | Lr --> 0.000 | Seconds_per_step --> 4.937 |
+ [2024-08-31 21:45:19,433][Main][INFO] - [train] Step 18900 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 4.973 |
+ [2024-08-31 21:47:29,032][Main][INFO] - [train] Step 18925 out of 20000 | Loss --> 1.709 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.847 | Lr --> 0.000 | Seconds_per_step --> 5.184 |
+ [2024-08-31 21:49:33,512][Main][INFO] - [train] Step 18950 out of 20000 | Loss --> 1.731 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.848 | Lr --> 0.000 | Seconds_per_step --> 4.979 |
+ [2024-08-31 21:51:35,196][Main][INFO] - [train] Step 18975 out of 20000 | Loss --> 1.721 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.867 |
+ [2024-08-31 21:53:36,788][Main][INFO] - [train] Step 19000 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.864 |
+ [2024-08-31 21:55:38,313][Main][INFO] - [train] Step 19025 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 21:57:41,329][Main][INFO] - [train] Step 19050 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.921 |
+ [2024-08-31 21:59:42,853][Main][INFO] - [train] Step 19075 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 22:01:44,492][Main][INFO] - [train] Step 19100 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.865 |
+ [2024-08-31 22:03:47,660][Main][INFO] - [train] Step 19125 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.927 |
+ [2024-08-31 22:05:49,133][Main][INFO] - [train] Step 19150 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.859 |
+ [2024-08-31 22:07:50,623][Main][INFO] - [train] Step 19175 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.849 | Lr --> 0.000 | Seconds_per_step --> 4.860 |
+ [2024-08-31 22:09:53,873][Main][INFO] - [train] Step 19200 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.930 |
+ [2024-08-31 22:11:55,529][Main][INFO] - [train] Step 19225 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.866 |
+ [2024-08-31 22:13:57,272][Main][INFO] - [train] Step 19250 out of 20000 | Loss --> 1.730 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.870 |
+ [2024-08-31 22:16:01,229][Main][INFO] - [train] Step 19275 out of 20000 | Loss --> 1.722 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.958 |
+ [2024-08-31 22:18:03,766][Main][INFO] - [train] Step 19300 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.901 |
+ [2024-08-31 22:20:06,053][Main][INFO] - [train] Step 19325 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.891 |
+ [2024-08-31 22:22:09,832][Main][INFO] - [train] Step 19350 out of 20000 | Loss --> 1.726 | Grad_l2 --> 0.200 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.951 |
+ [2024-08-31 22:24:11,539][Main][INFO] - [train] Step 19375 out of 20000 | Loss --> 1.707 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.850 | Lr --> 0.000 | Seconds_per_step --> 4.868 |
+ [2024-08-31 22:26:13,254][Main][INFO] - [train] Step 19400 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.869 |
+ [2024-08-31 22:28:15,089][Main][INFO] - [train] Step 19425 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.851 | Lr --> 0.000 | Seconds_per_step --> 4.873 |
+ [2024-08-31 22:30:18,617][Main][INFO] - [train] Step 19450 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.941 |
+ [2024-08-31 22:32:20,460][Main][INFO] - [train] Step 19475 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
+ [2024-08-31 22:34:21,991][Main][INFO] - [train] Step 19500 out of 20000 | Loss --> 1.723 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 22:36:24,959][Main][INFO] - [train] Step 19525 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.919 |
+ [2024-08-31 22:38:26,479][Main][INFO] - [train] Step 19550 out of 20000 | Loss --> 1.727 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 22:40:28,432][Main][INFO] - [train] Step 19575 out of 20000 | Loss --> 1.714 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.878 |
+ [2024-08-31 22:42:31,962][Main][INFO] - [train] Step 19600 out of 20000 | Loss --> 1.718 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.941 |
+ [2024-08-31 22:44:33,970][Main][INFO] - [train] Step 19625 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.880 |
+ [2024-08-31 22:46:38,990][Main][INFO] - [train] Step 19650 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.001 |
+ [2024-08-31 22:48:42,541][Main][INFO] - [train] Step 19675 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.942 |
+ [2024-08-31 22:50:48,499][Main][INFO] - [train] Step 19700 out of 20000 | Loss --> 1.724 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.038 |
+ [2024-08-31 22:52:50,152][Main][INFO] - [train] Step 19725 out of 20000 | Loss --> 1.717 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.866 |
+ [2024-08-31 22:54:53,252][Main][INFO] - [train] Step 19750 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.924 |
+ [2024-08-31 22:56:55,118][Main][INFO] - [train] Step 19775 out of 20000 | Loss --> 1.712 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.875 |
+ [2024-08-31 22:58:56,863][Main][INFO] - [train] Step 19800 out of 20000 | Loss --> 1.715 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.870 |
+ [2024-08-31 23:01:01,950][Main][INFO] - [train] Step 19825 out of 20000 | Loss --> 1.710 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 5.003 |
+ [2024-08-31 23:03:04,913][Main][INFO] - [train] Step 19850 out of 20000 | Loss --> 1.713 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.852 | Lr --> 0.000 | Seconds_per_step --> 4.918 |
+ [2024-08-31 23:05:06,946][Main][INFO] - [train] Step 19875 out of 20000 | Loss --> 1.710 | Grad_l2 --> 0.196 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.881 |
+ [2024-08-31 23:07:08,902][Main][INFO] - [train] Step 19900 out of 20000 | Loss --> 1.711 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.878 |
+ [2024-08-31 23:09:12,065][Main][INFO] - [train] Step 19925 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.197 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.926 |
+ [2024-08-31 23:11:13,586][Main][INFO] - [train] Step 19950 out of 20000 | Loss --> 1.719 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.861 |
+ [2024-08-31 23:13:15,435][Main][INFO] - [train] Step 19975 out of 20000 | Loss --> 1.720 | Grad_l2 --> 0.198 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.874 |
+ [2024-08-31 23:15:18,590][Main][INFO] - [train] Step 20000 out of 20000 | Loss --> 1.716 | Grad_l2 --> 0.199 | Weights_l2 --> 11270.853 | Lr --> 0.000 | Seconds_per_step --> 4.926 |
+ [2024-08-31 23:15:18,591][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000
+ [2024-08-31 23:15:18,599][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
+ [2024-08-31 23:15:26,324][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors
+ [2024-08-31 23:15:35,439][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin
+ [2024-08-31 23:15:35,440][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin
+ [2024-08-31 23:15:35,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin
+ [2024-08-31 23:15:35,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin
+ [2024-08-31 23:15:35,442][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl
+ [2024-08-31 23:31:42,282][Main][INFO] - [eval] Step 20001 out of 20000 | Loss --> 2.073 | Accuracy --> 0.604 | Time --> 964.275 |
+ [2024-08-31 23:31:42,287][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20001
+ [2024-08-31 23:31:42,295][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
+ [2024-08-31 23:31:50,975][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20001/model.safetensors
+ [2024-08-31 23:32:00,717][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20001/optimizer.bin
+ [2024-08-31 23:32:00,719][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20001/scheduler.bin
+ [2024-08-31 23:32:00,720][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20001/sampler.bin
+ [2024-08-31 23:32:00,720][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20001/sampler_1.bin
+ [2024-08-31 23:32:00,721][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20001/random_states_0.pkl
+ tokenizer config file saved in ./tokenizer/tokenizer_config.json
+ Special tokens file saved in ./tokenizer/special_tokens_map.json
checkpoints/wandb/run-20240830_195924-mao0tqjy/files/wandb-summary.json CHANGED
@@ -1 +1 @@
- {"train/loss": 1.777546563744545, "train/grad_l2": 0.1973351389169693, "train/weights_l2": 11272.363778775605, "train/lr": 0.0020558542377918645, "train/seconds_per_step": 4.877207107543946, "_timestamp": 1725124200.3571296, "_runtime": 76236.1699206829, "_step": 15525}
+ {"train/loss": 1.7155881041288377, "train/grad_l2": 0.19877906143665314, "train/weights_l2": 11270.853009077697, "train/lr": 2e-05, "train/seconds_per_step": 4.926107158660889, "_timestamp": 1725147102.2811465, "_runtime": 99138.09393763542, "_step": 20001, "_wandb": {"runtime": 99156}, "eval/loss": 2.0733377319718933, "eval/accuracy": 0.6044892090926607, "eval/time": 964.2750298976898}
checkpoints/wandb/run-20240830_195924-mao0tqjy/logs/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
checkpoints/wandb/run-20240830_195924-mao0tqjy/logs/debug.log CHANGED
@@ -25,3 +25,12 @@ config: {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False,
  2024-08-30 19:59:24,815 INFO MainThread:29052 [wandb_run.py:_redirect():2399] Redirects installed.
  2024-08-30 19:59:24,818 INFO MainThread:29052 [wandb_init.py:init():894] run started, returning control to user process
  2024-08-30 19:59:44,796 INFO MainThread:29052 [wandb_run.py:_config_callback():1392] config_cb None None {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 34534, 'model': {'klass': 'hf_t5', 'name': 'pszemraj/tFINE-900m-e16-d32', 'overwrite': {'dropout_rate': 0.0}, 'checkpoint_path': '', 'random_init': False, 'compile': True}, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 16, 'before_mask_input_length': 1137, 'target_length': 229}, 'optim': {'name': 'adamwscale', 'base_lr': 0.01, 'batch_size': 128, 'total_steps': 20000, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.0001, 'grad_clip': 1.0, 'grad_acc': 8, 'final_cosine': 2e-05}, 'eval': {'every_steps': 1000000000, 'steps': 500, 'corrected_steps': 500}, 'checkpoint': {'every_steps': 2500}, 'logging': {'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'pszemraj', 'tags': ['900m', '1024'], 'mode': 'online'}, 'every_steps': 25, 'grad_l2': True, 'weights_l2': True}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/outputs/2024-08-30/19-59-22', 'n_all_param': 887492096}
+ 2024-08-31 23:32:00,793 INFO MainThread:29052 [wandb_run.py:_finish():2160] finishing run pszemraj/nanoT5/mao0tqjy
+ 2024-08-31 23:32:00,796 INFO MainThread:29052 [wandb_run.py:_atexit_cleanup():2424] got exitcode: 0
+ 2024-08-31 23:32:00,797 INFO MainThread:29052 [wandb_run.py:_restore():2406] restore
+ 2024-08-31 23:32:00,798 INFO MainThread:29052 [wandb_run.py:_restore():2412] restore done
+ 2024-08-31 23:32:00,799 INFO MainThread:29052 [wandb_run.py:_on_finish():2677] communicating current version
+ 2024-08-31 23:32:00,827 INFO MainThread:29052 [wandb_run.py:_on_finish():2686] got version response
+ 2024-08-31 23:32:06,426 INFO MainThread:29052 [wandb_run.py:_footer_history_summary_info():4078] rendering history
+ 2024-08-31 23:32:06,427 INFO MainThread:29052 [wandb_run.py:_footer_history_summary_info():4110] rendering summary
+ 2024-08-31 23:32:06,433 INFO MainThread:29052 [wandb_run.py:_footer_sync_info():4037] logging synced files
checkpoints/wandb/run-20240830_195924-mao0tqjy/run-mao0tqjy.wandb CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:34c4d82ee5fa3daa21587d65efb7972e3a2447cce764ad1cd0eaec8aa61ffb19
- size 9030581
+ oid sha256:409f487e6f685193cdefb06459ee5126b8198454666c19a7278051ce8b773999
+ size 11743810