End of training

Files changed (6)

README.md +14 -14
config.json +32 -32
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
sparsification_sftt.py +1 -1

README.md CHANGED

@@ -4,18 +4,18 @@ base_model: mistralai/Mistral-7B-v0.1
 tags:
 - generated_from_trainer
 model-index:
-- name: Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-13
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-13
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 9.8327
 ## Model description
@@ -39,10 +39,10 @@ The following hyperparameters were used during training:
 - eval_batch_size: 1
 - seed: 0
 - distributed_type: multi-GPU
-- num_devices: 2
 - gradient_accumulation_steps: 8
-- total_train_batch_size: 16
-- total_eval_batch_size: 2
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - training_steps: 200
@@ -51,14 +51,14 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 3.7951        | 0.0   | 25   | 2.4027          |
-| 3.642         | 0.01  | 50   | 2.3900          |
-| 3.6958        | 0.01  | 75   | 2.3846          |
-| 3.5839        | 0.02  | 100  | 2.3938          |
-| 3.5473        | 0.02  | 125  | 2.4562          |
-| 3.5564        | 0.02  | 150  | 2.5087          |
-| 3.4657        | 0.03  | 175  | 2.5109          |
-| 3.4677        | 0.03  | 200  | 2.5261          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-14
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Mistral_Sparse_refined_web_graceful_reg_90p_2024-03-14
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 3.2117
 ## Model description
 - eval_batch_size: 1
 - seed: 0
 - distributed_type: multi-GPU
+- num_devices: 4
 - gradient_accumulation_steps: 8
+- total_train_batch_size: 32
+- total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - training_steps: 200
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 3.6973        | 0.01  | 25   | 2.3992          |
+| 3.6504        | 0.02  | 50   | 2.3855          |
+| 3.6737        | 0.02  | 75   | 2.3872          |
+| 3.5868        | 0.03  | 100  | 2.4532          |
+| 3.5604        | 0.04  | 125  | 2.4999          |
+| 3.4312        | 0.05  | 150  | 2.5201          |
+| 3.3355        | 0.06  | 175  | 2.5216          |
+| 3.3825        | 0.06  | 200  | 2.5236          |
 ### Framework versions
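
Note on the batch-size fields in this README diff: the changes follow from the device count. With gradient_accumulation_steps at 8, going from 2 to 4 devices doubles total_train_batch_size from 16 to 32 and total_eval_batch_size from 2 to 4. The sketch below reproduces that arithmetic. The helper name `total_batch_size` is hypothetical, and the per-device train batch size of 1 is an assumption inferred from 32 / (4 * 8), since that field is not visible in the hunk.

```python
def total_batch_size(per_device_batch_size: int,
                     num_devices: int,
                     gradient_accumulation_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step (or one eval step)."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps


# Assumption: per-device train batch size is 1 (not shown in the diff hunk).
old_train = total_batch_size(1, num_devices=2, gradient_accumulation_steps=8)
new_train = total_batch_size(1, num_devices=4, gradient_accumulation_steps=8)

# Evaluation does not accumulate gradients, so only the device count matters.
old_eval = total_batch_size(1, num_devices=2)
new_eval = total_batch_size(1, num_devices=4)
```

Doubling the samples consumed per step is also consistent with the Epoch column reaching 0.06 instead of 0.03 at step 200.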

config.json CHANGED

@@ -23,38 +23,38 @@
   "rope_theta": 10000.0,
   "sliding_window": 4096,
   "thresholds": [
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0,
-    0.0
   ],
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",

   "rope_theta": 10000.0,
   "sliding_window": 4096,
   "thresholds": [
+    0.0631895586848259,
+    0.07923770695924759,
+    0.089267797768116,
+    0.10732196271419525,
+    0.12738214433193207,
+    0.1414242684841156,
+    0.15346036851406097,
+    0.16349045932292938,
+    0.1675025075674057,
+    0.1675025075674057,
+    0.1675025075674057,
+    0.1735205501317978,
+    0.17552657425403595,
+    0.1775325983762741,
+    0.18756268918514252,
+    0.1935807317495346,
+    0.19759276509284973,
+    0.21364091336727142,
+    0.22367100417613983,
+    0.23169508576393127,
+    0.22367100417613983,
+    0.22968906164169312,
+    0.22367100417613983,
+    0.22367100417613983,
+    0.23169508576393127,
+    0.23971915245056152,
+    0.2457372099161148,
+    0.2577733099460602,
+    0.2678034007549286,
+    0.27382147312164307,
+    0.27582746744155884,
+    0.277833491563797
   ],
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
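
Note on the "thresholds" change: the list has 32 entries, one per decoder layer of Mistral-7B, and moves from all zeros to nonzero values that mostly grow with depth. The diff does not show how `set_sparse_threshold` derives them. One plausible reading, given that the script saves activation histograms before setting thresholds, is that each layer's cutoff is the magnitude below which a target fraction of activations falls. The sketch below illustrates that idea only; `threshold_for_sparsity` and `achieved_sparsity` are hypothetical helpers, not functions from sparsification_sftt.py.

```python
import math
from typing import Sequence


def threshold_for_sparsity(activations: Sequence[float],
                           target_sparsity: float) -> float:
    """Smallest magnitude cutoff t such that at least `target_sparsity`
    of the activations satisfy |a| <= t (assumed selection rule)."""
    if not 0.0 <= target_sparsity <= 1.0:
        raise ValueError("target_sparsity must be in [0, 1]")
    magnitudes = sorted(abs(a) for a in activations)
    if not magnitudes:
        raise ValueError("activations must not be empty")
    # Number of activations that should fall at or below the cutoff.
    k = math.ceil(target_sparsity * len(magnitudes))
    return 0.0 if k == 0 else magnitudes[k - 1]


def achieved_sparsity(activations: Sequence[float], threshold: float) -> float:
    """Fraction of activations whose magnitude is at or below the threshold."""
    zeroed = sum(1 for a in activations if abs(a) <= threshold)
    return zeroed / len(activations)


# Toy activations for a single layer (made-up values for illustration).
acts = [0.05, -0.2, 0.4, -0.01, 0.3, 0.15, -0.6, 0.02, 0.9, -0.1]
t = threshold_for_sparsity(acts, target_sparsity=0.5)
```

Under this reading, a threshold of 0.0 (the old config) zeroes almost nothing, while the new per-layer values would prune a fixed share of activations in each layer.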

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9d02bd152af94684e1f5985811068428480e16588820437e00141f69d460ee7a
 size 4943162336

 version https://git-lfs.github.com/spec/v1
+oid sha256:824e1239d9fae01a0d92076a96352399912e41d2ab3770881a2165c5004dae3e
 size 4943162336

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2b1ee2301c7351b6c17a5ad1bc6e051bee91d07d6c423f11a0b0df809dba126f
 size 4999819336

 version https://git-lfs.github.com/spec/v1
+oid sha256:6a14f497d6e09056c31f8d91a735b9025435eab785f3842a46e2377147881a34
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:15717942cb0df08ca91d09d06dbeb1ad2295bc2bc6f5d9f6da0a6e67714fef5e
 size 4540516344

 version https://git-lfs.github.com/spec/v1
+oid sha256:f7814c4a55aa339eeed03a589db54cc81730b2e3484eebd1155b908483efea73
 size 4540516344

sparsification_sftt.py CHANGED

@@ -585,7 +585,7 @@ class GracefulRegularizationScheduler(TrainerCallback):
             enable_sparse_silu(base_model)
             self.trainer.evaluate()
             save_act_hist(base_model, self.act_hist_path)
-            set_sparse_threshold(base_model, self.targeted_sparsity, True)
             deactivate_stats(base_model)
             self.trainer.use_sparse_regularization = self.keep_regularization_with_kill
             # set_layer_specific_regularization(model.get_base_model())

             enable_sparse_silu(base_model)
             self.trainer.evaluate()
             save_act_hist(base_model, self.act_hist_path)
+            set_sparse_threshold(base_model, self.targeted_sparsity, False)
             deactivate_stats(base_model)
             self.trainer.use_sparse_regularization = self.keep_regularization_with_kill
             # set_layer_specific_regularization(model.get_base_model())
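
Note on this hunk: the only code change flips the third positional argument of `set_sparse_threshold` from True to False. Its meaning is not visible in the diff. For readers unfamiliar with the surrounding calls, the sketch below shows one way a thresholded SiLU could work: compute the usual SiLU and zero any output whose magnitude is at or below the layer's threshold. This is an assumption about what `enable_sparse_silu` switches on, written in plain Python for a single scalar; the function names here are hypothetical.

```python
import math


def silu(x: float) -> float:
    """SiLU (x * sigmoid(x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def sparse_silu(x: float, threshold: float) -> float:
    """Assumed sparse variant: outputs with magnitude <= threshold become 0."""
    y = silu(x)
    return 0.0 if abs(y) <= threshold else y


# First-layer threshold taken from the new config.json in this commit.
first_layer_threshold = 0.0631895586848259
small = sparse_silu(0.1, first_layer_threshold)   # silu(0.1) is about 0.0525, so it is zeroed
large = sparse_silu(2.0, first_layer_threshold)   # silu(2.0) is about 1.76, so it is kept
```

With a threshold of 0.0, as in the previous config, this function only zeroes outputs that are already exactly zero, so it behaves like a dense SiLU.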