Upload folder using huggingface_hub

Files changed (7)

README.md CHANGED

@@ -64,8 +64,8 @@ wandb_watch:
 wandb_name:
 wandb_log_model:
-gradient_accumulation_steps: 1
-micro_batch_size: 32
 num_epochs: 1
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
@@ -74,8 +74,8 @@ learning_rate: 0.00003
 train_on_inputs: false
 group_by_length: false
 bf16: auto
-fp16:
-tf32: false
 gradient_checkpointing: true
 early_stopping_patience:
@@ -107,7 +107,7 @@ fsdp_config:
 This model is a fine-tuned version of [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0087
 ## Model description
@@ -127,9 +127,11 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
-- train_batch_size: 32
-- eval_batch_size: 32
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
@@ -139,10 +141,10 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 3.6419        | 0.0   | 1    | 3.7140          |
-| 0.0111        | 0.25  | 783  | 0.0143          |
-| 0.0121        | 0.5   | 1566 | 0.0102          |
-| 0.0047        | 0.75  | 2349 | 0.0087          |
 ### Framework versions

 wandb_name:
 wandb_log_model:
+gradient_accumulation_steps: 2
+micro_batch_size: 16
 num_epochs: 1
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 train_on_inputs: false
 group_by_length: false
 bf16: auto
+fp16: false
+tf32: true
 gradient_checkpointing: true
 early_stopping_patience:
 This model is a fine-tuned version of [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0082
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
+- train_batch_size: 16
+- eval_batch_size: 16
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 6.0535        | 0.0   | 1    | 6.2368          |
+| 0.0107        | 0.25  | 783  | 0.0137          |
+| 0.013         | 0.5   | 1566 | 0.0098          |
+| 0.0077        | 0.75  | 2349 | 0.0082          |
 ### Framework versions
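
The README hunks above halve `micro_batch_size` (32 to 16) and double `gradient_accumulation_steps` (1 to 2), and the new side reports `total_train_batch_size: 32`. A minimal sketch of that arithmetic follows. The helper name `effective_batch_size` and the `num_devices` parameter are illustrative assumptions rather than part of the commit; a single device is assumed because it is consistent with the reported total.

```python
def effective_batch_size(
    micro_batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int = 1,  # assumption: single device, matching total_train_batch_size: 32
) -> int:
    """Number of samples contributing to one optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_devices


# Values taken from the removed (-) and added (+) lines of the config diff.
old = effective_batch_size(micro_batch_size=32, gradient_accumulation_steps=1)
new = effective_batch_size(micro_batch_size=16, gradient_accumulation_steps=2)

print(old, new)  # 32 32
```

Under that assumption the samples per optimizer step are unchanged, which fits the step counts in the results table (783, 1566, 2349) being identical on both sides of the diff.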

checkpoint-3131/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e4b7406dfe4d7e547b7dea5fa40c09ae187a98fdf7f86614a8c2f9c01a79b005
 size 2690885720

 version https://git-lfs.github.com/spec/v1
+oid sha256:0693212e3dafd097328a6e55b6f6ae142f23fddb0062dd467d19a5646a3822d4
 size 2690885720
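
The binary files in this commit appear only as Git LFS pointer files: three `key value` lines giving the spec version, the content hash, and the byte size. As a rough sketch, the pointers can be parsed and compared as below. The names `LfsPointer` and `parse_lfs_pointer` are hypothetical helpers written for this illustration, not part of `huggingface_hub` or Git LFS tooling.

```python
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


# Old and new pointers for checkpoint-3131/model.safetensors, copied from the diff.
old = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:e4b7406dfe4d7e547b7dea5fa40c09ae187a98fdf7f86614a8c2f9c01a79b005
size 2690885720
""")
new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:0693212e3dafd097328a6e55b6f6ae142f23fddb0062dd467d19a5646a3822d4
size 2690885720
""")

print(old.size == new.size)      # same byte size
print(old.digest != new.digest)  # different content hash
```

An unchanged size with a changed hash is what one would expect when weights of the same shape and dtype are retrained: the tensor layout is identical and only the values differ.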

checkpoint-3131/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:12715410828501a13b8ac8f8d36f47edbff177d2bfe5a17bda91b1d4f252baef
 size 2696922554

 version https://git-lfs.github.com/spec/v1
+oid sha256:71df049bf0ce6ae468d0e75a42d8fde14c939f880b61a1128f9cb67029c5890d
 size 2696922554

checkpoint-3131/trainer_state.json CHANGED

The diff for this file is too large to render. See raw diff

checkpoint-3131/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b1cadfe466ba22ebcf316a7c8e72f82ce88889f38e7fb5732fba8a30f61f984c
 size 5304

 version https://git-lfs.github.com/spec/v1
+oid sha256:cae5e6d12bd5b08803fd30a727fb96fdea6e2ab517e60befce0b8fd59e44851f
 size 5304

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5852bf136312e24e3b0cbe10f82efcac413c4f811507010b7f9a02b231020801
 size 2690932310

 version https://git-lfs.github.com/spec/v1
+oid sha256:2efefa57b3062c6addb36779977040b48097945b364df2ac35417dc46493aa6a
 size 2690932310

runs/Feb20_18-46-40_ruche-gpu18.cluster/events.out.tfevents.1708451203.ruche-gpu18.cluster.17564.0 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:567a9d9cd896cf23230d2821ffdbeadf143c2f6220c23f76e2e3f48bacaa669b
+size 497645