manu committed on
Commit c905d64 · verified · 1 Parent(s): 0a220d5

Upload folder using huggingface_hub

README.md CHANGED
@@ -64,8 +64,8 @@ wandb_watch:
  wandb_name:
  wandb_log_model:
 
- gradient_accumulation_steps: 1
- micro_batch_size: 32
+ gradient_accumulation_steps: 2
+ micro_batch_size: 16
  num_epochs: 1
  optimizer: adamw_bnb_8bit
  lr_scheduler: cosine
@@ -74,8 +74,8 @@ learning_rate: 0.00003
  train_on_inputs: false
  group_by_length: false
  bf16: auto
- fp16:
- tf32: false
+ fp16: false
+ tf32: true
 
  gradient_checkpointing: true
  early_stopping_patience:
@@ -107,7 +107,7 @@ fsdp_config:
 
  This model is a fine-tuned version of [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0087
+ - Loss: 0.0082
 
  ## Model description
 
@@ -127,9 +127,11 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 3e-05
- - train_batch_size: 32
- - eval_batch_size: 32
+ - train_batch_size: 16
+ - eval_batch_size: 16
  - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 32
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 100
@@ -139,10 +141,10 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 3.6419 | 0.0 | 1 | 3.7140 |
- | 0.0111 | 0.25 | 783 | 0.0143 |
- | 0.0121 | 0.5 | 1566 | 0.0102 |
- | 0.0047 | 0.75 | 2349 | 0.0087 |
+ | 6.0535 | 0.0 | 1 | 6.2368 |
+ | 0.0107 | 0.25 | 783 | 0.0137 |
+ | 0.013 | 0.5 | 1566 | 0.0098 |
+ | 0.0077 | 0.75 | 2349 | 0.0082 |
 
 
  ### Framework versions
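Note on the batch-size change in README.md: the config halves `micro_batch_size` (32 to 16) and doubles `gradient_accumulation_steps` (1 to 2), and the regenerated card reports `total_train_batch_size: 32`. The sketch below checks that arithmetic. The helper name `effective_batch_size` and the `num_devices` parameter are hypothetical, and a single device is assumed because 16 × 2 already equals the reported total.

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples that contribute to one optimizer step.

    num_devices=1 is an assumption; the commit does not state the device count.
    """
    if min(micro_batch_size, gradient_accumulation_steps, num_devices) < 1:
        raise ValueError("all arguments must be positive integers")
    return micro_batch_size * gradient_accumulation_steps * num_devices


# Values taken from the old and new sides of the README.md diff.
before = effective_batch_size(micro_batch_size=32, gradient_accumulation_steps=1)
after = effective_batch_size(micro_batch_size=16, gradient_accumulation_steps=2)
print(before, after)  # 32 32
```

Under that assumption the number of samples per optimizer step is unchanged, which is consistent with the step counts in the results table (783, 1566, 2349) being identical before and after.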
checkpoint-3131/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e4b7406dfe4d7e547b7dea5fa40c09ae187a98fdf7f86614a8c2f9c01a79b005
+ oid sha256:0693212e3dafd097328a6e55b6f6ae142f23fddb0062dd467d19a5646a3822d4
  size 2690885720
checkpoint-3131/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:12715410828501a13b8ac8f8d36f47edbff177d2bfe5a17bda91b1d4f252baef
+ oid sha256:71df049bf0ce6ae468d0e75a42d8fde14c939f880b61a1128f9cb67029c5890d
  size 2696922554
checkpoint-3131/trainer_state.json CHANGED
The diff for this file is too large to render.
checkpoint-3131/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b1cadfe466ba22ebcf316a7c8e72f82ce88889f38e7fb5732fba8a30f61f984c
+ oid sha256:cae5e6d12bd5b08803fd30a727fb96fdea6e2ab517e60befce0b8fd59e44851f
  size 5304
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5852bf136312e24e3b0cbe10f82efcac413c4f811507010b7f9a02b231020801
+ oid sha256:2efefa57b3062c6addb36779977040b48097945b364df2ac35417dc46493aa6a
  size 2690932310
runs/Feb20_18-46-40_ruche-gpu18.cluster/events.out.tfevents.1708451203.ruche-gpu18.cluster.17564.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:567a9d9cd896cf23230d2821ffdbeadf143c2f6220c23f76e2e3f48bacaa669b
+ size 497645
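
Note on the binary files above: each diff shows a Git LFS pointer (three `key value` lines: `version`, `oid`, `size`), not the file contents. In this commit only the `oid` lines differ for the changed files while `size` stays the same. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` and the returned dictionary layout are illustrative assumptions, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer text copied from the added tfevents file in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:567a9d9cd896cf23230d2821ffdbeadf143c2f6220c23f76e2e3f48bacaa669b
size 497645
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])  # sha256 497645
```

Comparing the parsed `digest` values of two pointers is enough to tell whether the underlying file changed, even when the `size` fields match.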