manu committed on
Commit 7dfb1da
1 Parent(s): 035a796

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +23 -56
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -4,53 +4,10 @@ base_model: croissantllm/CroissantLLMBase
  tags:
  - generated_from_trainer
  model-index:
- - name: out_translation
+ - name: gpfs/workdir/fayssema/models/out_translation
    results: []
  ---
 
-
- ### Usage
-
- ```python
- >>> chat_input = "<|im_start|> system\nYou are a helpful assistant.<|im_end|> \n<|im_start|> user\nTraduit ce texte en anglais : \nEn 1975, la localité comptait 90 habitants, des Guiziga et lors du recensement de 2005, on y a dénombré x habitants.<|im_end|> \n<|im_start|> assistant\n"
-
- >>> inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)
-
- >>> tokens = model.generate(**inputs, **generation_args)
- Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
-
- >>> print(tokenizer.decode(tokens[0]))
-
- <s><|im_start|> system
- You are a helpful assistant.<|im_end|>
- <|im_start|> user
- Traduit ce texte en anglais :
- En 1975, la localité comptait 90 habitants, des Guiziga et lors du recensement de 2005, on y a dénombré x habitants.<|im_end|>
- <|im_start|> assistant
- When the town had 90 inhabitants in 1975, it was called Guizaga and during the census of 2005, there were x inhabitants.<|im_end|>
- </s>
-
-
- >>> chat_input = "<|im_start|> system\nYou are a helpful assistant.<|im_end|> \n<|im_start|> user\nCorrige les fautes dans ce texte : \nEn 1975, la localité comptait 90 habitant, des Guiziga et lors du recensement de 2005, on y a dénombrer 56 habitants.<|im_end|> \n<|im_start|> assistant\n"
-
- >>> inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)
-
- >>> tokens = model.generate(**inputs, **generation_args)
- Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
-
- >>> print(tokenizer.decode(tokens[0]))
- <s><|im_start|> system
- You are a helpful assistant.<|im_end|>
- <|im_start|> user
- Corrige les fautes dans ce texte :
- En 1975, la localité comptait 90 habitant, des Guiziga et lors du recensement de 2005, on y a dénombrer 56 habitants.<|im_end|>
-
- <|im_start|> assistant
- En 1975, la commune comptait 90 habitants dont des Guizigas et au recensement de 2005, elle en compte 56.<|im_end|>
- </s>
- >>>
- ```
-
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
@@ -83,11 +40,11 @@ datasets:
    type: sharegpt
 
  chat_template: "chatml"
- # default_system_message: "Rewrite the sentence to remove the PII."
+ default_system_message: ""
 
- dataset_prepared_path: last_pii
+ dataset_prepared_path: new_pii
  val_set_size: 0.05
- output_dir: ./out_translation
+ output_dir: /gpfs/workdir/fayssema/models/out_translation
 
  sequence_len: 2048
  sample_packing: false
@@ -107,9 +64,9 @@ wandb_watch:
  wandb_name:
  wandb_log_model:
 
- gradient_accumulation_steps: 1
+ gradient_accumulation_steps: 2
  micro_batch_size: 16
- num_epochs: 1
+ num_epochs: 3
  optimizer: adamw_bnb_8bit
  lr_scheduler: cosine
  learning_rate: 0.00003
@@ -146,11 +103,11 @@ fsdp_config:
 
  </details><br>
 
- # out_translation
+ # gpfs/workdir/fayssema/models/out_translation
 
  This model is a fine-tuned version of [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0108
+ - Loss: 0.0098
 
  ## Model description
 
@@ -173,19 +130,29 @@ The following hyperparameters were used during training:
  - train_batch_size: 16
  - eval_batch_size: 16
  - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 32
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 100
- - num_epochs: 1
+ - num_epochs: 3
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 1.2927 | 0.0 | 1 | 0.3293 |
- | 0.2151 | 0.25 | 145 | 0.0175 |
- | 0.3389 | 0.5 | 290 | 0.0128 |
- | 0.0917 | 0.75 | 435 | 0.0108 |
+ | 2.6652 | 0.0 | 1 | 2.0261 |
+ | 0.2986 | 0.25 | 73 | 0.0199 |
+ | 0.19 | 0.5 | 146 | 0.0136 |
+ | 0.3032 | 0.76 | 219 | 0.0158 |
+ | 0.1343 | 1.01 | 292 | 0.0125 |
+ | 0.12 | 1.26 | 365 | 0.0117 |
+ | 0.2266 | 1.51 | 438 | 0.0113 |
+ | 0.1924 | 1.77 | 511 | 0.0097 |
+ | 0.1448 | 2.02 | 584 | 0.0095 |
+ | 0.0718 | 2.27 | 657 | 0.0098 |
+ | 0.1184 | 2.52 | 730 | 0.0097 |
+ | 0.1124 | 2.77 | 803 | 0.0098 |
 
 
  ### Framework versions
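
Note: the `Usage` snippet removed above calls `tokenizer`, `model`, and `generation_args` without defining them. Below is a minimal sketch of the setup it appears to assume; the checkpoint path (shown here as the base model), dtype/device settings, and the `generation_args` values are assumptions, not part of this commit.

```python
# Minimal sketch (assumed, not from this commit) of the setup behind the
# removed Usage snippet: load model and tokenizer, define generation_args,
# and run one ChatML-formatted prompt from the snippet.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint path; substitute the actual fine-tuned repo.
MODEL_ID = "croissantllm/CroissantLLMBase"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed generation settings; the commit never shows the real values.
generation_args = {"max_new_tokens": 256, "do_sample": False}

# ChatML-formatted prompt, copied from the removed snippet.
chat_input = (
    "<|im_start|> system\nYou are a helpful assistant.<|im_end|> \n"
    "<|im_start|> user\nTraduit ce texte en anglais : \n"
    "En 1975, la localité comptait 90 habitants, des Guiziga et lors du "
    "recensement de 2005, on y a dénombré x habitants.<|im_end|> \n"
    "<|im_start|> assistant\n"
)
inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)
tokens = model.generate(**inputs, **generation_args)
print(tokenizer.decode(tokens[0]))
```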
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a5c343f5881570187a18199c7bac4fbdb952a13cec855e385618ad21deaeca3e
+ oid sha256:e89e76e2740576ca3dab415a5c722d2fdf0d12f0e2b71f451413f4786723afd7
  size 2690937142
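
The `oid` in a Git LFS pointer is the SHA-256 of the binary payload, so a downloaded `pytorch_model.bin` can be checked against the new value this commit records. A small sketch:

```python
# Verify a downloaded pytorch_model.bin against the sha256 oid from its
# Git LFS pointer (the value updated by this commit).
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    # Hash the file in 1 MiB chunks to avoid loading 2.7 GB into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

expected = "e89e76e2740576ca3dab415a5c722d2fdf0d12f0e2b71f451413f4786723afd7"
assert sha256_of("pytorch_model.bin") == expected, "checksum mismatch"
```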