manu/dataset_1_model

@@ -4,53 +4,10 @@ base_model: croissantllm/CroissantLLMBase
 tags:
 - generated_from_trainer
 model-index:
-- name: out_translation
   results: []
 ---
-### Usage
-```python
->>> chat_input = "<|im_start|> system\nYou are a helpful assistant.<|im_end|> \n<|im_start|> user\nTraduit ce texte en anglais : \nEn 1975, la localité comptait 90 habitants, des Guiziga et lors du recensement de 2005, on y a dénombré x habitants.<|im_end|> \n<|im_start|> assistant\n"
->>> inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)
->>> tokens = model.generate(**inputs, **generation_args)
-Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
->>> print(tokenizer.decode(tokens[0]))
-<s><|im_start|> system
-You are a helpful assistant.<|im_end|>
-<|im_start|> user
-Traduit ce texte en anglais :
-En 1975, la localité comptait 90 habitants, des Guiziga et lors du recensement de 2005, on y a dénombré x habitants.<|im_end|>
-<|im_start|> assistant
-When the town had 90 inhabitants in 1975, it was called Guizaga and during the census of 2005, there were x inhabitants.<|im_end|>
-</s>
->>> chat_input = "<|im_start|> system\nYou are a helpful assistant.<|im_end|> \n<|im_start|> user\nCorrige les fautes dans ce texte : \nEn 1975, la localité comptait 90 habitant, des Guiziga et lors du recensement de 2005, on y a dénombrer 56 habitants.<|im_end|> \n<|im_start|> assistant\n"
->>> inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)
->>> tokens = model.generate(**inputs, **generation_args)
-Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
->>> print(tokenizer.decode(tokens[0]))
-<s><|im_start|> system
-You are a helpful assistant.<|im_end|>
-<|im_start|> user
-Corrige les fautes dans ce texte :
-En 1975, la localité comptait 90 habitant, des Guiziga et lors du recensement de 2005, on y a dénombrer 56 habitants.<|im_end|>
-<|im_start|> assistant
- En 1975, la commune comptait 90 habitants dont des Guizigas et au recensement de 2005, elle en compte 56.<|im_end|>
-</s>
->>>
-```
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -83,11 +40,11 @@ datasets:
     type: sharegpt
 chat_template: "chatml"
-# default_system_message: "Rewrite the sentence to remove the PII."
-dataset_prepared_path: last_pii
 val_set_size: 0.05
-output_dir: ./out_translation
 sequence_len: 2048
 sample_packing: false
@@ -107,9 +64,9 @@ wandb_watch:
 wandb_name:
 wandb_log_model:
-gradient_accumulation_steps: 1
 micro_batch_size: 16
-num_epochs: 1
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 learning_rate: 0.00003
@@ -146,11 +103,11 @@ fsdp_config:
 </details><br>
-# out_translation
 This model is a fine-tuned version of [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0108
 ## Model description
@@ -173,19 +130,29 @@ The following hyperparameters were used during training:
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
-- num_epochs: 1
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.2927        | 0.0   | 1    | 0.3293          |
-| 0.2151        | 0.25  | 145  | 0.0175          |
-| 0.3389        | 0.5   | 290  | 0.0128          |
-| 0.0917        | 0.75  | 435  | 0.0108          |
 ### Framework versions

 tags:
 - generated_from_trainer
 model-index:
+- name: gpfs/workdir/fayssema/models/out_translation
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
     type: sharegpt
 chat_template: "chatml"
+default_system_message: ""
+dataset_prepared_path: new_pii
 val_set_size: 0.05
+output_dir: /gpfs/workdir/fayssema/models/out_translation
 sequence_len: 2048
 sample_packing: false
 wandb_name:
 wandb_log_model:
+gradient_accumulation_steps: 2
 micro_batch_size: 16
+num_epochs: 3
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 learning_rate: 0.00003
 </details><br>
+# gpfs/workdir/fayssema/models/out_translation
 This model is a fine-tuned version of [croissantllm/CroissantLLMBase](https://huggingface.co/croissantllm/CroissantLLMBase) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0098
 ## Model description
 - train_batch_size: 16
 - eval_batch_size: 16
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 100
+- num_epochs: 3
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 2.6652        | 0.0   | 1    | 2.0261          |
+| 0.2986        | 0.25  | 73   | 0.0199          |
+| 0.19          | 0.5   | 146  | 0.0136          |
+| 0.3032        | 0.76  | 219  | 0.0158          |
+| 0.1343        | 1.01  | 292  | 0.0125          |
+| 0.12          | 1.26  | 365  | 0.0117          |
+| 0.2266        | 1.51  | 438  | 0.0113          |
+| 0.1924        | 1.77  | 511  | 0.0097          |
+| 0.1448        | 2.02  | 584  | 0.0095          |
+| 0.0718        | 2.27  | 657  | 0.0098          |
+| 0.1184        | 2.52  | 730  | 0.0097          |
+| 0.1124        | 2.77  | 803  | 0.0098          |
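
The relationship between the batch-size settings and the step counts logged above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, `num_devices=1` is an assumption inferred from `total_train_batch_size: 32`, and the steps-per-epoch figure is approximate because the Epoch column is rounded to two decimals.

```python
def effective_batch_size(micro_batch_size, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return micro_batch_size * grad_accum_steps * num_devices


def steps_per_epoch_estimate(step, epoch):
    """Rough optimizer steps per epoch from one logged (step, epoch) row."""
    return step / epoch


# New run: micro_batch_size 16, gradient_accumulation_steps 2, last logged row.
new_batch = effective_batch_size(16, 2)
new_steps = steps_per_epoch_estimate(803, 2.77)

# Previous run: micro_batch_size 16, gradient_accumulation_steps 1, last logged row.
old_batch = effective_batch_size(16, 1)
old_steps = steps_per_epoch_estimate(435, 0.75)

print("new run:", new_batch, "examples/step,", round(new_steps), "steps/epoch (approx.)")
print("old run:", old_batch, "examples/step,", round(old_steps), "steps/epoch (approx.)")

# Examples per epoch implied by each run; the two estimates should roughly agree
# if both runs used the same training split.
print("implied examples per epoch:", round(new_batch * new_steps), round(old_batch * old_steps))
```

Doubling `gradient_accumulation_steps` halves the number of optimizer steps per epoch, which is why the new log reaches epoch 0.25 at step 73 where the previous one reached it at step 145.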
 ### Framework versions

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a5c343f5881570187a18199c7bac4fbdb952a13cec855e385618ad21deaeca3e
 size 2690937142

 version https://git-lfs.github.com/spec/v1
+oid sha256:e89e76e2740576ca3dab415a5c722d2fdf0d12f0e2b71f451413f4786723afd7
 size 2690937142