wissamantoun committed on
Commit
e43f405
·
verified ·
1 Parent(s): 3a2c13d

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,277 @@
+ ---
+ language: fr
+ license: mit
+ tags:
+ - roberta
+ - token-classification
+ base_model: almanach/camembertv2-base
+ datasets:
+ - FTB-NER
+ metrics:
+ - f1
+ pipeline_tag: token-classification
+ library_name: transformers
+ model-index:
+ - name: almanach/camembertv2-base-ftb-ner
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition (NER)
+     dataset:
+       type: ftb-ner
+       name: French Treebank Named Entity Recognition
+     metrics:
+     - name: f1
+       type: f1
+       value: 0.93548
+       verified: false
+ ---
+
+ # Model Card for almanach/camembertv2-base-ftb-ner
+
+ almanach/camembertv2-base-ftb-ner is a RoBERTa-architecture model for token classification, fine-tuned from almanach/camembertv2-base on the FTB-NER dataset for Named Entity Recognition (NER). It reaches an F1 score of 0.93548 on the FTB-NER evaluation set.
+
+ The model is one of several fine-tuned variants of almanach/camembertv2-base.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Wissam Antoun (PhD student at Almanach, Inria-Paris)
+ - **Model type:** roberta
+ - **Language(s) (NLP):** French
+ - **License:** MIT
+ - **Finetuned from model:** almanach/camembertv2-base
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/WissamAntoun/camemberta
+ - **Paper:** https://arxiv.org/abs/2411.08868
+
+ ## Uses
+
+ The model performs Named Entity Recognition (NER) on French text as a token classification task. It tags tokens with BIO labels for seven entity types: Company, FictionCharacter, Location, Organization, Person, POI, and Product.
+
+ ## Bias, Risks, and Limitations
+
+ The model may reflect biases present in its training data. It was fine-tuned only on FTB-NER, so it may not generalize well to other datasets, domains, or tasks.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+
+ # Load the fine-tuned NER model and its tokenizer from the Hugging Face Hub
+ model = AutoModelForTokenClassification.from_pretrained("almanach/camembertv2-base-ftb-ner")
+ tokenizer = AutoTokenizer.from_pretrained("almanach/camembertv2-base-ftb-ner")
+
+ # Wrap both in a token-classification pipeline
+ classifier = pipeline("token-classification", model=model, tokenizer=tokenizer)
+
+ # Run NER on a French sentence ("Votre texte ici" = "Your text here")
+ classifier("Votre texte ici")
+ ```
+
+ ## Training Details
+
+ ### Training Data
+
+ The model is trained on the FTB-NER dataset.
+
+ - Dataset Name: FTB-NER
+ - Dataset Size:
+   - Train: 9881
+   - Dev: 1235
+   - Test: 1235
+
+ ### Training Procedure
+
+ The model was trained with the `run_ner.py` script from the Hugging Face Transformers repository.
+
+ #### Training Hyperparameters
+
+ ```yml
+ accelerator_config: '{''split_batches'': False, ''dispatch_batches'': None, ''even_batches'':
+   True, ''use_seedable_sampler'': True, ''non_blocking'': False, ''gradient_accumulation_kwargs'':
+   None}'
+ adafactor: false
+ adam_beta1: 0.9
+ adam_beta2: 0.999
+ adam_epsilon: 1.0e-08
+ auto_find_batch_size: false
+ base_model: camembertv2
+ base_model_name: camembertv2-base-bf16-p2-17000
+ batch_eval_metrics: false
+ bf16: false
+ bf16_full_eval: false
+ data_seed: 1337.0
+ dataloader_drop_last: false
+ dataloader_num_workers: 0
+ dataloader_persistent_workers: false
+ dataloader_pin_memory: true
+ dataloader_prefetch_factor: .nan
+ ddp_backend: .nan
+ ddp_broadcast_buffers: .nan
+ ddp_bucket_cap_mb: .nan
+ ddp_find_unused_parameters: .nan
+ ddp_timeout: 1800
+ debug: '[]'
+ deepspeed: .nan
+ disable_tqdm: false
+ dispatch_batches: .nan
+ do_eval: true
+ do_predict: false
+ do_train: true
+ epoch: 8.0
+ eval_accumulation_steps: 4
+ eval_accuracy: 0.9937000109565028
+ eval_delay: 0
+ eval_do_concat_batches: true
+ eval_f1: 0.935483870967742
+ eval_loss: 0.0347304567694664
+ eval_on_start: false
+ eval_precision: 0.9362204724409448
+ eval_recall: 0.934748427672956
+ eval_runtime: 2.7702
+ eval_samples: 1235.0
+ eval_samples_per_second: 445.821
+ eval_steps: .nan
+ eval_steps_per_second: 55.953
+ eval_strategy: epoch
+ eval_use_gather_object: false
+ evaluation_strategy: epoch
+ fp16: false
+ fp16_backend: auto
+ fp16_full_eval: false
+ fp16_opt_level: O1
+ fsdp: '[]'
+ fsdp_config: '{''min_num_params'': 0, ''xla'': False, ''xla_fsdp_v2'': False, ''xla_fsdp_grad_ckpt'':
+   False}'
+ fsdp_min_num_params: 0
+ fsdp_transformer_layer_cls_to_wrap: .nan
+ full_determinism: false
+ gradient_accumulation_steps: 2
+ gradient_checkpointing: false
+ gradient_checkpointing_kwargs: .nan
+ greater_is_better: true
+ group_by_length: false
+ half_precision_backend: auto
+ hub_always_push: false
+ hub_model_id: .nan
+ hub_private_repo: false
+ hub_strategy: every_save
+ hub_token: <HUB_TOKEN>
+ ignore_data_skip: false
+ include_inputs_for_metrics: false
+ include_num_input_tokens_seen: false
+ include_tokens_per_second: false
+ jit_mode_eval: false
+ label_names: .nan
+ label_smoothing_factor: 0.0
+ learning_rate: 5.000000000000001e-05
+ length_column_name: length
+ load_best_model_at_end: true
+ local_rank: 0
+ log_level: debug
+ log_level_replica: warning
+ log_on_each_node: true
+ logging_dir: /scratch/camembertv2/runs/results/ftb_ner/camembertv2-base-bf16-p2-17000/max_seq_length-192-gradient_accumulation_steps-2-precision-fp32-learning_rate-5.000000000000001e-05-epochs-8-lr_scheduler-linear-warmup_steps-0.1/SEED-1337/logs
+ logging_first_step: false
+ logging_nan_inf_filter: true
+ logging_steps: 100
+ logging_strategy: steps
+ lr_scheduler_kwargs: '{}'
+ lr_scheduler_type: linear
+ max_grad_norm: 1.0
+ max_steps: -1
+ metric_for_best_model: f1
+ mp_parameters: .nan
+ name: camembertv2/runs/results/ftb_ner/camembertv2-base-bf16-p2-17000/max_seq_length-192-gradient_accumulation_steps-2-precision-fp32-learning_rate-5.000000000000001e-05-epochs-8-lr_scheduler-linear-warmup_steps-0.1
+ neftune_noise_alpha: .nan
+ no_cuda: false
+ num_train_epochs: 8.0
+ optim: adamw_torch
+ optim_args: .nan
+ optim_target_modules: .nan
+ output_dir: /scratch/camembertv2/runs/results/ftb_ner/camembertv2-base-bf16-p2-17000/max_seq_length-192-gradient_accumulation_steps-2-precision-fp32-learning_rate-5.000000000000001e-05-epochs-8-lr_scheduler-linear-warmup_steps-0.1/SEED-1337
+ overwrite_output_dir: false
+ past_index: -1
+ per_device_eval_batch_size: 8
+ per_device_train_batch_size: 8
+ per_gpu_eval_batch_size: .nan
+ per_gpu_train_batch_size: .nan
+ prediction_loss_only: false
+ push_to_hub: false
+ push_to_hub_model_id: .nan
+ push_to_hub_organization: .nan
+ push_to_hub_token: <PUSH_TO_HUB_TOKEN>
+ ray_scope: last
+ remove_unused_columns: true
+ report_to: '[''tensorboard'']'
+ restore_callback_states_from_checkpoint: false
+ resume_from_checkpoint: .nan
+ run_name: /scratch/camembertv2/runs/results/ftb_ner/camembertv2-base-bf16-p2-17000/max_seq_length-192-gradient_accumulation_steps-2-precision-fp32-learning_rate-5.000000000000001e-05-epochs-8-lr_scheduler-linear-warmup_steps-0.1/SEED-1337
+ save_on_each_node: false
+ save_only_model: false
+ save_safetensors: true
+ save_steps: 500
+ save_strategy: epoch
+ save_total_limit: .nan
+ seed: 1337
+ skip_memory_metrics: true
+ split_batches: .nan
+ tf32: .nan
+ torch_compile: true
+ torch_compile_backend: inductor
+ torch_compile_mode: .nan
+ torch_empty_cache_steps: .nan
+ torchdynamo: .nan
+ total_flos: 2833132740217920.0
+ tpu_metrics_debug: false
+ tpu_num_cores: .nan
+ train_loss: 0.0880794880495777
+ train_runtime: 679.3683
+ train_samples: 9881
+ train_samples_per_second: 116.355
+ train_steps_per_second: 7.277
+ use_cpu: false
+ use_ipex: false
+ use_legacy_prediction_loop: false
+ use_mps_device: false
+ warmup_ratio: 0.1
+ warmup_steps: 0
+ weight_decay: 0.0
+
+ ```
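
The `lr_scheduler_type: linear` and `warmup_ratio: 0.1` settings describe a linear warmup followed by a linear decay. The sketch below reproduces that shape in plain Python. It assumes 4944 total optimizer steps (the `max_steps` value recorded in `trainer_state.json`) and a warmup length of the ratio times the total, rounded up; the function name `linear_schedule_lr` is illustrative and is not part of the training script.

```python
import math


def linear_schedule_lr(step, base_lr=5e-5, total_steps=4944, warmup_ratio=0.1):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    # Assumption: warmup length is warmup_ratio * total_steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (100, 500, 4900):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the values at steps 100, 500, and 4900 come out near 1.01e-05, 4.99e-05, and 4.94e-07, which agrees with the `learning_rate` entries logged in `trainer_state.json`.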
+
+ #### Results
+
+ **F1-Score:** 0.93548
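
The F1 score is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the `eval_precision` and `eval_recall` values listed in the hyperparameter dump; the helper name `f1_from_pr` is illustrative.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.9362204724409448  # eval_precision from the training log
recall = 0.934748427672956      # eval_recall from the training log
print(round(f1_from_pr(precision, recall), 5))
```

The result rounds to 0.93548, matching the reported score.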
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ RoBERTa architecture with a token classification head (`RobertaForTokenClassification`).
+
+ ## Citation
+
+ **BibTeX:**
+
+ ```bibtex
+ @misc{antoun2024camembert20smarterfrench,
+       title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
+       author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
+       year={2024},
+       eprint={2411.08868},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2411.08868},
+ }
+ ```
all_results.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "epoch": 8.0,
+   "eval_accuracy": 0.9937000109565027,
+   "eval_f1": 0.935483870967742,
+   "eval_loss": 0.0347304567694664,
+   "eval_precision": 0.9362204724409449,
+   "eval_recall": 0.934748427672956,
+   "eval_runtime": 2.7702,
+   "eval_samples": 1235,
+   "eval_samples_per_second": 445.821,
+   "eval_steps_per_second": 55.953,
+   "total_flos": 2833132740217920.0,
+   "train_loss": 0.08807948804957774,
+   "train_runtime": 679.3683,
+   "train_samples": 9881,
+   "train_samples_per_second": 116.355,
+   "train_steps_per_second": 7.277
+ }
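
The throughput fields in `all_results.json` appear to be derived from the sample counts and runtimes rather than measured independently. The sketch below recomputes them under that assumption (samples processed divided by wall-clock seconds, with the training set seen once per epoch); the helper name `derived_throughput` is illustrative.

```python
def derived_throughput(results):
    """Recompute samples-per-second figures from counts and runtimes."""
    eval_sps = results["eval_samples"] / results["eval_runtime"]
    # Assumption: the training set is traversed once per epoch.
    train_sps = results["train_samples"] * results["epoch"] / results["train_runtime"]
    return eval_sps, train_sps


results = {
    "epoch": 8.0,
    "eval_runtime": 2.7702,
    "eval_samples": 1235,
    "train_runtime": 679.3683,
    "train_samples": 9881,
}
eval_sps, train_sps = derived_throughput(results)
print(f"eval: {eval_sps:.2f} samples/s, train: {train_sps:.2f} samples/s")
```

The recomputed values land close to the reported 445.821 and 116.355 samples per second; small differences are expected because the runtimes in the file are rounded.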
config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "_name_or_path": "/scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/",
+   "architectures": [
+     "RobertaForTokenClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 1,
+   "classifier_dropout": null,
+   "embedding_size": 768,
+   "eos_token_id": 2,
+   "finetuning_task": "ner",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "B-Company",
+     "1": "B-FictionCharacter",
+     "2": "B-Location",
+     "3": "B-Organization",
+     "4": "B-Person",
+     "5": "B-POI",
+     "6": "B-Product",
+     "7": "I-Company",
+     "8": "I-FictionCharacter",
+     "9": "I-Location",
+     "10": "I-Organization",
+     "11": "I-Person",
+     "12": "I-POI",
+     "13": "I-Product",
+     "14": "O"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "B-Company": 0,
+     "B-FictionCharacter": 1,
+     "B-Location": 2,
+     "B-Organization": 3,
+     "B-POI": 5,
+     "B-Person": 4,
+     "B-Product": 6,
+     "I-Company": 7,
+     "I-FictionCharacter": 8,
+     "I-Location": 9,
+     "I-Organization": 10,
+     "I-POI": 12,
+     "I-Person": 11,
+     "I-Product": 13,
+     "O": 14
+   },
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 1025,
+   "model_name": "camembertv2-base-bf16",
+   "model_type": "roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_biased_input": true,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 32768
+ }
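
The `id2label` map in `config.json` follows a BIO scheme: `B-X` opens an entity of type `X`, `I-X` continues it, and `O` marks tokens outside any entity. The sketch below shows one way to merge a sequence of such labels into entity spans. The helper `bio_to_spans` is hypothetical, and treating a stray `I-X` (one with no matching open entity) as the start of a new span is a lenient choice made here, not something the model card specifies.

```python
def bio_to_spans(labels):
    """Group BIO labels into (entity_type, start, end) spans, end exclusive."""
    spans = []
    current = None  # (entity_type, start_index) of the open span, if any
    for i, label in enumerate(labels):
        if label.startswith("I-") and current is not None and current[0] == label[2:]:
            continue  # same entity keeps going
        # Any other label closes the span that was open.
        if current is not None:
            spans.append((current[0], current[1], i))
            current = None
        if label.startswith("B-") or label.startswith("I-"):
            # B-X opens a span; a stray I-X is leniently treated the same way.
            current = (label[2:], i)
    if current is not None:
        spans.append((current[0], current[1], len(labels)))
    return spans


labels = ["B-Person", "I-Person", "O", "B-Location"]
print(bio_to_spans(labels))
```

For the sample sequence this yields one `Person` span covering the first two tokens and one `Location` span covering the last token.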
eval_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "epoch": 8.0,
+   "eval_accuracy": 0.9937000109565027,
+   "eval_f1": 0.935483870967742,
+   "eval_loss": 0.0347304567694664,
+   "eval_precision": 0.9362204724409449,
+   "eval_recall": 0.934748427672956,
+   "eval_runtime": 2.7702,
+   "eval_samples": 1235,
+   "eval_samples_per_second": 445.821,
+   "eval_steps_per_second": 55.953
+ }
logs/events.out.tfevents.1724620471.nefgpu51.144270.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05e0dfd1d280d26e46351f866ba2d2f08b3454e69b9cb8753372402a245240e7
+ size 20871
logs/events.out.tfevents.1724621154.nefgpu51.144270.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:13c7b317b552a6e5428a9115dc9a4dcacaee232a5f8b1063e30f1f833ecc63c9
+ size 512
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed6da0d29a943fc0383213e4349a636a6241359582e6f0d94e9a3f500ab615b8
+ size 444109236
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "errors": "replace",
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "unk_token": "[UNK]"
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 8.0,
+   "total_flos": 2833132740217920.0,
+   "train_loss": 0.08807948804957774,
+   "train_runtime": 679.3683,
+   "train_samples": 9881,
+   "train_samples_per_second": 116.355,
+   "train_steps_per_second": 7.277
+ }
trainer_state.json ADDED
@@ -0,0 +1,481 @@
+ {
+   "best_metric": 0.935483870967742,
+   "best_model_checkpoint": "/scratch/camembertv2/runs/results/ftb_ner/camembertv2-base-bf16-p2-17000/max_seq_length-192-gradient_accumulation_steps-2-precision-fp32-learning_rate-5.000000000000001e-05-epochs-8-lr_scheduler-linear-warmup_steps-0.1/SEED-1337/checkpoint-4326",
+   "epoch": 8.0,
+   "eval_steps": 500,
+   "global_step": 4944,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.16181229773462782,
+       "grad_norm": 9.89955997467041,
+       "learning_rate": 1.0101010101010103e-05,
+       "loss": 1.8738,
+       "step": 100
+     },
+     {
+       "epoch": 0.32362459546925565,
+       "grad_norm": 2.3764805793762207,
+       "learning_rate": 2.0202020202020206e-05,
+       "loss": 0.6979,
+       "step": 200
+     },
+     {
+       "epoch": 0.4854368932038835,
+       "grad_norm": 1.3664543628692627,
+       "learning_rate": 3.030303030303031e-05,
+       "loss": 0.5111,
+       "step": 300
+     },
+     {
+       "epoch": 0.6472491909385113,
+       "grad_norm": 0.6372264623641968,
+       "learning_rate": 4.040404040404041e-05,
+       "loss": 0.2666,
+       "step": 400
+     },
+     {
+       "epoch": 0.8090614886731392,
+       "grad_norm": 0.5098221302032471,
+       "learning_rate": 4.9943807597212865e-05,
+       "loss": 0.1199,
+       "step": 500
+     },
+     {
+       "epoch": 0.970873786407767,
+       "grad_norm": 0.5974541902542114,
+       "learning_rate": 4.8819959541470004e-05,
+       "loss": 0.0775,
+       "step": 600
+     },
+     {
+       "epoch": 1.0,
+       "eval_accuracy": 0.9852635038895584,
+       "eval_f1": 0.7820512820512822,
+       "eval_loss": 0.0750068947672844,
+       "eval_precision": 0.7514492753623189,
+       "eval_recall": 0.815251572327044,
+       "eval_runtime": 3.2799,
+       "eval_samples_per_second": 376.537,
+       "eval_steps_per_second": 47.258,
+       "step": 618
+     },
+     {
+       "epoch": 1.132686084142395,
+       "grad_norm": 0.15989889204502106,
+       "learning_rate": 4.7696111485727136e-05,
+       "loss": 0.0648,
+       "step": 700
+     },
+     {
+       "epoch": 1.2944983818770226,
+       "grad_norm": 0.28292131423950195,
+       "learning_rate": 4.6572263429984275e-05,
+       "loss": 0.0555,
+       "step": 800
+     },
+     {
+       "epoch": 1.4563106796116505,
+       "grad_norm": 0.09367953985929489,
+       "learning_rate": 4.544841537424141e-05,
+       "loss": 0.0485,
+       "step": 900
+     },
+     {
+       "epoch": 1.6181229773462782,
+       "grad_norm": 0.3826428949832916,
+       "learning_rate": 4.4324567318498546e-05,
+       "loss": 0.0401,
+       "step": 1000
+     },
+     {
+       "epoch": 1.779935275080906,
+       "grad_norm": 0.18068315088748932,
+       "learning_rate": 4.3200719262755685e-05,
+       "loss": 0.0369,
+       "step": 1100
+     },
+     {
+       "epoch": 1.941747572815534,
+       "grad_norm": 0.23946309089660645,
+       "learning_rate": 4.207687120701282e-05,
+       "loss": 0.0387,
+       "step": 1200
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.9903582776377781,
+       "eval_f1": 0.8799067236688691,
+       "eval_loss": 0.04682581126689911,
+       "eval_precision": 0.8700999231360492,
+       "eval_recall": 0.889937106918239,
+       "eval_runtime": 2.8072,
+       "eval_samples_per_second": 439.943,
+       "eval_steps_per_second": 55.215,
+       "step": 1236
+     },
+     {
+       "epoch": 2.103559870550162,
+       "grad_norm": 0.8596442937850952,
+       "learning_rate": 4.0953023151269956e-05,
+       "loss": 0.0285,
+       "step": 1300
+     },
+     {
+       "epoch": 2.26537216828479,
+       "grad_norm": 0.03754520043730736,
+       "learning_rate": 3.9829175095527095e-05,
+       "loss": 0.0322,
+       "step": 1400
+     },
+     {
+       "epoch": 2.4271844660194173,
+       "grad_norm": 0.6684575080871582,
+       "learning_rate": 3.870532703978423e-05,
+       "loss": 0.023,
+       "step": 1500
+     },
+     {
+       "epoch": 2.588996763754045,
+       "grad_norm": 0.03833441436290741,
+       "learning_rate": 3.758147898404136e-05,
+       "loss": 0.0268,
+       "step": 1600
+     },
+     {
+       "epoch": 2.750809061488673,
+       "grad_norm": 0.3890291452407837,
+       "learning_rate": 3.6457630928298505e-05,
+       "loss": 0.0217,
+       "step": 1700
+     },
+     {
+       "epoch": 2.912621359223301,
+       "grad_norm": 0.4564450681209564,
+       "learning_rate": 3.533378287255564e-05,
+       "loss": 0.0295,
+       "step": 1800
+     },
+     {
+       "epoch": 3.0,
+       "eval_accuracy": 0.9906869727183083,
+       "eval_f1": 0.8855799373040752,
+       "eval_loss": 0.039505813270807266,
+       "eval_precision": 0.8828125,
+       "eval_recall": 0.8883647798742138,
+       "eval_runtime": 2.8133,
+       "eval_samples_per_second": 438.979,
+       "eval_steps_per_second": 55.095,
+       "step": 1854
+     },
+     {
+       "epoch": 3.074433656957929,
+       "grad_norm": 0.027059998363256454,
+       "learning_rate": 3.420993481681277e-05,
+       "loss": 0.0166,
+       "step": 1900
+     },
+     {
+       "epoch": 3.236245954692557,
+       "grad_norm": 0.030333412811160088,
+       "learning_rate": 3.308608676106991e-05,
+       "loss": 0.0174,
+       "step": 2000
+     },
+     {
+       "epoch": 3.3980582524271843,
+       "grad_norm": 0.13804250955581665,
+       "learning_rate": 3.196223870532705e-05,
+       "loss": 0.0153,
+       "step": 2100
+     },
+     {
+       "epoch": 3.559870550161812,
+       "grad_norm": 0.2849176824092865,
+       "learning_rate": 3.083839064958418e-05,
+       "loss": 0.0152,
+       "step": 2200
+     },
+     {
+       "epoch": 3.72168284789644,
+       "grad_norm": 0.14825651049613953,
+       "learning_rate": 2.971454259384132e-05,
+       "loss": 0.0171,
+       "step": 2300
+     },
+     {
+       "epoch": 3.883495145631068,
+       "grad_norm": 0.045380860567092896,
+       "learning_rate": 2.8590694538098453e-05,
+       "loss": 0.0255,
+       "step": 2400
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.9920565355538512,
+       "eval_f1": 0.8999999999999999,
+       "eval_loss": 0.03599809855222702,
+       "eval_precision": 0.9014195583596214,
+       "eval_recall": 0.8985849056603774,
+       "eval_runtime": 2.8186,
+       "eval_samples_per_second": 438.161,
+       "eval_steps_per_second": 54.992,
+       "step": 2472
+     },
+     {
+       "epoch": 4.0453074433656955,
+       "grad_norm": 0.5658661723136902,
+       "learning_rate": 2.746684648235559e-05,
+       "loss": 0.0228,
+       "step": 2500
+     },
+     {
+       "epoch": 4.207119741100324,
+       "grad_norm": 0.11415175348520279,
+       "learning_rate": 2.6342998426612728e-05,
+       "loss": 0.0162,
+       "step": 2600
+     },
+     {
+       "epoch": 4.368932038834951,
+       "grad_norm": 0.1993759125471115,
+       "learning_rate": 2.5219150370869863e-05,
+       "loss": 0.0135,
+       "step": 2700
+     },
+     {
+       "epoch": 4.53074433656958,
+       "grad_norm": 0.11497118324041367,
+       "learning_rate": 2.4095302315127e-05,
+       "loss": 0.0159,
+       "step": 2800
+     },
+     {
+       "epoch": 4.692556634304207,
+       "grad_norm": 0.2147281914949417,
+       "learning_rate": 2.2971454259384134e-05,
+       "loss": 0.0156,
+       "step": 2900
+     },
+     {
+       "epoch": 4.854368932038835,
+       "grad_norm": 0.1083710715174675,
+       "learning_rate": 2.1847606203641273e-05,
+       "loss": 0.0094,
+       "step": 3000
+     },
+     {
+       "epoch": 5.0,
+       "eval_accuracy": 0.9922756656075381,
+       "eval_f1": 0.9050980392156862,
+       "eval_loss": 0.03369523212313652,
+       "eval_precision": 0.9029733959311425,
+       "eval_recall": 0.9072327044025157,
+       "eval_runtime": 2.8037,
+       "eval_samples_per_second": 440.494,
+       "eval_steps_per_second": 55.285,
+       "step": 3090
+     },
+     {
+       "epoch": 5.016181229773463,
+       "grad_norm": 0.013677417300641537,
+       "learning_rate": 2.072375814789841e-05,
+       "loss": 0.016,
+       "step": 3100
+     },
+     {
+       "epoch": 5.17799352750809,
+       "grad_norm": 0.08207657188177109,
+       "learning_rate": 1.9599910092155544e-05,
+       "loss": 0.0133,
+       "step": 3200
+     },
+     {
+       "epoch": 5.339805825242719,
+       "grad_norm": 0.02103651873767376,
+       "learning_rate": 1.847606203641268e-05,
+       "loss": 0.0092,
+       "step": 3300
+     },
+     {
+       "epoch": 5.501618122977346,
+       "grad_norm": 1.4357458353042603,
+       "learning_rate": 1.735221398066982e-05,
+       "loss": 0.0122,
+       "step": 3400
+     },
+     {
+       "epoch": 5.663430420711974,
+       "grad_norm": 0.16999904811382294,
+       "learning_rate": 1.622836592492695e-05,
+       "loss": 0.0086,
+       "step": 3500
+     },
+     {
+       "epoch": 5.825242718446602,
+       "grad_norm": 0.09043747931718826,
+       "learning_rate": 1.510451786918409e-05,
+       "loss": 0.0093,
+       "step": 3600
+     },
+     {
+       "epoch": 5.9870550161812295,
+       "grad_norm": 0.06608462333679199,
+       "learning_rate": 1.3980669813441227e-05,
+       "loss": 0.0067,
+       "step": 3700
+     },
+     {
+       "epoch": 6.0,
+       "eval_accuracy": 0.9932617508491289,
+       "eval_f1": 0.9301960784313724,
+       "eval_loss": 0.033360060304403305,
+       "eval_precision": 0.9280125195618153,
+       "eval_recall": 0.9323899371069182,
+       "eval_runtime": 2.8189,
+       "eval_samples_per_second": 438.116,
+       "eval_steps_per_second": 54.986,
+       "step": 3708
+     },
+     {
+       "epoch": 6.148867313915858,
+       "grad_norm": 0.2284722775220871,
+       "learning_rate": 1.285682175769836e-05,
+       "loss": 0.0107,
+       "step": 3800
+     },
+     {
+       "epoch": 6.310679611650485,
+       "grad_norm": 0.02673812210559845,
+       "learning_rate": 1.1732973701955498e-05,
+       "loss": 0.0052,
+       "step": 3900
+     },
+     {
+       "epoch": 6.472491909385114,
+       "grad_norm": 0.33707210421562195,
+       "learning_rate": 1.0609125646212633e-05,
+       "loss": 0.0072,
+       "step": 4000
+     },
+     {
+       "epoch": 6.634304207119741,
+       "grad_norm": 0.0059865182265639305,
+       "learning_rate": 9.48527759046977e-06,
+       "loss": 0.0049,
+       "step": 4100
+     },
+     {
+       "epoch": 6.796116504854369,
+       "grad_norm": 0.2759881615638733,
+       "learning_rate": 8.361429534726907e-06,
+       "loss": 0.016,
+       "step": 4200
+     },
+     {
+       "epoch": 6.957928802588997,
+       "grad_norm": 0.18257270753383636,
+       "learning_rate": 7.237581478984042e-06,
+       "loss": 0.0069,
+       "step": 4300
+     },
+     {
+       "epoch": 7.0,
+       "eval_accuracy": 0.9937000109565027,
+       "eval_f1": 0.935483870967742,
+       "eval_loss": 0.0347304567694664,
+       "eval_precision": 0.9362204724409449,
+       "eval_recall": 0.934748427672956,
+       "eval_runtime": 2.8106,
+       "eval_samples_per_second": 439.402,
+       "eval_steps_per_second": 55.148,
+       "step": 4326
+     },
+     {
+       "epoch": 7.119741100323624,
+       "grad_norm": 0.007623529061675072,
+       "learning_rate": 6.113733423241179e-06,
+       "loss": 0.0046,
+       "step": 4400
+     },
+     {
+       "epoch": 7.281553398058253,
+       "grad_norm": 0.043167050927877426,
+       "learning_rate": 4.989885367498316e-06,
+       "loss": 0.009,
+       "step": 4500
+     },
+     {
+       "epoch": 7.44336569579288,
+       "grad_norm": 0.009674232453107834,
+       "learning_rate": 3.866037311755451e-06,
+       "loss": 0.0046,
+       "step": 4600
+     },
+     {
+       "epoch": 7.605177993527509,
+       "grad_norm": 0.05575043708086014,
+       "learning_rate": 2.742189256012588e-06,
+       "loss": 0.0052,
+       "step": 4700
+     },
+     {
+       "epoch": 7.766990291262136,
+       "grad_norm": 0.006715767551213503,
+       "learning_rate": 1.6183412002697239e-06,
+       "loss": 0.0044,
+       "step": 4800
+     },
+     {
+       "epoch": 7.9288025889967635,
+       "grad_norm": 0.009280543774366379,
+       "learning_rate": 4.9449314452686e-07,
+       "loss": 0.0054,
+       "step": 4900
+     },
+     {
+       "epoch": 8.0,
+       "eval_accuracy": 0.9936726196997918,
+       "eval_f1": 0.93401413982718,
+       "eval_loss": 0.03279910609126091,
+       "eval_precision": 0.9332810047095761,
+       "eval_recall": 0.934748427672956,
+       "eval_runtime": 2.829,
+       "eval_samples_per_second": 436.551,
+       "eval_steps_per_second": 54.79,
+       "step": 4944
+     },
+     {
+       "epoch": 8.0,
+       "step": 4944,
+       "total_flos": 2833132740217920.0,
+       "train_loss": 0.08807948804957774,
+       "train_runtime": 679.3683,
+       "train_samples_per_second": 116.355,
+       "train_steps_per_second": 7.277
+     }
+   ],
+   "logging_steps": 100,
+   "max_steps": 4944,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 8,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 2833132740217920.0,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
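
The step counts in `trainer_state.json` can be cross-checked against the dataset size and batch settings. The sketch below assumes a single training device and that optimizer steps per epoch equal the number of batches divided by the gradient accumulation factor, rounded down; the helper name `expected_steps` is illustrative.

```python
import math


def expected_steps(train_samples=9881, per_device_batch=8, grad_accum=2,
                   epochs=8, num_devices=1):
    """Return (optimizer steps per epoch, total optimizer steps)."""
    # Batches per epoch; the last, partial batch is kept (drop_last is false).
    batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
    # Assumption: one optimizer step per `grad_accum` batches, rounded down.
    updates_per_epoch = max(1, batches_per_epoch // grad_accum)
    return updates_per_epoch, updates_per_epoch * epochs


per_epoch, total = expected_steps()
print(per_epoch, total)       # steps per epoch and total steps
print(per_epoch * 7)          # step count at the end of epoch 7
```

With these inputs the sketch gives 618 steps per epoch and 4944 steps in total, in line with the logged `global_step`. Seven epochs correspond to step 4326, which is the checkpoint named in `best_model_checkpoint`.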
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a7a9b89b841739a3021e1e3a1adf2c68602fa7afc6688bbb60862c85d3c4c5e
+ size 5624
vocab.txt ADDED
The diff for this file is too large to render. See raw diff