wissamantoun committed
Commit e23ef04 · verified · 1 Parent(s): 45edcf5

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ eval_nbest_predictions.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,274 @@
---
language: fr
license: mit
tags:
- roberta
- question-answering
base_model: almanach/camembertv2-base
datasets:
- FQuAD
metrics:
- f1
- exact_match
pipeline_tag: question-answering
library_name: transformers
model-index:
- name: almanach/camembertv2-base-fquad
  results:
  - task:
      type: question-answering
      name: Extractive Question Answering
    dataset:
      type: FQuAD
      name: FQuAD
    metrics:
    - name: f1
      type: f1
      value: 83.03359
      verified: false
---

# Model Card for almanach/camembertv2-base-fquad

almanach/camembertv2-base-fquad is a RoBERTa-architecture model fine-tuned on the FQuAD dataset for extractive question answering in French. It achieves an F1 score of 83.03 and an exact match of 64.77 on the FQuAD dev set.

The model is part of the almanach/camembertv2-base family of fine-tuned models.

## Model Details

### Model Description

- **Developed by:** Wissam Antoun (PhD student at ALMAnaCH, Inria Paris)
- **Model type:** RoBERTa
- **Language(s) (NLP):** French
- **License:** MIT
- **Finetuned from model:** almanach/camembertv2-base

### Model Sources

- **Repository:** https://github.com/WissamAntoun/camemberta
- **Paper:** https://arxiv.org/abs/2411.08868

## Uses

The model can be used for extractive question answering tasks in French.

## Bias, Risks, and Limitations

The model may reflect biases present in its training data, and may not generalize well to datasets, domains, or tasks that differ from FQuAD.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

model = AutoModelForQuestionAnswering.from_pretrained("almanach/camembertv2-base-fquad")
tokenizer = AutoTokenizer.from_pretrained("almanach/camembertv2-base-fquad")

qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

result = qa_pipeline(
    question="Quelle est la capitale de la France ?",
    context="La capitale de la France est Paris.",
)
print(result["answer"])
```

## Training Details

### Training Data

The model is trained on the FQuAD dataset.

- Dataset Name: FQuAD
- Dataset Size:
  - Train: 20731
  - Dev: 3188

### Training Procedure

The model was trained with the `run_qa.py` example script from the Hugging Face Transformers repository.
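Given the hyperparameters logged below, the `run_qa.py` invocation was presumably along these lines. This is a reconstruction for illustration, not the exact command: the FQuAD data-file paths and the output directory are placeholders.

```shell
# Hypothetical reconstruction from the logged hyperparameters;
# the data-file paths and output_dir are placeholders.
python run_qa.py \
  --model_name_or_path almanach/camembertv2-base \
  --train_file fquad/train.json \
  --validation_file fquad/valid.json \
  --do_train --do_eval \
  --max_seq_length 896 \
  --doc_stride 128 \
  --max_answer_length 30 \
  --per_device_train_batch_size 8 \
  --gradient_accumulation_steps 4 \
  --learning_rate 5e-6 \
  --num_train_epochs 6 \
  --lr_scheduler_type cosine \
  --warmup_steps 0 \
  --eval_strategy epoch \
  --save_strategy epoch \
  --load_best_model_at_end \
  --metric_for_best_model exact_match \
  --seed 25 \
  --output_dir ./camembertv2-base-fquad
```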

#### Training Hyperparameters

```yml
'Unnamed: 0': /scratch/camembertv2/runs/results/fquad/camembertv2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-4-precision-fp32-learning_rate-5e-06-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-25/all_results.json
accelerator_config: '{''split_batches'': False, ''dispatch_batches'': None, ''even_batches'':
  True, ''use_seedable_sampler'': True, ''non_blocking'': False, ''gradient_accumulation_kwargs'':
  None}'
adafactor: false
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1.0e-08
auto_find_batch_size: false
base_model: camembertv2
base_model_name: camembertv2-base-bf16-p2-17000
batch_eval_metrics: false
bf16: false
bf16_full_eval: false
data_seed: 25.0
dataloader_drop_last: false
dataloader_num_workers: 0
dataloader_persistent_workers: false
dataloader_pin_memory: true
dataloader_prefetch_factor: .nan
ddp_backend: .nan
ddp_broadcast_buffers: .nan
ddp_bucket_cap_mb: .nan
ddp_find_unused_parameters: .nan
ddp_timeout: 1800
debug: '[]'
deepspeed: .nan
disable_tqdm: false
dispatch_batches: .nan
do_eval: true
do_predict: false
do_train: true
epoch: 6.0
eval_accumulation_steps: 1
eval_delay: 0
eval_do_concat_batches: true
eval_exact_match: 64.77415307402761
eval_f1: 83.03359134454834
eval_on_start: false
eval_runtime: 6.4215
eval_samples: 3188.0
eval_samples_per_second: 496.455
eval_steps: .nan
eval_steps_per_second: 7.786
eval_strategy: epoch
eval_use_gather_object: false
evaluation_strategy: epoch
fp16: false
fp16_backend: auto
fp16_full_eval: false
fp16_opt_level: O1
fsdp: '[]'
fsdp_config: '{''min_num_params'': 0, ''xla'': False, ''xla_fsdp_v2'': False, ''xla_fsdp_grad_ckpt'':
  False}'
fsdp_min_num_params: 0
fsdp_transformer_layer_cls_to_wrap: .nan
full_determinism: false
gradient_accumulation_steps: 4
gradient_checkpointing: false
gradient_checkpointing_kwargs: .nan
greater_is_better: true
group_by_length: false
half_precision_backend: auto
hub_always_push: false
hub_model_id: .nan
hub_private_repo: false
hub_strategy: every_save
hub_token: <HUB_TOKEN>
ignore_data_skip: false
include_inputs_for_metrics: false
include_num_input_tokens_seen: false
include_tokens_per_second: false
jit_mode_eval: false
label_names: .nan
label_smoothing_factor: 0.0
learning_rate: 5.0e-06
length_column_name: length
load_best_model_at_end: true
local_rank: 0
log_level: debug
log_level_replica: warning
log_on_each_node: true
logging_dir: /scratch/camembertv2/runs/results/fquad/camembertv2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-4-precision-fp32-learning_rate-5e-06-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-25/logs
logging_first_step: false
logging_nan_inf_filter: true
logging_steps: 100
logging_strategy: steps
lr_scheduler_kwargs: '{}'
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_steps: -1
metric_for_best_model: exact_match
mp_parameters: .nan
name: camembertv2/runs/results/fquad/camembertv2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-4-precision-fp32-learning_rate-5e-06-epochs-6-lr_scheduler-cosine-warmup_steps-0
neftune_noise_alpha: .nan
no_cuda: false
num_train_epochs: 6.0
optim: adamw_torch
optim_args: .nan
optim_target_modules: .nan
output_dir: /scratch/camembertv2/runs/results/fquad/camembertv2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-4-precision-fp32-learning_rate-5e-06-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-25
overwrite_output_dir: false
past_index: -1
per_device_eval_batch_size: 64
per_device_train_batch_size: 8
per_gpu_eval_batch_size: .nan
per_gpu_train_batch_size: .nan
prediction_loss_only: false
push_to_hub: false
push_to_hub_model_id: .nan
push_to_hub_organization: .nan
push_to_hub_token: <PUSH_TO_HUB_TOKEN>
ray_scope: last
remove_unused_columns: true
report_to: '[''tensorboard'']'
restore_callback_states_from_checkpoint: false
resume_from_checkpoint: .nan
run_name: camembertv2-base-bf16-p2-17000
save_on_each_node: false
save_only_model: false
save_safetensors: true
save_steps: 500
save_strategy: epoch
save_total_limit: .nan
seed: 25
skip_memory_metrics: true
split_batches: .nan
tf32: .nan
torch_compile: true
torch_compile_backend: inductor
torch_compile_mode: .nan
torch_empty_cache_steps: .nan
torchdynamo: .nan
total_flos: 2.0387348740618656e+16
tpu_metrics_debug: false
tpu_num_cores: .nan
train_loss: 1.9457146935011624
train_runtime: 824.1497
train_samples: 20731
train_samples_per_second: 150.926
train_steps_per_second: 4.718
use_cpu: false
use_ipex: false
use_legacy_prediction_loop: false
use_mps_device: false
warmup_ratio: 0.0
warmup_steps: 0
weight_decay: 0.0
```
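Two of the settings above interact: with `max_seq_length: 896` and `doc_stride: 128`, contexts longer than the window are split into overlapping chunks that share 128 tokens, so an answer near a chunk boundary still appears whole in at least one chunk. A standalone sketch of that sliding window (token IDs are stand-ins; the real preprocessing also prepends the question and special tokens):

```python
def chunk_with_stride(token_ids, max_len, stride):
    """Split token_ids into windows of at most max_len tokens, where
    consecutive windows overlap by `stride` tokens, as the tokenizer's
    return_overflowing_tokens/stride mechanism does."""
    step = max_len - stride  # how far each new window advances
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the context
    return chunks

# Toy numbers for illustration (the real run uses max_len=896, stride=128):
print(chunk_with_stride(list(range(10)), max_len=6, stride=2))
# -> [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9]]
```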

#### Results

**F1-Score:** 83.03359

**Exact-Match:** 64.77415
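The scores follow the standard SQuAD-style evaluation used by FQuAD: exact match checks whether the normalized prediction equals a gold answer, and F1 measures token overlap between prediction and gold. A simplified sketch of the per-example metrics (the official script also strips punctuation and articles during normalization, and takes the maximum over multiple gold answers):

```python
from collections import Counter

def exact_match(prediction, gold):
    # Simplified normalization: lowercase plus whitespace collapsing.
    return float(" ".join(prediction.lower().split()) == " ".join(gold.lower().split()))

def f1_score(prediction, gold):
    # Token-level overlap F1 between predicted and gold answer strings.
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))           # -> 1.0
print(f1_score("la ville de Paris", "Paris"))  # -> 0.4 (precision 0.25, recall 1.0)
```

Dataset-level numbers like those above are these per-example values averaged over the dev set and scaled to 0–100.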

## Technical Specifications

### Model Architecture and Objective

A RoBERTa-style encoder with a start/end span-prediction head, for extractive question answering in French.

## Citation

**BibTeX:**

```bibtex
@misc{antoun2024camembert20smarterfrench,
  title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
  author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
  year={2024},
  eprint={2411.08868},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.08868},
}
```
all_results.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "epoch": 6.0,
3
+ "eval_exact_match": 64.77415307402761,
4
+ "eval_f1": 83.03359134454834,
5
+ "eval_runtime": 6.4215,
6
+ "eval_samples": 3188,
7
+ "eval_samples_per_second": 496.455,
8
+ "eval_steps_per_second": 7.786,
9
+ "total_flos": 2.0387348740618656e+16,
10
+ "train_loss": 1.9457146935011624,
11
+ "train_runtime": 824.1497,
12
+ "train_samples": 20731,
13
+ "train_samples_per_second": 150.926,
14
+ "train_steps_per_second": 4.718
15
+ }
config.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "_name_or_path": "/scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/",
3
+ "architectures": [
4
+ "RobertaForQuestionAnswering"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 1,
8
+ "classifier_dropout": null,
9
+ "embedding_size": 768,
10
+ "eos_token_id": 2,
11
+ "hidden_act": "gelu",
12
+ "hidden_dropout_prob": 0.1,
13
+ "hidden_size": 768,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 3072,
16
+ "layer_norm_eps": 1e-07,
17
+ "max_position_embeddings": 1025,
18
+ "model_name": "camembertv2-base-bf16",
19
+ "model_type": "roberta",
20
+ "num_attention_heads": 12,
21
+ "num_hidden_layers": 12,
22
+ "pad_token_id": 0,
23
+ "position_biased_input": true,
24
+ "position_embedding_type": "absolute",
25
+ "torch_dtype": "float32",
26
+ "transformers_version": "4.44.2",
27
+ "type_vocab_size": 1,
28
+ "use_cache": true,
29
+ "vocab_size": 32768
30
+ }
eval_nbest_predictions.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b6cde6759e30442525bb591dbc94a2b94f99410c2853778d17574cbe2315623
3
+ size 14796790
eval_predictions.json ADDED
The diff for this file is too large to render. See raw diff
 
eval_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "epoch": 6.0,
3
+ "eval_exact_match": 64.77415307402761,
4
+ "eval_f1": 83.03359134454834,
5
+ "eval_runtime": 6.4215,
6
+ "eval_samples": 3188,
7
+ "eval_samples_per_second": 496.455,
8
+ "eval_steps_per_second": 7.786
9
+ }
logs/events.out.tfevents.1724462857.nefgpu58.62368.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2efbaac8e471dd36b6ed8192a202a71ae7cee547d127889a17c1b6bb3db19f16
3
+ size 15806
logs/events.out.tfevents.1724463694.nefgpu58.62368.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9f43fd2d2b012a03545542eb1359bd162af5aa3d8066fbc758172f96d2972ca0
3
+ size 364
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fb311090a5188bbd306d56d693e051458e3c510373cd3d3fce17a17a53b4d567
3
+ size 444069240
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "[CLS]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "[SEP]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "[MASK]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "[PAD]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "[SEP]",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "[UNK]",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
1
+ {
2
+ "add_prefix_space": true,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "[PAD]",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "[CLS]",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "[SEP]",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "[UNK]",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "[MASK]",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ }
44
+ },
45
+ "bos_token": "[CLS]",
46
+ "clean_up_tokenization_spaces": true,
47
+ "cls_token": "[CLS]",
48
+ "eos_token": "[SEP]",
49
+ "errors": "replace",
50
+ "mask_token": "[MASK]",
51
+ "model_max_length": 1000000000000000019884624838656,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "tokenizer_class": "RobertaTokenizer",
55
+ "trim_offsets": true,
56
+ "unk_token": "[UNK]"
57
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "epoch": 6.0,
3
+ "total_flos": 2.0387348740618656e+16,
4
+ "train_loss": 1.9457146935011624,
5
+ "train_runtime": 824.1497,
6
+ "train_samples": 20731,
7
+ "train_samples_per_second": 150.926,
8
+ "train_steps_per_second": 4.718
9
+ }
trainer_state.json ADDED
@@ -0,0 +1,362 @@
1
+ {
2
+ "best_metric": 53.324968632371395,
3
+ "best_model_checkpoint": "/scratch/camembertv2/runs/results/fquad/camembertv2-base-bf16-p2-17000/max_seq_length-896-doc_stride-128-max_answer_length-30-gradient_accumulation_steps-4-precision-fp32-learning_rate-5e-06-epochs-6-lr_scheduler-cosine-warmup_steps-0/SEED-25/checkpoint-3888",
4
+ "epoch": 6.0,
5
+ "eval_steps": 500,
6
+ "global_step": 3888,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.15432098765432098,
13
+ "grad_norm": 4.547507286071777,
14
+ "learning_rate": 4.99184317884152e-06,
15
+ "loss": 5.1604,
16
+ "step": 100
17
+ },
18
+ {
19
+ "epoch": 0.30864197530864196,
20
+ "grad_norm": 12.684767723083496,
21
+ "learning_rate": 4.967425942351207e-06,
22
+ "loss": 4.0839,
23
+ "step": 200
24
+ },
25
+ {
26
+ "epoch": 0.46296296296296297,
27
+ "grad_norm": 14.742673873901367,
28
+ "learning_rate": 4.926907624154051e-06,
29
+ "loss": 3.3159,
30
+ "step": 300
31
+ },
32
+ {
33
+ "epoch": 0.6172839506172839,
34
+ "grad_norm": 12.705907821655273,
35
+ "learning_rate": 4.870552624790192e-06,
36
+ "loss": 2.9494,
37
+ "step": 400
38
+ },
39
+ {
40
+ "epoch": 0.7716049382716049,
41
+ "grad_norm": 14.801329612731934,
42
+ "learning_rate": 4.798728686380588e-06,
43
+ "loss": 2.7635,
44
+ "step": 500
45
+ },
46
+ {
47
+ "epoch": 0.9259259259259259,
48
+ "grad_norm": 17.233285903930664,
49
+ "learning_rate": 4.711904492941644e-06,
50
+ "loss": 2.6393,
51
+ "step": 600
52
+ },
53
+ {
54
+ "epoch": 1.0,
55
+ "eval_exact_match": 38.86449184441656,
56
+ "eval_f1": 60.0036086905889,
57
+ "eval_runtime": 6.9307,
58
+ "eval_samples_per_second": 459.985,
59
+ "eval_steps_per_second": 7.214,
60
+ "step": 648
61
+ },
62
+ {
63
+ "epoch": 1.0802469135802468,
64
+ "grad_norm": 14.701664924621582,
65
+ "learning_rate": 4.610646612007849e-06,
66
+ "loss": 2.4089,
67
+ "step": 700
68
+ },
69
+ {
70
+ "epoch": 1.2345679012345678,
71
+ "grad_norm": 17.278104782104492,
72
+ "learning_rate": 4.495615797519732e-06,
73
+ "loss": 2.3405,
74
+ "step": 800
75
+ },
76
+ {
77
+ "epoch": 1.3888888888888888,
78
+ "grad_norm": 11.50146770477295,
79
+ "learning_rate": 4.367562678102491e-06,
80
+ "loss": 2.2084,
81
+ "step": 900
82
+ },
83
+ {
84
+ "epoch": 1.5432098765432098,
85
+ "grad_norm": 13.203764915466309,
86
+ "learning_rate": 4.22732285887122e-06,
87
+ "loss": 2.1694,
88
+ "step": 1000
89
+ },
90
+ {
91
+ "epoch": 1.6975308641975309,
92
+ "grad_norm": 21.71219825744629,
93
+ "learning_rate": 4.075811468725734e-06,
94
+ "loss": 2.0862,
95
+ "step": 1100
96
+ },
97
+ {
98
+ "epoch": 1.8518518518518519,
99
+ "grad_norm": 12.909610748291016,
100
+ "learning_rate": 3.914017188716347e-06,
101
+ "loss": 2.0016,
102
+ "step": 1200
103
+ },
104
+ {
105
+ "epoch": 2.0,
106
+ "eval_exact_match": 48.745294855708906,
107
+ "eval_f1": 70.05708349304844,
108
+ "eval_runtime": 6.5382,
109
+ "eval_samples_per_second": 487.593,
110
+ "eval_steps_per_second": 7.647,
111
+ "step": 1296
112
+ },
113
+ {
114
+ "epoch": 2.006172839506173,
115
+ "grad_norm": 16.666929244995117,
116
+ "learning_rate": 3.7429958004482575e-06,
117
+ "loss": 1.9412,
118
+ "step": 1300
119
+ },
120
+ {
121
+ "epoch": 2.1604938271604937,
122
+ "grad_norm": 10.462796211242676,
123
+ "learning_rate": 3.5638632966241686e-06,
124
+ "loss": 1.8009,
125
+ "step": 1400
126
+ },
127
+ {
128
+ "epoch": 2.314814814814815,
129
+ "grad_norm": 13.769060134887695,
130
+ "learning_rate": 3.3777885986819725e-06,
131
+ "loss": 1.7928,
132
+ "step": 1500
133
+ },
134
+ {
135
+ "epoch": 2.4691358024691357,
136
+ "grad_norm": 15.287083625793457,
137
+ "learning_rate": 3.1859859290482544e-06,
138
+ "loss": 1.7865,
139
+ "step": 1600
140
+ },
141
+ {
142
+ "epoch": 2.623456790123457,
143
+ "grad_norm": 11.451101303100586,
144
+ "learning_rate": 2.989706887782151e-06,
145
+ "loss": 1.7489,
146
+ "step": 1700
147
+ },
148
+ {
149
+ "epoch": 2.7777777777777777,
150
+ "grad_norm": 17.512975692749023,
151
+ "learning_rate": 2.7902322853130758e-06,
152
+ "loss": 1.6978,
153
+ "step": 1800
154
+ },
155
+ {
156
+ "epoch": 2.932098765432099,
157
+ "grad_norm": 17.15248680114746,
158
+ "learning_rate": 2.5888637845674276e-06,
159
+ "loss": 1.6566,
160
+ "step": 1900
161
+ },
162
+ {
163
+ "epoch": 3.0,
164
+ "eval_exact_match": 50.47051442910916,
165
+ "eval_f1": 72.25048266378954,
166
+ "eval_runtime": 6.5447,
167
+ "eval_samples_per_second": 487.112,
168
+ "eval_steps_per_second": 7.64,
169
+ "step": 1944
170
+ },
171
+ {
172
+ "epoch": 3.0864197530864197,
173
+ "grad_norm": 11.384383201599121,
174
+ "learning_rate": 2.3869154070232346e-06,
175
+ "loss": 1.6309,
176
+ "step": 2000
177
+ },
178
+ {
179
+ "epoch": 3.240740740740741,
180
+ "grad_norm": 14.492201805114746,
181
+ "learning_rate": 2.185704958119594e-06,
182
+ "loss": 1.5353,
183
+ "step": 2100
184
+ },
185
+ {
186
+ "epoch": 3.3950617283950617,
187
+ "grad_norm": 15.613585472106934,
188
+ "learning_rate": 1.9865454279740452e-06,
189
+ "loss": 1.5249,
190
+ "step": 2200
191
+ },
192
+ {
193
+ "epoch": 3.549382716049383,
194
+ "grad_norm": 12.233988761901855,
195
+ "learning_rate": 1.7907364235221128e-06,
196
+ "loss": 1.5499,
197
+ "step": 2300
198
+ },
199
+ {
200
+ "epoch": 3.7037037037037037,
201
+ "grad_norm": 11.811338424682617,
202
+ "learning_rate": 1.5995556879882246e-06,
203
+ "loss": 1.5159,
204
+ "step": 2400
205
+ },
206
+ {
207
+ "epoch": 3.8580246913580245,
208
+ "grad_norm": 18.379695892333984,
209
+ "learning_rate": 1.414250763027336e-06,
210
+ "loss": 1.5072,
211
+ "step": 2500
212
+ },
213
+ {
214
+ "epoch": 4.0,
215
+ "eval_exact_match": 53.01129234629862,
216
+ "eval_f1": 74.3205610049545,
217
+ "eval_runtime": 6.5679,
218
+ "eval_samples_per_second": 485.39,
219
+ "eval_steps_per_second": 7.613,
220
+ "step": 2592
221
+ },
222
+ {
223
+ "epoch": 4.012345679012346,
224
+ "grad_norm": 12.669611930847168,
225
+ "learning_rate": 1.2360308479456027e-06,
226
+ "loss": 1.5257,
227
+ "step": 2600
228
+ },
229
+ {
230
+ "epoch": 4.166666666666667,
231
+ "grad_norm": 13.753548622131348,
232
+ "learning_rate": 1.0660589091223854e-06,
233
+ "loss": 1.4296,
234
+ "step": 2700
235
+ },
236
+ {
237
+ "epoch": 4.320987654320987,
238
+ "grad_norm": 10.774425506591797,
239
+ "learning_rate": 9.054440911232348e-07,
240
+ "loss": 1.4796,
241
+ "step": 2800
242
+ },
243
+ {
244
+ "epoch": 4.4753086419753085,
245
+ "grad_norm": 16.21649742126465,
246
+ "learning_rate": 7.552344790248104e-07,
247
+ "loss": 1.426,
248
+ "step": 2900
249
+ },
250
+ {
251
+ "epoch": 4.62962962962963,
252
+ "grad_norm": 11.01417064666748,
253
+ "learning_rate": 6.164102591808482e-07,
254
+ "loss": 1.4245,
255
+ "step": 3000
256
+ },
257
+ {
258
+ "epoch": 4.783950617283951,
259
+ "grad_norm": 10.40230941772461,
260
+ "learning_rate": 4.898773230583353e-07,
261
+ "loss": 1.4493,
262
+ "step": 3100
263
+ },
264
+ {
265
+ "epoch": 4.938271604938271,
266
+ "grad_norm": 10.953381538391113,
267
+ "learning_rate": 3.7646135588175676e-07,
268
+ "loss": 1.404,
269
+ "step": 3200
270
+ },
271
+ {
272
+ "epoch": 5.0,
273
+ "eval_exact_match": 53.168130489335006,
274
+ "eval_f1": 74.39491719320372,
275
+ "eval_runtime": 6.6406,
276
+ "eval_samples_per_second": 480.08,
277
+ "eval_steps_per_second": 7.529,
278
+ "step": 3240
279
+ },
280
+ {
281
+ "epoch": 5.092592592592593,
282
+ "grad_norm": 13.173111915588379,
283
+ "learning_rate": 2.7690244865973494e-07,
284
+ "loss": 1.43,
285
+ "step": 3300
286
+ },
287
+ {
288
+ "epoch": 5.246913580246914,
289
+ "grad_norm": 13.998867988586426,
290
+ "learning_rate": 1.918502687530241e-07,
291
+ "loss": 1.3968,
292
+ "step": 3400
293
+ },
294
+ {
295
+ "epoch": 5.401234567901234,
296
+ "grad_norm": 12.186470985412598,
297
+ "learning_rate": 1.2185982049813472e-07,
298
+ "loss": 1.378,
299
+ "step": 3500
300
+ },
301
+ {
302
+ "epoch": 5.555555555555555,
303
+ "grad_norm": 16.22747039794922,
304
+ "learning_rate": 6.738782355044048e-08,
305
+ "loss": 1.4347,
306
+ "step": 3600
307
+ },
308
+ {
309
+ "epoch": 5.709876543209877,
310
+ "grad_norm": 17.75710105895996,
311
+ "learning_rate": 2.878973257973955e-08,
312
+ "loss": 1.422,
313
+ "step": 3700
314
+ },
315
+ {
316
+ "epoch": 5.864197530864198,
317
+ "grad_norm": 17.476356506347656,
318
+ "learning_rate": 6.317417766116829e-09,
319
+ "loss": 1.3868,
320
+ "step": 3800
321
+ },
322
+ {
323
+ "epoch": 6.0,
324
+ "eval_exact_match": 53.324968632371395,
325
+ "eval_f1": 74.54839090269344,
326
+ "eval_runtime": 6.5292,
327
+ "eval_samples_per_second": 488.268,
328
+ "eval_steps_per_second": 7.658,
329
+ "step": 3888
330
+ },
331
+ {
332
+ "epoch": 6.0,
333
+ "step": 3888,
334
+ "total_flos": 2.0387348740618656e+16,
335
+ "train_loss": 1.9457146935011624,
336
+ "train_runtime": 824.1497,
337
+ "train_samples_per_second": 150.926,
338
+ "train_steps_per_second": 4.718
339
+ }
340
+ ],
341
+ "logging_steps": 100,
342
+ "max_steps": 3888,
343
+ "num_input_tokens_seen": 0,
344
+ "num_train_epochs": 6,
345
+ "save_steps": 500,
346
+ "stateful_callbacks": {
347
+ "TrainerControl": {
348
+ "args": {
349
+ "should_epoch_stop": false,
350
+ "should_evaluate": false,
351
+ "should_log": false,
352
+ "should_save": true,
353
+ "should_training_stop": true
354
+ },
355
+ "attributes": {}
356
+ }
357
+ },
358
+ "total_flos": 2.0387348740618656e+16,
359
+ "train_batch_size": 8,
360
+ "trial_name": null,
361
+ "trial_params": null
362
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cf43bebeeef721ccd0e0d0f58cdae5570d7c7f63c5cd3695b03168815aa747f0
3
+ size 5688
vocab.txt ADDED
The diff for this file is too large to render. See raw diff