yoshitomo-matsubara committed
Commit: eca7edf · Parent(s): 6d02f69
initial commit
Files changed:
- README.md +20 -0
- config.json +36 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- training.log +88 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,20 @@
+---
+language: en
+tags:
+- bert
+- mnli
+- ax
+- glue
+- kd
+- torchdistill
+license: apache-2.0
+datasets:
+- mnli
+- ax
+metrics:
+- accuracy
+---
+
+`bert-base-uncased` fine-tuned on the MNLI dataset, using a fine-tuned `bert-large-uncased` as the teacher model, with [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) and [Google Colab](https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/master/demo/glue_kd_and_submission.ipynb) for knowledge distillation.
+The training configuration (including hyperparameters) is available [here](https://github.com/yoshitomo-matsubara/torchdistill/blob/main/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml).
+I submitted prediction files to [the GLUE leaderboard](https://gluebenchmark.com/leaderboard), and the overall GLUE score was **78.9**.
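For quick use of the distilled checkpoint, a minimal inference sketch with Hugging Face Transformers follows. The repository id `yoshitomo-matsubara/bert-base-uncased-mnli` and the MNLI label order (0 = entailment, 1 = neutral, 2 = contradiction) are assumptions, since the `config.json` below only maps ids to the generic names `LABEL_0`–`LABEL_2`.

```python
# Minimal inference sketch (assumed: repo id and MNLI label order).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

REPO_ID = "yoshitomo-matsubara/bert-base-uncased-mnli"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForSequenceClassification.from_pretrained(REPO_ID)
model.eval()

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# BERT encodes the pair as [CLS] premise [SEP] hypothesis [SEP].
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# config.json only provides LABEL_0..LABEL_2; the usual GLUE MNLI
# convention (0=entailment, 1=neutral, 2=contradiction) is assumed here.
labels = ["entailment", "neutral", "contradiction"]
print(labels[logits.argmax(dim=-1).item()])
```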
config.json
ADDED
@@ -0,0 +1,36 @@
+{
+  "_name_or_path": "bert-base-uncased",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "finetuning_task": "mnli",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "transformers_version": "4.6.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
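The config describes a standard BERT-base student (12 layers, 12 heads, hidden size 768) with a 3-way classification head. As a sanity check, it can be inspected without downloading the weights; a sketch assuming the same (hypothetical) repository id as above:

```python
# Sketch: inspect the model config without loading the weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("yoshitomo-matsubara/bert-base-uncased-mnli")  # assumed repo id
print(config.model_type)         # "bert"
print(config.num_hidden_layers)  # 12 (BERT-base; the BERT-large teacher has 24)
print(config.id2label)           # e.g. {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}
```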
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ca0f84905c30911b5d2640e1c54998e5870c550466984e9af1094a0a84f9fcb7
+size 438027529
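`pytorch_model.bin` is stored via Git LFS, so the diff above shows only the pointer file (spec version, SHA-256 object id, and byte size), not the ~438 MB weights. A sketch of verifying a downloaded copy against that pointer; the local file paths are hypothetical:

```python
# Sketch: verify a downloaded pytorch_model.bin against its Git LFS pointer.
import hashlib
from pathlib import Path

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = parse_lfs_pointer(Path("pytorch_model.bin.pointer").read_text())  # hypothetical path
blob = Path("pytorch_model.bin").read_bytes()                               # hypothetical path

assert len(blob) == int(pointer["size"])  # expected: 438027529
assert hashlib.sha256(blob).hexdigest() == pointer["oid"].split(":", 1)[1]
print("LFS pointer matches the downloaded file")
```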
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
+{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "do_lower": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-uncased"}
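This is the standard uncased BERT WordPiece tokenizer (`do_lower_case: true`, 512-token limit); the extra `do_lower` key appears redundant but harmless. A small sketch of what those settings mean in practice, again assuming the same repository id:

```python
# Sketch: the uncased tokenizer lowercases text and adds [CLS]/[SEP].
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yoshitomo-matsubara/bert-base-uncased-mnli")  # assumed repo id

encoded = tokenizer("The Cat Sat", "It was a CAT.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# e.g. ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'it', 'was', 'a', 'cat', '.', '[SEP]']
print(tokenizer.model_max_length)  # 512
```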
training.log
ADDED
@@ -0,0 +1,88 @@
+2021-05-31 19:12:19,502 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
+2021-05-31 19:12:19,563 INFO __main__ Distributed environment: NO
+Num processes: 1
+Process index: 0
+Local process index: 0
+Device: cuda
+Use FP16 precision: True
+
+2021-05-31 19:12:19,941 INFO filelock Lock 140082792337040 acquired on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock
+2021-05-31 19:12:20,295 INFO filelock Lock 140082792337040 released on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock
+2021-05-31 19:12:21,006 INFO filelock Lock 140082831894224 acquired on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+2021-05-31 19:12:21,516 INFO filelock Lock 140082831894224 released on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+2021-05-31 19:12:21,871 INFO filelock Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock
+2021-05-31 19:12:22,393 INFO filelock Lock 140082823814352 released on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock
+2021-05-31 19:12:23,095 INFO filelock Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock
+2021-05-31 19:12:23,448 INFO filelock Lock 140082823814352 released on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock
+2021-05-31 19:12:23,803 INFO filelock Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock
+2021-05-31 19:12:24,158 INFO filelock Lock 140082823814352 released on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock
+2021-05-31 19:12:24,537 INFO filelock Lock 140082823814992 acquired on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock
+2021-05-31 19:13:00,303 INFO filelock Lock 140082823814992 released on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock
+2021-05-31 19:14:53,610 INFO __main__ Start training
+2021-05-31 19:14:53,610 INFO torchdistill.models.util [teacher model]
+2021-05-31 19:14:53,610 INFO torchdistill.models.util Using the original teacher model
+2021-05-31 19:14:53,610 INFO torchdistill.models.util [student model]
+2021-05-31 19:14:53,611 INFO torchdistill.models.util Using the original student model
+2021-05-31 19:14:53,611 INFO torchdistill.core.distillation Loss = 1.0 * OrgLoss
+2021-05-31 19:14:53,611 INFO torchdistill.core.distillation Freezing the whole teacher model
+2021-05-31 19:14:58,197 INFO torchdistill.misc.log Epoch: [0] [    0/12272] eta: 0:26:53 lr: 9.999728378965668e-05 sample/s: 38.52969437529281 loss: 0.0905 (0.0905) time: 0.1315 data: 0.0277 max mem: 2519
+2021-05-31 19:17:04,033 INFO torchdistill.misc.log Epoch: [0] [ 1000/12272] eta: 0:23:38 lr: 9.728107344632768e-05 sample/s: 25.678521357422294 loss: 0.0229 (0.0347) time: 0.1315 data: 0.0046 max mem: 5109
+2021-05-31 19:19:10,890 INFO torchdistill.misc.log Epoch: [0] [ 2000/12272] eta: 0:21:37 lr: 9.45648631029987e-05 sample/s: 33.98564182345601 loss: 0.0153 (0.0267) time: 0.1355 data: 0.0044 max mem: 5109
+2021-05-31 19:21:17,630 INFO torchdistill.misc.log Epoch: [0] [ 3000/12272] eta: 0:19:32 lr: 9.184865275966971e-05 sample/s: 30.293297895006734 loss: 0.0145 (0.0230) time: 0.1215 data: 0.0044 max mem: 5109
+2021-05-31 19:23:24,094 INFO torchdistill.misc.log Epoch: [0] [ 4000/12272] eta: 0:17:26 lr: 8.913244241634072e-05 sample/s: 39.35939116542367 loss: 0.0144 (0.0208) time: 0.1229 data: 0.0045 max mem: 5109
+2021-05-31 19:25:30,963 INFO torchdistill.misc.log Epoch: [0] [ 5000/12272] eta: 0:15:20 lr: 8.641623207301173e-05 sample/s: 31.891785647075373 loss: 0.0108 (0.0192) time: 0.1368 data: 0.0047 max mem: 5109
+2021-05-31 19:27:37,490 INFO torchdistill.misc.log Epoch: [0] [ 6000/12272] eta: 0:13:13 lr: 8.370002172968275e-05 sample/s: 30.313604538761055 loss: 0.0109 (0.0179) time: 0.1267 data: 0.0047 max mem: 5109
+2021-05-31 19:29:45,181 INFO torchdistill.misc.log Epoch: [0] [ 7000/12272] eta: 0:11:08 lr: 8.098381138635376e-05 sample/s: 42.336344641721595 loss: 0.0095 (0.0170) time: 0.1268 data: 0.0045 max mem: 5109
+2021-05-31 19:31:52,182 INFO torchdistill.misc.log Epoch: [0] [ 8000/12272] eta: 0:09:01 lr: 7.826760104302477e-05 sample/s: 31.78104944118204 loss: 0.0112 (0.0162) time: 0.1264 data: 0.0046 max mem: 5109
+2021-05-31 19:33:59,788 INFO torchdistill.misc.log Epoch: [0] [ 9000/12272] eta: 0:06:55 lr: 7.555139069969579e-05 sample/s: 30.615916348838482 loss: 0.0089 (0.0155) time: 0.1314 data: 0.0045 max mem: 5109
+2021-05-31 19:36:07,595 INFO torchdistill.misc.log Epoch: [0] [10000/12272] eta: 0:04:48 lr: 7.283518035636681e-05 sample/s: 37.10492838754766 loss: 0.0072 (0.0149) time: 0.1298 data: 0.0048 max mem: 5109
+2021-05-31 19:38:13,949 INFO torchdistill.misc.log Epoch: [0] [11000/12272] eta: 0:02:41 lr: 7.011897001303781e-05 sample/s: 32.78477658489305 loss: 0.0090 (0.0144) time: 0.1288 data: 0.0045 max mem: 5109
+2021-05-31 19:40:21,535 INFO torchdistill.misc.log Epoch: [0] [12000/12272] eta: 0:00:34 lr: 6.740275966970883e-05 sample/s: 37.34245014245014 loss: 0.0079 (0.0140) time: 0.1317 data: 0.0050 max mem: 5109
+2021-05-31 19:40:56,676 INFO torchdistill.misc.log Epoch: [0] Total time: 0:25:58
+2021-05-31 19:41:04,501 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-31 19:41:04,501 INFO __main__ Validation: accuracy = 0.8412633723892002
+2021-05-31 19:41:04,501 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
+2021-05-31 19:41:05,722 INFO torchdistill.misc.log Epoch: [1] [    0/12272] eta: 0:31:19 lr: 6.666395045632334e-05 sample/s: 31.762036742544716 loss: 0.0031 (0.0031) time: 0.1532 data: 0.0272 max mem: 5109
+2021-05-31 19:43:13,358 INFO torchdistill.misc.log Epoch: [1] [ 1000/12272] eta: 0:23:58 lr: 6.394774011299436e-05 sample/s: 37.225953324487556 loss: 0.0044 (0.0051) time: 0.1300 data: 0.0046 max mem: 5109
+2021-05-31 19:45:20,181 INFO torchdistill.misc.log Epoch: [1] [ 2000/12272] eta: 0:21:47 lr: 6.123152976966536e-05 sample/s: 37.15834562552879 loss: 0.0047 (0.0051) time: 0.1284 data: 0.0045 max mem: 5109
+2021-05-31 19:47:26,919 INFO torchdistill.misc.log Epoch: [1] [ 3000/12272] eta: 0:19:38 lr: 5.851531942633638e-05 sample/s: 39.3462836451305 loss: 0.0042 (0.0050) time: 0.1197 data: 0.0043 max mem: 5109
+2021-05-31 19:49:32,833 INFO torchdistill.misc.log Epoch: [1] [ 4000/12272] eta: 0:17:28 lr: 5.5799109083007396e-05 sample/s: 33.65857162059412 loss: 0.0040 (0.0050) time: 0.1264 data: 0.0043 max mem: 5109
+2021-05-31 19:51:40,796 INFO torchdistill.misc.log Epoch: [1] [ 5000/12272] eta: 0:15:23 lr: 5.30828987396784e-05 sample/s: 26.070806883959442 loss: 0.0046 (0.0050) time: 0.1288 data: 0.0045 max mem: 5109
+2021-05-31 19:53:48,528 INFO torchdistill.misc.log Epoch: [1] [ 6000/12272] eta: 0:13:17 lr: 5.036668839634942e-05 sample/s: 32.30201815220279 loss: 0.0045 (0.0049) time: 0.1212 data: 0.0044 max mem: 5109
+2021-05-31 19:55:53,950 INFO torchdistill.misc.log Epoch: [1] [ 7000/12272] eta: 0:11:08 lr: 4.765047805302043e-05 sample/s: 36.80166358838471 loss: 0.0038 (0.0049) time: 0.1297 data: 0.0044 max mem: 5109
+2021-05-31 19:57:59,848 INFO torchdistill.misc.log Epoch: [1] [ 8000/12272] eta: 0:09:01 lr: 4.493426770969144e-05 sample/s: 33.594812965184154 loss: 0.0041 (0.0048) time: 0.1258 data: 0.0044 max mem: 5109
+2021-05-31 20:00:06,135 INFO torchdistill.misc.log Epoch: [1] [ 9000/12272] eta: 0:06:54 lr: 4.221805736636245e-05 sample/s: 25.64719622596514 loss: 0.0045 (0.0048) time: 0.1241 data: 0.0046 max mem: 5109
+2021-05-31 20:02:14,011 INFO torchdistill.misc.log Epoch: [1] [10000/12272] eta: 0:04:48 lr: 3.9501847023033466e-05 sample/s: 33.08476073658346 loss: 0.0043 (0.0048) time: 0.1239 data: 0.0047 max mem: 5109
+2021-05-31 20:04:21,426 INFO torchdistill.misc.log Epoch: [1] [11000/12272] eta: 0:02:41 lr: 3.6785636679704476e-05 sample/s: 25.415942288056765 loss: 0.0039 (0.0047) time: 0.1303 data: 0.0045 max mem: 5109
+2021-05-31 20:06:28,294 INFO torchdistill.misc.log Epoch: [1] [12000/12272] eta: 0:00:34 lr: 3.406942633637549e-05 sample/s: 37.492242198062506 loss: 0.0038 (0.0047) time: 0.1308 data: 0.0051 max mem: 5109
+2021-05-31 20:07:02,616 INFO torchdistill.misc.log Epoch: [1] Total time: 0:25:57
+2021-05-31 20:07:10,347 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-31 20:07:10,348 INFO __main__ Validation: accuracy = 0.8530820173204279
+2021-05-31 20:07:10,348 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
+2021-05-31 20:07:11,549 INFO torchdistill.misc.log Epoch: [2] [    0/12272] eta: 0:27:31 lr: 3.3330617122990006e-05 sample/s: 36.21437806918137 loss: 0.0018 (0.0018) time: 0.1346 data: 0.0241 max mem: 5109
+2021-05-31 20:09:18,600 INFO torchdistill.misc.log Epoch: [2] [ 1000/12272] eta: 0:23:52 lr: 3.061440677966102e-05 sample/s: 37.12356586986009 loss: 0.0023 (0.0024) time: 0.1341 data: 0.0045 max mem: 5109
+2021-05-31 20:11:24,788 INFO torchdistill.misc.log Epoch: [2] [ 2000/12272] eta: 0:21:40 lr: 2.789819643633203e-05 sample/s: 32.69698623302515 loss: 0.0021 (0.0023) time: 0.1271 data: 0.0046 max mem: 5109
+2021-05-31 20:13:32,260 INFO torchdistill.misc.log Epoch: [2] [ 3000/12272] eta: 0:19:36 lr: 2.5181986093003048e-05 sample/s: 39.77255238497116 loss: 0.0019 (0.0023) time: 0.1264 data: 0.0047 max mem: 5109
+2021-05-31 20:15:38,928 INFO torchdistill.misc.log Epoch: [2] [ 4000/12272] eta: 0:17:29 lr: 2.2465775749674055e-05 sample/s: 37.40997928507879 loss: 0.0019 (0.0023) time: 0.1170 data: 0.0045 max mem: 5109
+2021-05-31 20:17:46,151 INFO torchdistill.misc.log Epoch: [2] [ 5000/12272] eta: 0:15:22 lr: 1.974956540634507e-05 sample/s: 39.26708624043028 loss: 0.0018 (0.0023) time: 0.1322 data: 0.0045 max mem: 5109
+2021-05-31 20:19:53,077 INFO torchdistill.misc.log Epoch: [2] [ 6000/12272] eta: 0:13:16 lr: 1.7033355063016082e-05 sample/s: 26.89458075644343 loss: 0.0019 (0.0022) time: 0.1324 data: 0.0045 max mem: 5109
+2021-05-31 20:21:59,132 INFO torchdistill.misc.log Epoch: [2] [ 7000/12272] eta: 0:11:08 lr: 1.4317144719687093e-05 sample/s: 32.304879269842495 loss: 0.0017 (0.0022) time: 0.1225 data: 0.0044 max mem: 5109
+2021-05-31 20:24:05,638 INFO torchdistill.misc.log Epoch: [2] [ 8000/12272] eta: 0:09:01 lr: 1.1600934376358105e-05 sample/s: 39.57945395824831 loss: 0.0021 (0.0022) time: 0.1263 data: 0.0044 max mem: 5109
+2021-05-31 20:26:11,594 INFO torchdistill.misc.log Epoch: [2] [ 9000/12272] eta: 0:06:54 lr: 8.884724033029119e-06 sample/s: 25.860594738237488 loss: 0.0023 (0.0022) time: 0.1262 data: 0.0044 max mem: 5109
+2021-05-31 20:28:18,549 INFO torchdistill.misc.log Epoch: [2] [10000/12272] eta: 0:04:47 lr: 6.168513689700131e-06 sample/s: 32.40314814635983 loss: 0.0019 (0.0022) time: 0.1260 data: 0.0045 max mem: 5109
+2021-05-31 20:30:24,951 INFO torchdistill.misc.log Epoch: [2] [11000/12272] eta: 0:02:41 lr: 3.452303346371143e-06 sample/s: 42.08254363214055 loss: 0.0021 (0.0022) time: 0.1241 data: 0.0044 max mem: 5109
+2021-05-31 20:32:31,971 INFO torchdistill.misc.log Epoch: [2] [12000/12272] eta: 0:00:34 lr: 7.360930030421556e-07 sample/s: 33.625651129091416 loss: 0.0019 (0.0022) time: 0.1307 data: 0.0044 max mem: 5109
+2021-05-31 20:33:06,083 INFO torchdistill.misc.log Epoch: [2] Total time: 0:25:54
+2021-05-31 20:33:13,819 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-31 20:33:13,820 INFO __main__ Validation: accuracy = 0.8582781456953642
+2021-05-31 20:33:13,820 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
+2021-05-31 20:33:15,094 INFO __main__ [Teacher: bert-large-uncased]
+2021-05-31 20:33:28,908 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-31 20:33:28,908 INFO __main__ Test: accuracy = 0.8665308201732043
+2021-05-31 20:33:32,568 INFO __main__ [Student: bert-base-uncased]
+2021-05-31 20:33:40,325 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
+2021-05-31 20:33:40,326 INFO __main__ Test: accuracy = 0.8582781456953642
+2021-05-31 20:33:40,326 INFO __main__ Start prediction for private dataset(s)
+2021-05-31 20:33:40,327 INFO __main__ mnli/test_m: 9796 samples
+2021-05-31 20:33:47,980 INFO __main__ mnli/test_mm: 9847 samples
+2021-05-31 20:33:55,598 INFO __main__ ax/test_ax: 1104 samples
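The log records three epochs of knowledge distillation, each taking roughly 26 minutes on a single FP16 GPU, with validation accuracy climbing from 0.8413 to 0.8531 to 0.8583 while the evaluation of the BERT-large teacher reports 0.8665. A sketch of pulling those per-epoch validation scores out of the log; the local filename is hypothetical:

```python
# Sketch: extract per-epoch validation accuracy from training.log.
import re
from pathlib import Path

pattern = re.compile(r"Validation: accuracy = ([0-9.]+)")
log_text = Path("training.log").read_text()  # hypothetical local path

for epoch, match in enumerate(pattern.finditer(log_text)):
    print(f"epoch {epoch}: validation accuracy = {float(match.group(1)):.4f}")
# expected output:
# epoch 0: validation accuracy = 0.8413
# epoch 1: validation accuracy = 0.8531
# epoch 2: validation accuracy = 0.8583
```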
vocab.txt
ADDED
The diff for this file is too large to render.
See raw diff