patrickvonplaten
committed on
Commit
·
c2fd4af
1
Parent(s):
39682b1
Update README.md
README.md
CHANGED
@@ -141,21 +141,20 @@ The details of the masking procedure for each sentence are the following:
|
|
141 |
|
142 |
### Pretraining
|
143 |
|
144 |
-
|
145 |
of 256. The sequence length was limited to 512 tokens. The optimizer
|
146 |
used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
|
147 |
learning rate warmup for 10,000 steps and linear decay of the learning rate after.
|
148 |
|
149 |
## Evaluation results
|
150 |
|
151 |
-
|
152 |
|
153 |
-
|
154 |
-
|
155 |
-
| | 72/73 | 83 | 80 | 95 | 69 | 79 | 76 | 63| 76.7 |
|
156 |
|
157 |
-
|
158 |
-
|
159 |
|
160 |
| Task | Metric | Result | | | Training time | |
|
161 |
| ----- | ---------------------- | --------------------------------------------------------------|----------------- | ------------------------------------------------------------------------- | ------------- | -------- |
|
@@ -165,12 +164,12 @@ The following table contains test results on the HuggingFace model in comparison
|
|
165 |
| QNLI | Accuracy | [90.99](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [84.39](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 80 |02:40:22 | 01:48:22 |
|
166 |
| SST-2 | Accuracy | [92.32](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [89.45](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | 95 | 01:42:17 | 01:09:27 |
|
167 |
| CoLA | Matthews corr or Accuracy | [59.57](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) (Matthews corr) | [35.94](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) (Matthews Corr) | 69 (Accuracy) | 14:20 | 09:47 |
|
168 |
-
| STS-B | Spearman corr. | [
|
169 |
| MRPC | mean(F1/Accuracy) | [88.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [81.15](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | 76 |11:12 | 07:48 |
|
170 |
| RTE | Accuracy | [67.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [62.82](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 63 |04:51 | 03:24 |
|
171 |
| WNLI | Accuracy | [46.48](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | [54.93](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | - |03:23 | 02:37 |
|
172 |
|
173 |
-
We can see that
|
174 |
|
175 |
### BibTeX entry and citation info
|
176 |
|
|
|
141 |
|
142 |
### Pretraining
|
143 |
|
144 |
+
FNet-base was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
|
145 |
of 256. The sequence length was limited to 512 tokens. The optimizer
|
146 |
used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
|
147 |
learning rate warmup for 10,000 steps and linear decay of the learning rate after.
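
The schedule described above can be sketched in a few lines of plain Python. This is only an illustration, not the training code used for FNet: the function name `learning_rate` is hypothetical, the warmup is assumed to rise linearly from zero, and the decay is assumed to reach zero exactly at the final (one-millionth) step, which the text does not state.

```python
# Values taken from the paragraph above.
PEAK_LR = 1e-4            # peak learning rate
WARMUP_STEPS = 10_000     # steps of learning rate warmup
TOTAL_STEPS = 1_000_000   # total pretraining steps


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay.

    Assumptions (not stated in the README): warmup starts at 0 and the
    decay ends at 0 on the last training step.
    """
    if step < WARMUP_STEPS:
        # Warmup phase: scale linearly from 0 up to the peak.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay phase: scale linearly from the peak down to 0.
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 505_000, 1_000_000):
        print(step, learning_rate(step))
```

In an actual PyTorch run, a ready-made scheduler would normally replace this hand-written function; the sketch only makes the shape of the schedule explicit.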
|
148 |
|
149 |
## Evaluation results
|
150 |
|
151 |
+
FNet-base was fine-tuned and evaluated on the validation data of the [GLUE benchmark](https://huggingface.co/datasets/glue). The results of the official model (written in Flax) can be seen in Table 1 on page 7 of [the official paper](https://arxiv.org/abs/2105.03824).
|
152 |
|
153 |
+
For comparison, this model (ported to PyTorch) was fine-tuned and evaluated using the [official Hugging Face GLUE evaluation scripts](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#glue-tasks) alongside [bert-base-cased](https://hf.co/models/bert-base-cased).
|
154 |
+
The training was done on a single 16GB NVIDIA Tesla V100 GPU. For MRPC/WNLI, the models were trained for 5 epochs, while for other tasks, the models were trained for 3 epochs. A sequence length of 512 was used with batch size 16 and learning rate 2e-5.
|
|
|
155 |
|
156 |
+
The following table summarizes the results for [fnet-base](https://huggingface.co/google/fnet-base) (called *FNet (PyTorch) - Reproduced*) and [bert-base-cased](https://hf.co/models/bert-base-cased) (called *Bert (PyTorch) - Reproduced*), both in terms of performance and training times, and compares them to the reported performance of the official FNet-base model (called *FNet (Flax) - Official*).
|
157 |
+
For more details, please refer to the checkpoints linked with the scores. The sequence length used was 512, with batch size 16 and learning rate 2e-5.
|
158 |
|
159 |
| Task | Metric | Result | | | Training time | |
|
160 |
| ----- | ---------------------- | --------------------------------------------------------------|----------------- | ------------------------------------------------------------------------- | ------------- | -------- |
|
|
|
164 |
| QNLI | Accuracy | [90.99](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [84.39](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 80 |02:40:22 | 01:48:22 |
|
165 |
| SST-2 | Accuracy | [92.32](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [89.45](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | 95 | 01:42:17 | 01:09:27 |
|
166 |
| CoLA | Matthews corr or Accuracy | [59.57](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) (Matthews corr) | [35.94](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) (Matthews Corr) | 69 (Accuracy) | 14:20 | 09:47 |
|
167 |
+
| STS-B | Spearman corr. | [88.98](https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb) | [82.19](https://huggingface.co/gchhablani/fnet-base-finetuned-stsb) | 79 |10:24 | 07:09 |
|
168 |
| MRPC | mean(F1/Accuracy) | [88.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [81.15](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | 76 |11:12 | 07:48 |
|
169 |
| RTE | Accuracy | [67.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [62.82](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 63 |04:51 | 03:24 |
|
170 |
| WNLI | Accuracy | [46.48](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | [54.93](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | - |03:23 | 02:37 |
|
171 |
|
172 |
+
We can see that FNet-base achieves around 93% of BERT-base's performance while requiring *ca.* 30% less time to fine-tune on the downstream tasks.
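
As a rough sanity check of these two figures, the ratios can be recomputed from the table rows visible in this hunk. The sketch below makes several assumptions: only the seven rows shown here are used (the rows elided from the hunk are not included, so the numbers will differ slightly from the full table), the first linked score and first training time in each row are taken to be BERT's and the second FNet's (inferred from the checkpoint links), and the ratio of summed scores is just one possible way to aggregate.

```python
# (BERT score, FNet score, BERT training time, FNet training time)
# Copied from the rows visible in this hunk; elided rows are NOT included.
ROWS = {
    "QNLI":  (90.99, 84.39, "02:40:22", "01:48:22"),
    "SST-2": (92.32, 89.45, "01:42:17", "01:09:27"),
    "CoLA":  (59.57, 35.94, "14:20", "09:47"),
    "STS-B": (88.98, 82.19, "10:24", "07:09"),
    "MRPC":  (88.15, 81.15, "11:12", "07:48"),
    "RTE":   (67.15, 62.82, "04:51", "03:24"),
    "WNLI":  (46.48, 54.93, "03:23", "02:37"),
}


def to_seconds(clock: str) -> int:
    """Convert 'HH:MM:SS' or 'MM:SS' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


bert_score = sum(row[0] for row in ROWS.values())
fnet_score = sum(row[1] for row in ROWS.values())
bert_time = sum(to_seconds(row[2]) for row in ROWS.values())
fnet_time = sum(to_seconds(row[3]) for row in ROWS.values())

# Fraction of BERT's summed score that FNet reaches on these rows.
score_ratio = fnet_score / bert_score
# Fraction of BERT's total fine-tuning time that FNet saves on these rows.
time_saving = 1 - fnet_time / bert_time

if __name__ == "__main__":
    print(f"score ratio: {score_ratio:.1%}")
    print(f"time saving: {time_saving:.1%}")
```

On the visible rows alone, both values land close to the figures quoted above, which is consistent with the summary sentence.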
|
173 |
|
174 |
### BibTeX entry and citation info
|
175 |
|