patrickvonplaten committed
Commit 39682b1
1 Parent(s): e3e7f3e

Update README.md

Files changed (1): README.md (+10 -10)
README.md CHANGED
@@ -152,7 +152,7 @@ According to [the official paper](https://arxiv.org/abs/2105.03824) (*cf.* with
 
 | Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
 |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
-| | 72/73 | 83 | 80 | 95 | 69 | 79 | 76 | 63| 76.7 |
+| | 72/73 | 83 | 80 | 95 | 69 | 79 | 76 | 63 | 76.7 |
 
 
 The following table contains test results on the HuggingFace model in comparison with [bert-base-cased](https://huggingface.co/bert-base-cased). The training was done on a single 16GB NVIDIA Tesla V100 GPU. For MRPC/WNLI, the models were trained for 5 epochs, while for the other tasks, the models were trained for 3 epochs. Please refer to the checkpoints linked with the scores. The sequence length used was 512, with a batch size of 16 and a learning rate of 2e-5.
@@ -160,15 +160,15 @@ The following table contains test results on the HuggingFace model in comparison
 
 | Task | Metric | Result | | | Training time | |
 | ----- | ---------------------- | --------------------------------------------------------------|----------------- | ------------------------------------------------------------------------- | ------------- | -------- |
 | | | Bert (PyTorch) - Reproduced | FNet (PyTorch) - Reproduced | FNet (Flax) - Official | Bert | FNet |
-| MNLI | Accuracy | [84.10](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mnli) | [76.75](https://huggingface.co/gchhablani/fnet-base-finetuned-mnli) | 72/73 (Match/Mismatch) | 09:52:33 | 06:40:55 |
-| QQP | mean(Accuracy,F1) | [89.26](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qqp) | [86.5](https://huggingface.co/gchhablani/fnet-base-finetuned-qqp) | | 09:25:01 | 06:21:16 |
-| QNLI | Accuracy | [90.99](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [84.39](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | |02:40:22 | 01:48:22 |
-| SST-2 | Accuracy | [92.32](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [89.45](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | | 01:42:17 | 01:09:27 |
-| CoLA | Matthews corr | [59.57](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) | [35.94](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) | | 14:20 | 09:47 |
-| STS-B | Spearman corr. | [89.26/88.98](https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb) | [82.56/82.19](https://huggingface.co/gchhablani/fnet-base-finetuned-stsb) | |10:24 | 07:09 |
-| MRPC | mean(F1/Accuracy) | [88.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [81.15](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | |11:12 | 07:48 |
-| RTE | Accuracy | [67.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [62.82](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | |04:51 | 03:24 |
-| WNLI | Accuracy | [46.48](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | [54.93](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | |03:23 | 02:37 |
+| MNLI | Accuracy or Match/Mismatch | [84.10](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mnli) (Accuracy) | [76.75](https://huggingface.co/gchhablani/fnet-base-finetuned-mnli) (Accuracy) | 72/73 (Match/Mismatch) | 09:52:33 | 06:40:55 |
+| QQP | mean(Accuracy, F1) | [89.26](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qqp) | [86.5](https://huggingface.co/gchhablani/fnet-base-finetuned-qqp) | 83 | 09:25:01 | 06:21:16 |
+| QNLI | Accuracy | [90.99](https://huggingface.co/gchhablani/bert-base-cased-finetuned-qnli) | [84.39](https://huggingface.co/gchhablani/fnet-base-finetuned-qnli) | 80 | 02:40:22 | 01:48:22 |
+| SST-2 | Accuracy | [92.32](https://huggingface.co/gchhablani/bert-base-cased-finetuned-sst2) | [89.45](https://huggingface.co/gchhablani/fnet-base-finetuned-sst2) | 95 | 01:42:17 | 01:09:27 |
+| CoLA | Matthews corr. or Accuracy | [59.57](https://huggingface.co/gchhablani/bert-base-cased-finetuned-cola) (Matthews corr.) | [35.94](https://huggingface.co/gchhablani/fnet-base-finetuned-cola) (Matthews corr.) | 69 (Accuracy) | 14:20 | 09:47 |
+| STS-B | Spearman corr. | [89.26/88.98](https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb) | [82.56/82.19](https://huggingface.co/gchhablani/fnet-base-finetuned-stsb) | 79 | 10:24 | 07:09 |
+| MRPC | mean(F1, Accuracy) | [88.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-mrpc) | [81.15](https://huggingface.co/gchhablani/fnet-base-finetuned-mrpc) | 76 | 11:12 | 07:48 |
+| RTE | Accuracy | [67.15](https://huggingface.co/gchhablani/bert-base-cased-finetuned-rte) | [62.82](https://huggingface.co/gchhablani/fnet-base-finetuned-rte) | 63 | 04:51 | 03:24 |
+| WNLI | Accuracy | [46.48](https://huggingface.co/gchhablani/bert-base-cased-finetuned-wnli) | [54.93](https://huggingface.co/gchhablani/fnet-base-finetuned-wnli) | - | 03:23 | 02:37 |
 
 We can see that, on average, the FNet model achieves around 93% of BERT's performance while requiring roughly 30% less time to fine-tune on the downstream tasks.
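The fine-tuning setup quoted above (sequence length 512, batch size 16, learning rate 2e-5, 3 epochs for most tasks, 5 for MRPC/WNLI) corresponds to a standard `transformers` GLUE run. Below is a minimal sketch of such a run for SST-2; it assumes the `google/fnet-base` checkpoint and the `datasets` library, and is an illustration of the setup rather than the exact script behind the linked checkpoints, which this diff does not include.

```python
# Minimal sketch of the GLUE fine-tuning setup described in the card.
# Assumptions (not part of this diff): the google/fnet-base checkpoint,
# SST-2 as the task, and the `datasets` library for data loading.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "google/fnet-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    # Sequence length 512, as stated in the card.
    return tokenizer(batch["sentence"], truncation=True, max_length=512)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="fnet-base-finetuned-sst2",
    learning_rate=2e-5,              # as stated in the card
    per_device_train_batch_size=16,  # as stated in the card
    num_train_epochs=3,              # 3 epochs for SST-2 (5 for MRPC/WNLI)
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```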
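The closing "~93% of performance, ~30% less time" claim can also be sanity-checked against the table itself. The snippet below recomputes the two averages from the quoted numbers; taking only the first value of each STS-B pair is a simplification made here, not something the card specifies.

```python
# Recompute the "~93% of BERT's performance, ~30% less fine-tuning time"
# claim from the table above. For STS-B, only the first of the two reported
# values is used (a simplification made here, not in the card).
bert_score = {"MNLI": 84.10, "QQP": 89.26, "QNLI": 90.99, "SST-2": 92.32,
              "CoLA": 59.57, "STS-B": 89.26, "MRPC": 88.15, "RTE": 67.15,
              "WNLI": 46.48}
fnet_score = {"MNLI": 76.75, "QQP": 86.5, "QNLI": 84.39, "SST-2": 89.45,
              "CoLA": 35.94, "STS-B": 82.56, "MRPC": 81.15, "RTE": 62.82,
              "WNLI": 54.93}
bert_time = {"MNLI": "09:52:33", "QQP": "09:25:01", "QNLI": "02:40:22",
             "SST-2": "01:42:17", "CoLA": "14:20", "STS-B": "10:24",
             "MRPC": "11:12", "RTE": "04:51", "WNLI": "03:23"}
fnet_time = {"MNLI": "06:40:55", "QQP": "06:21:16", "QNLI": "01:48:22",
             "SST-2": "01:09:27", "CoLA": "09:47", "STS-B": "07:09",
             "MRPC": "07:48", "RTE": "03:24", "WNLI": "02:37"}

def minutes(t: str) -> float:
    """Parse 'HH:MM:SS' or 'MM:SS' into minutes."""
    parts = [int(p) for p in t.split(":")]
    h, m, s = ([0] + parts)[-3:]
    return h * 60 + m + s / 60

tasks = bert_score.keys()
perf = sum(fnet_score[t] / bert_score[t] for t in tasks) / len(bert_score)
time = sum(minutes(fnet_time[t]) / minutes(bert_time[t]) for t in tasks) / len(bert_score)
print(f"FNet relative performance: {perf:.1%}")  # ~92.7%
print(f"FNet relative time:        {time:.1%}")  # ~69.4%, i.e. ~30% less
```

CoLA is the clear outlier (FNet retains only about 60% of BERT's Matthews correlation there); the remaining tasks cluster between roughly 91% and 97%, with WNLI above 100%.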