muhtasham committed on
Commit
2f3d561
·
1 Parent(s): bd7b6e7

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -13,7 +13,8 @@ license: apache-2.0
13
  # Tiny BERT December 2022
14
 
15
  This is a more up-to-date version of the [original tiny BERT](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).
16
- In addition to being more up-to-date, it is more CPU friendly than its base version, but its first version and is not perfect by no means.
 
17
 
18
 
19
  The model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.
@@ -45,8 +46,8 @@ OLM
45
  65825874694874, 'qnli_acc': 0.6199890170236134, 'rte_acc': 0.5595667870036101, 'wnli_acc': 0.5352112676056338}
46
  ```
47
 
48
- Probably messed up with hyperparameters and tokenizer a bit, unfortunately. Stay tuned for version 2 🚀🚀🚀
49
-
50
 
51
  ## Dataset
52
 
 
13
  # Tiny BERT December 2022
14
 
15
  This is a more up-to-date version of the [original tiny BERT](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).
16
+ In addition to being more up-to-date, it is more CPU friendly than its base version, but its first version and is not perfect by no means. Took a day and 8x A100s to train. 🤗
17
+
18
 
19
 
20
  The model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.
 
46
  65825874694874, 'qnli_acc': 0.6199890170236134, 'rte_acc': 0.5595667870036101, 'wnli_acc': 0.5352112676056338}
47
  ```
48
 
49
+ Probably messed up with hyperparameters and tokenizer a bit, unfortunately. Anyway Stay tuned for version 2 🚀🚀🚀
50
+ But please try it out on your downstream tasks, might be more performant. Should be cheap to fine-tune due to its size 🤗
51
 
52
  ## Dataset
53