FineWeb-LMs: Token Dropping BERT

BERT with TensorFlow Model Garden

This repository presents a Token Dropping BERT model that was pretrained on the 10BT subsets of FineWeb and FineWeb-Edu.

Pretraining Details

The released BERT model is part of my TensorFlow Model Garden LMs project.

The pretraining was done on a v3-32 TPU VM Pod, provided by the amazing TRC program. Detailed cheatsheets are available:

tl;dr: The model was pretrained for 1M steps with a global batch size of 512 and a sequence length of 512, using a vocabulary of 64k tokens.
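To put these numbers in perspective, the short sketch below multiplies them out. It assumes that every one of the 512 sequences in a batch fills all 512 positions, so the result is an upper bound on the token positions processed; padding would lower the real figure. The variable names are illustrative and do not come from the training configuration.

```python
# Values quoted in the tl;dr above.
train_steps = 1_000_000
global_batch_size = 512   # sequences per optimizer step
max_seq_length = 512      # token positions per sequence

# Assumption: every sequence is filled to the maximum length.
sequences_seen = train_steps * global_batch_size
token_positions = sequences_seen * max_seq_length

print(f"sequences processed:           {sequences_seen:,}")
print(f"token positions (upper bound): {token_positions:,}")
```

Under that assumption the run covers 512 million sequences, or roughly 262 billion token positions.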

Checkpoint Evaluation with ScandEval

We evaluate the last 5 checkpoints (1M, 951k, 901k, 851k and 801k) with a recent version of ScandEval to check their performance and compare them with popular encoder-only models such as BERT, RoBERTa and ELECTRA:

| Model ID | Avg. Score | CoNLL-En | SST5 | ScaLA-En | SQuAD |
|---|---|---|---|---|---|
| model-garden-lms/bert-base-token-dropping-finewebs-1m | 67.66 | 88.68 ± 0.76 / 88.47 ± 0.62 | 57.4 ± 1.7 / 59.61 ± 1.6 | 52.72 ± 5.13 / 73.6 ± 4.42 | 55.04 ± 1.54 / 65.72 ± 1.75 |
| model-garden-lms/bert-base-token-dropping-finewebs-951k | 66.87 | 88.81 ± 0.68 / 88.64 ± 0.54 | 57.44 ± 1.39 / 56.85 ± 2.09 | 50.91 ± 5.08 / 72.22 ± 4.2 | 54.63 ± 1.3 / 65.43 ± 1.43 |
| model-garden-lms/bert-base-token-dropping-finewebs-901k | 68.01 | 88.98 ± 0.64 / 88.67 ± 0.55 | 57.79 ± 1.31 / 58.91 ± 1.85 | 54.25 ± 6.3 / 75.73 ± 3.54 | 54.4 ± 0.72 / 65.31 ± 1.01 |
| model-garden-lms/bert-base-token-dropping-finewebs-851k | 67.97 | 88.9 ± 0.7 / 88.81 ± 0.54 | 58.0 ± 1.02 / 58.73 ± 1.8 | 54.04 ± 2.61 / 74.89 ± 2.07 | 54.75 ± 1.08 / 65.66 ± 1.26 |
| model-garden-lms/bert-base-token-dropping-finewebs-801k | 67.80 | 88.95 ± 0.7 / 88.73 ± 0.58 | 57.71 ± 1.43 / 60.5 ± 1.69 | 50.95 ± 6.3 / 74.16 ± 3.2 | 55.24 ± 1.37 / 66.13 ± 1.24 |
| google-bert/bert-base-cased | 62.26 | 87.39 ± 0.79 / 87.11 ± 0.66 | 54.49 ± 1.36 / 53.22 ± 1.15 | 52.08 ± 2.13 / 74.52 ± 1.31 | 38.63 ± 2.1 / 50.68 ± 1.87 |
| google/electra-base-discriminator | 69.26 | 87.82 ± 0.69 / 86.83 ± 0.62 | 62.3 ± 1.12 / 55.93 ± 0.67 | 62.61 ± 1.21 / 80.85 ± 0.59 | 52.51 ± 0.86 / 65.2 ± 0.85 |
| FacebookAI/roberta-base | 68.96 | 90.35 ± 0.23 / 90.14 ± 0.2 | 60.95 ± 1.4 / 57.52 ± 1.97 | 50.64 ± 1.69 / 74.55 ± 0.9 | 57.82 ± 1.35 / 69.68 ± 1.02 |

Our pretrained Token Dropping BERT model clearly outperforms the original BERT model (average scores of 66.87 to 68.01 versus 62.26), but it does not surpass ELECTRA (69.26) or RoBERTa (68.96). All detailed results can be found in this dataset repository.
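The Avg. Score column in the table above appears to be the plain mean of the eight metric values in each row, that is, both numbers from each of the four tasks, with the ± spreads ignored. This is an assumption inferred from the reported numbers, not a documented ScandEval formula. The sketch below recomputes the average for a few rows; the dictionary names are illustrative.

```python
from statistics import mean

# Metric values copied from the results table (± spreads omitted).
# Order: CoNLL-En (2 values), SST5 (2), ScaLA-En (2), SQuAD (2).
scores = {
    "model-garden-lms/bert-base-token-dropping-finewebs-1m":
        [88.68, 88.47, 57.4, 59.61, 52.72, 73.6, 55.04, 65.72],
    "model-garden-lms/bert-base-token-dropping-finewebs-901k":
        [88.98, 88.67, 57.79, 58.91, 54.25, 75.73, 54.4, 65.31],
    "google-bert/bert-base-cased":
        [87.39, 87.11, 54.49, 53.22, 52.08, 74.52, 38.63, 50.68],
    "google/electra-base-discriminator":
        [87.82, 86.83, 62.3, 55.93, 62.61, 80.85, 52.51, 65.2],
    "FacebookAI/roberta-base":
        [90.35, 90.14, 60.95, 57.52, 50.64, 74.55, 57.82, 69.68],
}

# Avg. Score values as reported in the table.
reported = {
    "model-garden-lms/bert-base-token-dropping-finewebs-1m": 67.66,
    "model-garden-lms/bert-base-token-dropping-finewebs-901k": 68.01,
    "google-bert/bert-base-cased": 62.26,
    "google/electra-base-discriminator": 69.26,
    "FacebookAI/roberta-base": 68.96,
}

# Assumption: Avg. Score is the unweighted mean of all eight values.
recomputed = {model_id: mean(values) for model_id, values in scores.items()}

for model_id, avg in recomputed.items():
    diff = abs(avg - reported[model_id])
    print(f"{model_id}: recomputed {avg:.3f}, reported {reported[model_id]:.2f}, diff {diff:.3f}")
```

Each recomputed mean agrees with the reported value to within rounding, which supports reading the column as an unweighted average over all tasks and metrics.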

❤️ Acknowledgements

This repository is the outcome of the last two years of working with TPUs from the awesome TRC program and the TensorFlow Model Garden library.

Made from Bavarian Oberland with ❤️ and 🥨.
