---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
base_model:
- google/bert_uncased_L-2_H-128_A-2
pipeline_tag: text-classification
library_name: transformers
metrics:
- f1
- precision
- recall
datasets:
- Mozilla/autofill_dataset
---
# BERT Miniatures
This is the tiny variant (2 layers, 128 hidden units) of the 24 BERT models referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).

This checkpoint started from the original TinyBERT uncased English checkpoint, [google/bert_uncased_L-2_H-128_A-2](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2).

It was then fine-tuned to classify HTML form elements for autofill, using tags and labels collected with Fathom.
## How to use TinyBERT in `transformers`
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Mozilla/tinybert-uncased-autofill",
)

print(
    classifier('<input class="cc-number" placeholder="Enter credit card number..." />')
)
```
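The pipeline returns the predicted field type with a confidence score. The output below is illustrative only: the label is taken from the set in the performance table further down, and the score is a placeholder, not a measured value:

```
[{'label': 'CC Number', 'score': 0.98}]
```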
## Model Training Info
```
HyperParameters: {
  'learning_rate': 0.000082,
  'num_train_epochs': 59,
  'weight_decay': 0.1,
  'per_device_train_batch_size': 32,
}
```
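As a minimal sketch of how these hyperparameters would plug into the `transformers` Trainer API (an illustration under assumptions, not the actual training script; see the repository linked below):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# output_dir is a hypothetical path; the real training code lives in
# the mozilla/smart_autofill repository.
training_args = TrainingArguments(
    output_dir="tinybert-uncased-autofill",
    learning_rate=8.2e-5,  # 0.000082
    num_train_epochs=59,
    weight_decay=0.1,
    per_device_train_batch_size=32,
)
```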
More information on how the model was trained can be found in the [mozilla/smart_autofill](https://github.com/mozilla/smart_autofill) repository.
## Model Performance
Test Performance:

- Precision: 0.96778
- Recall: 0.96696
- F1: 0.9668
| Label               | Precision | Recall | F1-score | Support |
|---------------------|-----------|--------|----------|---------|
| CC Expiration       | 1.000     | 0.750  | 0.857    | 16      |
| CC Expiration Month | 0.972     | 0.972  | 0.972    | 36      |
| CC Expiration Year  | 0.946     | 0.946  | 0.946    | 37      |
| CC Name             | 0.882     | 0.968  | 0.923    | 31      |
| CC Number           | 0.942     | 0.980  | 0.961    | 50      |
| CC Payment Type     | 0.918     | 0.893  | 0.905    | 75      |
| CC Security Code    | 0.950     | 0.927  | 0.938    | 41      |
| CC Type             | 0.917     | 0.786  | 0.846    | 14      |
| Confirm Password    | 0.961     | 0.860  | 0.907    | 57      |
| Email               | 0.909     | 0.959  | 0.933    | 73      |
| First Name          | 0.800     | 0.800  | 0.800    | 5       |
| Form                | 0.974     | 0.974  | 0.974    | 39      |
| Last Name           | 0.714     | 1.000  | 0.833    | 5       |
| New Password        | 0.913     | 0.979  | 0.945    | 97      |
| Other               | 0.986     | 0.983  | 0.985    | 1235    |
| Phone               | 1.000     | 0.667  | 0.800    | 3       |
| Zip Code            | 0.912     | 0.969  | 0.939    | 32      |
| accuracy            |           |        | 0.967    | 1846    |
| macro avg           | 0.923     | 0.907  | 0.910    | 1846    |
| weighted avg        | 0.968     | 0.967  | 0.967    | 1846    |
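A per-class report like the one above can be produced with scikit-learn's `classification_report`. The labels below are placeholders for illustration; in practice `y_true` would come from the test split of `Mozilla/autofill_dataset` and `y_pred` from the classifier shown earlier:

```python
from sklearn.metrics import classification_report

# Placeholder labels for illustration only.
y_true = ["CC Number", "Email", "Other", "Other"]
y_pred = ["CC Number", "Email", "Other", "CC Number"]

print(classification_report(y_true, y_pred, digits=3))
```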
```bibtex
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}
```