---
language: en
thumbnail: https://huggingface.co/front/thumbnails/microsoft.png
tags:
- text-classification
license: mit
---
# AutoDisProxyT-RTE for Distilling Massive Neural Networks
AutoDisProxyT is a distilled, task-agnostic transformer model that leverages task transfer to learn a small universal model applicable to arbitrary tasks and languages, as outlined in the paper [Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models](https://arxiv.org/abs/2201.12507).

This AutoDisProxyT checkpoint has 7 layers, a hidden size of 160, and 10 attention heads, for a total of 6.88 million parameters and 0.27G FLOPs.
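A minimal usage sketch with the `transformers` library is shown below. The Hub id `microsoft/AutoDisProxyT-RTE` is a placeholder assumption for this checkpoint, not a confirmed repository name; substitute the actual id when loading.

```python
# Minimal usage sketch. Assumptions: the checkpoint is published under a Hub id
# such as "microsoft/AutoDisProxyT-RTE" (placeholder) and is loadable through
# the standard Auto* classes as a sequence-classification model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "microsoft/AutoDisProxyT-RTE"  # placeholder id, adjust as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# RTE is a sentence-pair entailment task: encode premise and hypothesis together.
premise = "The cat sat on the mat."
hypothesis = "There is a cat on the mat."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

prediction = logits.argmax(dim=-1).item()
print(model.config.id2label.get(prediction, prediction))
```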
The following table shows the results on the GLUE dev set.
| Models | #Params (M) | #FLOPs (G) | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 109 | 11.2 | 84.5 | 91.7 | 91.3 | 68.6 | 93.2 | 87.3 | 53.5 | 82.2 |
| BERT-Small | 66 | 5.66 | 81.8 | 89.8 | 90.6 | 67.9 | 91.2 | 84.9 | 53.5 | 80.0 |
| TruncatedBERT | 66 | 5.66 | 81.2 | 87.9 | 90.4 | 65.5 | 90.8 | 82.7 | 41.4 | 77.1 |
| DistilBERT | 66 | 5.66 | 82.2 | 89.2 | 88.5 | 59.9 | 91.3 | 87.5 | 51.3 | 78.6 |
| TinyBERT | 66 | 5.66 | 83.5 | 90.5 | 90.6 | 72.2 | 91.6 | 88.4 | 42.8 | 79.9 |
| MiniLM | 66 | 5.66 | 84.0 | 91.0 | 91.0 | 71.5 | 92.0 | 88.4 | 49.2 | 81.0 |
| AutoTinyBERT-KD-S1 | 30.0 | 1.69 | 82.3 | 89.7 | 89.9 | 71.1 | 91.4 | 88.5 | 47.3 | 80.0 |
| DynaBERT | 37.7 | 1.81 | 82.3 | 88.5 | 90.4 | 63.2 | 92.0 | 81.4 | 43.7 | 76.4 |
| NAS-BERT10 | 10.0 | 2.30 | 76.4 | 86.3 | 88.5 | 66.6 | 88.6 | 79.1 | 34.0 | 74.2 |
| AutoTinyBERT-KD-S4 | 66 | 5.66 | 76.0 | 85.5 | 86.9 | 64.9 | 86.8 | 81.4 | 20.4 | 71.7 |
| NAS-BERT5 | 66 | 5.66 | 74.4 | 84.9 | 85.8 | 66.6 | 87.3 | 79.6 | 19.8 | 71.2 |
| AutoDisProxyT | 6.88 | 0.27 | 79.0 | 86.4 | 89.1 | 64.3 | 85.9 | 78.5 | 24.8 | 72.6 |
Tested with `torch` 1.6.0.
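For reference, the RTE dev-set score above could in principle be reproduced with a loop like the following sketch, assuming the same placeholder Hub id and the standard GLUE label order; the paper's own evaluation setup may differ.

```python
# Evaluation sketch on the GLUE RTE dev set. Assumptions: placeholder Hub id
# and default GLUE/RTE label order (0 = entailment, 1 = not_entailment).
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "microsoft/AutoDisProxyT-RTE"  # placeholder id, adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# RTE dev set: sentence pairs with an entailment label.
dev = load_dataset("glue", "rte", split="validation")

correct = 0
for example in dev:
    inputs = tokenizer(example["sentence1"], example["sentence2"],
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == example["label"])

print(f"RTE dev accuracy: {correct / len(dev):.3f}")
```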
If you use this checkpoint in your work, please cite:
```bibtex
@article{xu2022autodistil,
  title={AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models},
  author={Xu, Dongkuan and Mukherjee, Subhabrata and Liu, Xiaodong and Dey, Debadeepta and Wang, Wenhui and Zhang, Xiang and Awadallah, Ahmed Hassan and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2201.12507},
  year={2022}
}
```