|
--- |
|
language: |
|
- en |
|
- fr |
|
- ro |
|
- de |
|
license: apache-2.0 |
|
library_name: transformers |
|
datasets: |
|
- c4 |
|
--- |
|
|
|
# Model Card for EncT5 |
|
|
|
EncT5 is a variant of T5 that mainly uses the encoder for non-autoregressive tasks (e.g., classification and
regression). The model is from the paper [Fine-tuning T5 Encoder for Non-autoregressive Tasks](https://arxiv.org/abs/2110.08426)
by Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, and Jing Li.
|
|
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
EncT5 uses the same base weights as T5, but **must be fine-tuned before use**. EncT5 has several distinguishing
features:
|
|
|
1. It has fewer decoder layers (a single decoder layer by default), and therefore fewer parameters and a smaller
   resource footprint than standard T5.
2. It has a separate decoder word embedding, and the decoder input ids are predefined constants. During
   fine-tuning, the decoder embedding learns to use these constants as "prompts" to the encoder for the
   corresponding classification/regression tasks (see the sketch after this list).
3. It has a classification head on top of the decoder output.
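
As a conceptual illustration of points 2 and 3, here is a minimal sketch of an EncT5-style head in plain PyTorch. The module names, sizes, and the single constant decoder input id are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn


class EncT5HeadSketch(nn.Module):
    """Illustrative sketch of the EncT5 additions on top of a T5 encoder
    output: a separate decoder embedding fed constant input ids, a single
    decoder layer, and a classification head."""

    def __init__(self, hidden_size=768, num_heads=12, num_labels=2):
        super().__init__()
        # Separate decoder word embedding; a single "prompt" id is used here.
        self.decoder_embedding = nn.Embedding(1, hidden_size)
        self.decoder_layer = nn.TransformerDecoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.classification_head = nn.Linear(hidden_size, num_labels)

    def forward(self, encoder_hidden_states):
        batch_size = encoder_hidden_states.size(0)
        # Predefined constant decoder input ids: every example gets id 0,
        # which the embedding learns to use as a task "prompt".
        decoder_input_ids = torch.zeros(
            batch_size, 1, dtype=torch.long, device=encoder_hidden_states.device
        )
        decoder_inputs = self.decoder_embedding(decoder_input_ids)
        decoder_output = self.decoder_layer(decoder_inputs, encoder_hidden_states)
        # Classification head on top of the single decoder output position.
        return self.classification_head(decoder_output[:, 0])


# Example: encoder output of shape (batch, seq_len, hidden) -> (batch, num_labels)
logits = EncT5HeadSketch()(torch.randn(4, 16, 768))
```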
|
|
|
Research has shown that this model can be more efficient than T5 and BERT, while remaining comparably effective,
on non-autoregressive tasks such as classification and regression.
|
|
|
- **Developed by:** Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li. See the |
|
[associated paper](https://arxiv.org/abs/2110.08426). |
|
- **Model type:** Language Model |
|
- **Language(s) (NLP):** English, French, Romanian, German |
|
- **License:** Apache 2.0 |
|
- **Based on model:** [T5](https://huggingface.co/google-t5/t5-base) |
|
- **Repository:** [GitHub repo](https://github.com/hackyon/EncT5)
|
- **Paper:** [Fine-tuning T5 Encoder for Non-autoregressive Tasks](https://arxiv.org/abs/2110.08426) |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("hackyon/enct5-base", trust_remote_code=True)

# Fine-tune the model before use.
```
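
After fine-tuning, inference follows the standard sequence-classification pattern. A minimal sketch, assuming a fine-tuned checkpoint saved at an illustrative local path and a compatible tokenizer shipped with the checkpoint:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hackyon/enct5-base", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/your-fine-tuned-enct5", trust_remote_code=True  # hypothetical path
)

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()
```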
|
|
|
See the [GitHub repo](https://github.com/hackyon/EncT5) for a more comprehensive guide.
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The weights of this model are copied directly from [t5-base](https://huggingface.co/google-t5/t5-base), which was pre-trained on the [C4](https://huggingface.co/datasets/c4) corpus.
|
|
|
### Training Procedure |
|
|
|
This model **must be fine-tuned** before use. The decoder word embedding and classification head are newly initialized, so the model will not produce meaningful outputs until it has been trained on a downstream task.
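
As a minimal sketch of the fine-tuning step, assuming the checkpoint follows the standard `transformers` sequence-classification interface; the dataset (GLUE SST-2), hyperparameters, and label count are illustrative choices, not settings from the paper:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("hackyon/enct5-base", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "hackyon/enct5-base", num_labels=2, trust_remote_code=True
)

# Binary sentiment classification as an example task.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="enct5-sst2",
        learning_rate=1e-4,
        per_device_train_batch_size=32,
        num_train_epochs=3,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```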
|
|
|
|