---
language:
- en
- fr
- ro
- de
license: apache-2.0
library_name: transformers
datasets:
- c4
---
# Model Card for EncT5
EncT5 is a variant of T5 that mainly utilizes the encoder for non-autoregressive (i.e., classification and regression)
tasks. The model is from the paper [Fine-tuning T5 Encoder for Non-autoregressive Tasks](https://arxiv.org/abs/2110.08426)
by Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, and Jing Li.
## Model Details
### Model Description
EncT5 uses the same base weights as T5, but **must be fine-tuned before use**. EncT5 has several distinguishing
features:
1. It has fewer decoder layers (a single decoder layer by default), and therefore fewer parameters and lower resource
requirements than standard T5.
2. It has a separate decoder word embedding, with the decoder input ids being predefined constants. During
fine-tuning, the decoder embedding learns to use these constants as "prompts" to the encoder for the corresponding
classification/regression tasks.
3. It has a classification head on top of the decoder output.
Research has shown that this model can be more efficient than T5 and BERT for non-autoregressive tasks such as
classification and regression, while achieving comparable or better performance.
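As an illustrative sanity check of the parameter savings, here is a minimal sketch that compares parameter counts against the original `t5-base` checkpoint (the comparison code is an assumption for illustration, not part of the official repo):

```python
from transformers import AutoModelForSequenceClassification, T5Model

# EncT5 keeps the full T5 encoder but uses a single decoder layer by default.
enct5 = AutoModelForSequenceClassification.from_pretrained(
    "hackyon/enct5-base", trust_remote_code=True
)
t5 = T5Model.from_pretrained("google-t5/t5-base")

def num_params(model):
    return sum(p.numel() for p in model.parameters())

print(f"EncT5:   {num_params(enct5):,} parameters")
print(f"t5-base: {num_params(t5):,} parameters")
```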
- **Developed by:** Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li. See the
[associated paper](https://arxiv.org/abs/2110.08426).
- **Model type:** Language Model
- **Language(s) (NLP):** English, French, Romanian, German
- **License:** Apache 2.0
- **Based on model:** [T5](https://huggingface.co/google-t5/t5-base)
- **Repository:** [GitHub repo](https://github.com/hackyon/EncT5)
- **Paper:** [Fine-tuning T5 Encoder for Non-autoregressive Tasks](https://arxiv.org/abs/2110.08426)
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForSequenceClassification

# trust_remote_code is required because EncT5 is a custom architecture.
model = AutoModelForSequenceClassification.from_pretrained("hackyon/enct5-base", trust_remote_code=True)

# Fine-tune the model before use.
```
See the [GitHub repo](https://github.com/hackyon/EncT5) for a more comprehensive guide.
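Because the decoder embedding and classification head are randomly initialized, the model needs task-specific fine-tuning before it produces meaningful predictions. Below is a minimal fine-tuning sketch using the standard `Trainer` API; the SST-2 dataset, hyperparameters, and `num_labels=2` are illustrative assumptions, not the official recipe:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Illustrative task: binary sentiment classification on GLUE SST-2.
dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("hackyon/enct5-base", trust_remote_code=True)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

# num_labels is assumed to follow the standard transformers config convention.
model = AutoModelForSequenceClassification.from_pretrained(
    "hackyon/enct5-base", trust_remote_code=True, num_labels=2
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="enct5-sst2", num_train_epochs=3),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```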
## Training Details
### Training Data
The weights of this model are directly copied from [t5-base](https://huggingface.co/google-t5/t5-base).
### Training Procedure
This model **must be fine-tuned** before use. The decoder word embedding and classification head are both untrained.
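For regression tasks, the same head can be configured with a single output. A hedged sketch, assuming the remote code honors the standard `num_labels` and `problem_type` config fields:

```python
from transformers import AutoModelForSequenceClassification

# Single-output regression head (e.g., predicting a similarity score);
# num_labels/problem_type are assumed to pass through to the remote config.
model = AutoModelForSequenceClassification.from_pretrained(
    "hackyon/enct5-base",
    trust_remote_code=True,
    num_labels=1,
    problem_type="regression",
)
```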