Model Card for EncT5
EncT5 is a variant of T5 that mainly uses the encoder for non-autoregressive tasks (i.e., classification and regression). The model comes from the paper Fine-tuning T5 Encoder for Non-autoregressive Tasks by Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, and Jing Li.
Model Details
Model Description
EncT5 uses the same base weights as T5, but must be fine-tuned before use. EncT5 has several special features:
- There are fewer decoder layers (a single decoder layer by default), so the model has fewer parameters and needs fewer resources than the standard T5.
- There is a separate decoder word embedding, with the decoder input ids being predefined constants. During fine-tuning, the decoder embedding learns to use these constants as "prompts" to the encoder for the corresponding classification/regression tasks.
- There is a classification head on top of the decoder output.
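The data flow described in the list above can be illustrated with a small, dependency-free sketch. This is not the EncT5 implementation: the class `ToyEncT5Head`, its parameters, and the helper functions are hypothetical names chosen for illustration, and the sketch leaves out the projections, self-attention, feed-forward block, and layer norms that a real T5 decoder layer contains. It only shows the idea that a constant decoder input id selects a learned "prompt" vector, that vector attends over the encoder states, and a linear head maps the result to class logits.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attend(query, encoder_states):
    """Single-head scaled dot-product attention of one query over the encoder states."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in encoder_states])
    return [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(len(query))
    ]


class ToyEncT5Head:
    """Hypothetical, heavily simplified stand-in for the decoder side of EncT5."""

    def __init__(self, hidden_size, num_labels, num_decoder_ids=1, seed=0):
        rng = random.Random(seed)
        # Separate decoder embedding: one vector per constant decoder input id.
        # In a real model these vectors are learned during fine-tuning.
        self.decoder_embedding = [
            [rng.gauss(0, 1) for _ in range(hidden_size)]
            for _ in range(num_decoder_ids)
        ]
        # Classification head: a single linear layer on the decoder output.
        self.head_weight = [
            [rng.gauss(0, 1) for _ in range(hidden_size)]
            for _ in range(num_labels)
        ]
        self.head_bias = [0.0] * num_labels

    def forward(self, encoder_states, decoder_input_id=0):
        # The constant decoder id acts as a "prompt" that queries the encoder output.
        prompt = self.decoder_embedding[decoder_input_id]
        pooled = cross_attend(prompt, encoder_states)
        return [dot(w, pooled) + b for w, b in zip(self.head_weight, self.head_bias)]


# Usage: 5 encoder token states with hidden size 8, classified into 3 labels.
rng = random.Random(1)
encoder_states = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
toy = ToyEncT5Head(hidden_size=8, num_labels=3)
logits = toy.forward(encoder_states)
probs = softmax(logits)
print(probs)
```

Because the embedding and head in this toy are randomly initialized, the printed probabilities carry no meaning, which mirrors why the real model has to be fine-tuned before use.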
According to the associated paper, this model can be more efficient and easier to use than T5 and BERT for non-autoregressive tasks such as classification and regression.
- Developed by: Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li. See the associated paper.
- Model type: Language Model
- Language(s) (NLP): English, French, Romanian, German
- License: Apache 2.0
- Based on model: T5
- Repository: GitHub repro
- Paper: Fine-tuning T5 Encoder for Non-autoregressive Tasks
How to Get Started with the Model
Use the code below to get started with the model.
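from transformers import AutoModelForSequenceClassification

# trust_remote_code=True lets transformers load the custom model code shipped in the model repository.
model = AutoModelForSequenceClassification.from_pretrained("hackyon/enct5-base", trust_remote_code=True)
# Fine-tune the model before use.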
See the GitHub repro for a more comprehensive guide.
Training Details
Training Data
The weights of this model are directly copied from t5-base.
Training Procedure
This model must be fine-tuned before use. The decoder word embedding and classification head are both untrained.