README.md · keras/t5_1.1

metadata

library_name: keras-hub
license: apache-2.0
tags:
  - text-classification
  - keras
pipeline_tag: text-generation

Model Overview

⚠️ T5 is currently only available via the keras-hub-nightly package. Use pip install keras-hub-nightly to try this model.

T5 encoder-decoder backbone model.

T5 is an LLM pretrained on a mix of unsupervised and supervised tasks, where each task is converted to a sequence-to-sequence format. T5 works well on a variety of tasks out of the box when a task-specific prefix is prepended to the input sequence, e.g., "translate English to German: ..." for translation or "summarize: ..." for summarization.
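Because the task is encoded entirely in the input text, preparing an input is plain string concatenation. The sketch below is illustrative only: `add_task_prefix`, `TASK_PREFIXES`, and the task keys are hypothetical names, not part of keras-hub. Only the two prefix strings come from the description above. The resulting string would still have to go through the model's tokenizer or preprocessor, which is not shown.

```python
# Hypothetical helper (not part of keras-hub): selects a T5 task by
# prepending its plain-text prefix to the raw input.
TASK_PREFIXES = {
    "translation_en_de": "translate English to German: ",
    "summarization": "summarize: ",
}


def add_task_prefix(task: str, text: str) -> str:
    """Return `text` (whitespace-trimmed) with the prefix for `task` prepended."""
    if task not in TASK_PREFIXES:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(TASK_PREFIXES)}"
        )
    return TASK_PREFIXES[task] + text.strip()


print(add_task_prefix("translation_en_de", "The house is wonderful."))
# translate English to German: The house is wonderful.
```

Keeping the prefixes in one table means a new task only needs a new entry. The model itself only ever sees the final string.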

T5 was introduced in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer."

The default constructor gives a fully customizable, randomly initialized T5 model with any number of layers, heads, and embedding dimensions. To load preset architectures and weights, use the from_preset constructor.
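A randomly initialized model is defined entirely by its size arguments, and some of those arguments constrain each other. The dataclass below is a hypothetical sketch, not the keras-hub API. It records the size arguments documented under Arguments and enforces that `hidden_dim` is divisible by `num_heads`. When `key_value_dim` is omitted, it is filled in as `hidden_dim / num_heads`. The `activation` and `use_gated_activation` defaults mirror the documented ones. No layers or weights are created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class T5ArchitectureSketch:
    """Hypothetical stand-in for the backbone's size arguments.

    Not a keras-hub class: it only checks how the documented
    constructor arguments relate to each other.
    """

    vocabulary_size: int
    num_layers: int
    num_heads: int
    hidden_dim: int
    intermediate_dim: int
    key_value_dim: Optional[int] = None
    activation: str = "relu"           # documented default
    use_gated_activation: bool = True  # documented default

    def __post_init__(self) -> None:
        # The hidden size must be divisible by the number of attention heads.
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError(
                f"hidden_dim={self.hidden_dim} is not divisible by "
                f"num_heads={self.num_heads}"
            )
        # When unset, each head's key/value width is hidden_dim / num_heads.
        if self.key_value_dim is None:
            self.key_value_dim = self.hidden_dim // self.num_heads


# Illustrative sizes only; they do not correspond to any released preset.
tiny = T5ArchitectureSketch(
    vocabulary_size=1000, num_layers=2, num_heads=4,
    hidden_dim=64, intermediate_dim=128,
)
print(tiny.key_value_dim)  # 16
```

Validating the head split up front surfaces a mis-sized configuration before any weights are allocated. Loading pretrained weights still goes through `from_preset`, which this sketch does not model.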

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind.

Arguments

vocabulary_size: int. The size of the token vocabulary.
num_layers: int. The number of Transformer layers.
num_heads: int. The number of attention heads for each Transformer layer. The hidden size must be divisible by the number of attention heads.
hidden_dim: int. The hidden size of the Transformer layers.
intermediate_dim: int. The output dimension of the first Dense layer in a two-layer feedforward network for each Transformer layer.
key_value_dim: int. The dimension of each head of the key/value projections in the multi-head attention layers. Defaults to hidden_dim / num_heads.
dropout: float. Dropout probability for the Transformer layers.
activation: activation function (or activation string name). The activation to be used in the inner dense blocks of the Transformer layers. Defaults to "relu".
use_gated_activation: boolean. Whether to use activation gating in the inner dense blocks of the Transformer layers. The original T5 architecture didn't use gating, but more recent versions do. Defaults to True.
layer_norm_epsilon: float. Epsilon factor to be used in the layer normalization layers in the Transformer layers.
tie_embedding_weights: boolean. If True, the weights of the token embedding and the weights projecting language model outputs from hidden_dim