README.md · keras/roberta_base_en at 93c0217ae5fddc7de2af17bb779de93dcb8ce7f0

Model Overview

A RoBERTa encoder network.

This network implements a bi-directional Transformer-based encoder as described in "RoBERTa: A Robustly Optimized BERT Pretraining Approach". It includes the embedding lookups and transformer layers, but does not include the masked language model head used during pretraining.

The default constructor gives a fully customizable, randomly initialized RoBERTa encoder with any number of layers, heads, and embedding dimensions. To load preset architectures and weights, use the from_preset() constructor.

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind. The underlying model is provided by a third party and subject to a separate license, available here.

Arguments

vocabulary_size: int. The size of the token vocabulary.
num_layers: int. The number of transformer layers.
num_heads: int. The number of attention heads for each transformer. The hidden size must be divisible by the number of attention heads.
hidden_dim: int. The size of the transformer encoding layer.
intermediate_dim: int. The output dimension of the first Dense layer in a two-layer feedforward network for each transformer.
dropout: float. Dropout probability for the Transformer encoder.
max_sequence_length: int. The maximum sequence length this encoder can consume. The sequence length of the input must be less than max_sequence_length default value. This determines the variable shape for positional embeddings.

Example Usage

import keras
import keras_nlp
import numpy as np

Raw string data.

features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "${VARIATION_SLUG}",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)

Preprocessed integer data.

features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "${VARIATION_SLUG}",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)

Example Usage with HuggingFace uri

import keras
import keras_nlp
import numpy as np

Raw string data.

features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "${VARIATION_SLUG}",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)

Preprocessed integer data.

features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "${VARIATION_SLUG}",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)