roberta_base_en / README.md
Divyasreepat's picture
Update README.md with new model card content
bbce384 verified
|
raw
history blame
4.19 kB
metadata
library_name: keras-hub

Model Overview

A RoBERTa encoder network.

This network implements a bi-directional Transformer-based encoder as described in "RoBERTa: A Robustly Optimized BERT Pretraining Approach". It includes the embedding lookups and transformer layers, but does not include the masked language model head used during pretraining.

The default constructor gives a fully customizable, randomly initialized RoBERTa encoder with any number of layers, heads, and embedding dimensions. To load preset architectures and weights, use the from_preset() constructor.

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind. The underlying model is provided by a third party and subject to a separate license, available here.

Arguments

  • vocabulary_size: int. The size of the token vocabulary.
  • num_layers: int. The number of transformer layers.
  • num_heads: int. The number of attention heads for each transformer. The hidden size must be divisible by the number of attention heads.
  • hidden_dim: int. The size of the transformer encoding layer.
  • intermediate_dim: int. The output dimension of the first Dense layer in a two-layer feedforward network for each transformer.
  • dropout: float. Dropout probability for the Transformer encoder.
  • max_sequence_length: int. The maximum sequence length this encoder can consume. The sequence length of the input must be less than max_sequence_length default value. This determines the variable shape for positional embeddings.

Example Usage

import keras
import keras_hub
import numpy as np

Raw string data.

features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "roberta_base_en",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)

Preprocessed integer data.

features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "roberta_base_en",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)

Example Usage with Hugging Face URI

import keras
import keras_hub
import numpy as np

Raw string data.

features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "hf://keras/roberta_base_en",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)

Preprocessed integer data.

features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "hf://keras/roberta_base_en",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)