	---
	library_name: keras-hub
	---
	## Model Overview
	A RoBERTa encoder network.

	This network implements a bi-directional Transformer-based encoder as
	described in ["RoBERTa: A Robustly Optimized BERT Pretraining Approach"](https://arxiv.org/abs/1907.11692).
	It includes the embedding lookups and transformer layers, but does not
	include the masked language model head used during pretraining.

	The default constructor gives a fully customizable, randomly initialized
	RoBERTa encoder with any number of layers, heads, and embedding
	dimensions. To load preset architectures and weights, use the `from_preset()`
	constructor.

	Disclaimer: Pre-trained models are provided on an "as is" basis, without
	warranties or conditions of any kind. The underlying model is provided by a
	third party and subject to a separate license, available
	[here](https://github.com/facebookresearch/fairseq).


	__Arguments__


	- __vocabulary_size__: int. The size of the token vocabulary.
	- __num_layers__: int. The number of transformer layers.
	- __num_heads__: int. The number of attention heads for each transformer layer.
	`hidden_dim` must be divisible by the number of attention heads.
	- __hidden_dim__: int. The size of the transformer encoding layer.
	- __intermediate_dim__: int. The output dimension of the first Dense layer in
	a two-layer feedforward network for each transformer.
	- __dropout__: float. Dropout probability for the Transformer encoder.
	- __max_sequence_length__: int. The maximum sequence length this encoder can
	consume. The sequence length of the input must not exceed
	`max_sequence_length`. This value also determines the shape of the
	positional embedding variable.
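
The constraints in the argument list can be checked before a model is built. The following is a minimal sketch in plain Python and is not part of keras-hub: the `check_encoder_config` helper is hypothetical, and the sample values (768 hidden units, 12 heads, 512 positions) are illustrative assumptions rather than values stated in this card.

```python
def check_encoder_config(hidden_dim, num_heads, max_sequence_length, input_length):
    """Hypothetical helper that checks the constraints from the argument list."""
    # The hidden size must split evenly across the attention heads.
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim={hidden_dim} is not divisible by num_heads={num_heads}"
        )
    # Assumption: inputs up to and including max_sequence_length are accepted.
    if input_length > max_sequence_length:
        raise ValueError(
            f"input length {input_length} exceeds "
            f"max_sequence_length={max_sequence_length}"
        )
    return {
        # Size of the slice of the hidden vector handled by each head.
        "head_dim": hidden_dim // num_heads,
        # One embedding vector of size hidden_dim per position.
        "position_embedding_shape": (max_sequence_length, hidden_dim),
    }


info = check_encoder_config(
    hidden_dim=768, num_heads=12, max_sequence_length=512, input_length=12
)
print(info)
```

A configuration such as `hidden_dim=770, num_heads=12` raises `ValueError` in this sketch, because 770 does not split evenly across 12 heads.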

	## Example Usage
	```python
	import keras
	import keras_hub
	import numpy as np
	```

	Raw string data.
	```python
	features = ["The quick brown fox jumped.", "I forgot my homework."]
	labels = [0, 3]

	# Pretrained classifier.
	classifier = keras_hub.models.RobertaClassifier.from_preset(
	    "roberta_base_en",
	    num_classes=4,
	)
	classifier.fit(x=features, y=labels, batch_size=2)
	classifier.predict(x=features, batch_size=2)

	# Re-compile (e.g., with a new learning rate).
	classifier.compile(
	    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
	    optimizer=keras.optimizers.Adam(5e-5),
	    jit_compile=True,
	)
	# Access backbone programmatically (e.g., to change `trainable`).
	classifier.backbone.trainable = False
	# Fit again.
	classifier.fit(x=features, y=labels, batch_size=2)
	```
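
The compile call above uses `from_logits=True`, which suggests that `predict()` returns unnormalized scores rather than probabilities. As a rough sketch, assuming the output holds one row of logits per input, a softmax turns each row into class probabilities. The logit values below are made up for illustration and are not real model outputs.

```python
import math


def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    # Subtract the maximum first so that exp() cannot overflow.
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Illustrative logits for two inputs and four classes (not real model output).
example_logits = [[2.0, 0.5, -1.0, 0.1], [0.0, 0.0, 0.0, 3.0]]
probabilities = [softmax(row) for row in example_logits]
# The predicted class is the index of the largest probability in each row.
predicted_classes = [row.index(max(row)) for row in probabilities]
print(predicted_classes)
```

When working with Keras tensors directly, `keras.ops.softmax` performs the same normalization.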

	Preprocessed integer data.
	```python
	features = {
	    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
	    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
	}
	labels = [0, 3]

	# Pretrained classifier without preprocessing.
	classifier = keras_hub.models.RobertaClassifier.from_preset(
	    "roberta_base_en",
	    num_classes=4,
	    preprocessor=None,
	)
	classifier.fit(x=features, y=labels, batch_size=2)
	```
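
The `padding_mask` in the example above marks real tokens with 1 and padded positions with 0. A minimal sketch of building both inputs from variable-length id lists follows. The `pad_batch` helper is hypothetical, the token ids are placeholders, and `pad_id=1` is an assumption based on the original fairseq RoBERTa vocabulary; check the tokenizer of the preset you load before relying on it.

```python
def pad_batch(sequences, sequence_length, pad_id=1):
    """Hypothetical helper: pad or truncate id lists and build a padding mask."""
    token_ids, padding_mask = [], []
    for sequence in sequences:
        # Truncate anything longer than the target length.
        kept = list(sequence)[:sequence_length]
        num_pad = sequence_length - len(kept)
        token_ids.append(kept + [pad_id] * num_pad)
        # 1 marks a real token, 0 marks a padded position.
        padding_mask.append([1] * len(kept) + [0] * num_pad)
    return {"token_ids": token_ids, "padding_mask": padding_mask}


# Placeholder ids for two sequences of different lengths.
batch = pad_batch([[0, 713, 16, 2], [0, 100, 2]], sequence_length=6)
print(batch["token_ids"])
print(batch["padding_mask"])
```

Wrapping each list with `np.array(..., dtype="int32")` gives arrays shaped like the ones in the example above.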

	## Example Usage with Hugging Face URI

	```python
	import keras
	import keras_hub
	import numpy as np
	```

	Raw string data.
	```python
	features = ["The quick brown fox jumped.", "I forgot my homework."]
	labels = [0, 3]

	# Pretrained classifier.
	classifier = keras_hub.models.RobertaClassifier.from_preset(
	    "hf://keras/roberta_base_en",
	    num_classes=4,
	)
	classifier.fit(x=features, y=labels, batch_size=2)
	classifier.predict(x=features, batch_size=2)

	# Re-compile (e.g., with a new learning rate).
	classifier.compile(
	    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
	    optimizer=keras.optimizers.Adam(5e-5),
	    jit_compile=True,
	)
	# Access backbone programmatically (e.g., to change `trainable`).
	classifier.backbone.trainable = False
	# Fit again.
	classifier.fit(x=features, y=labels, batch_size=2)
	```

	Preprocessed integer data.
	```python
	features = {
	    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
	    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
	}
	labels = [0, 3]

	# Pretrained classifier without preprocessing.
	classifier = keras_hub.models.RobertaClassifier.from_preset(
	    "hf://keras/roberta_base_en",
	    num_classes=4,
	    preprocessor=None,
	)
	classifier.fit(x=features, y=labels, batch_size=2)
	```