--- |
|
library_name: keras-hub |
|
license: apache-2.0 |
|
language: |
|
- en |
|
tags: |
|
- text-classification |
|
- keras |
|
pipeline_tag: text-classification |
|
--- |
|
### Model Overview |
|
ALBERT encoder network. |
|
|
|
This class implements a bi-directional Transformer-based encoder as |
|
described in |
|
["ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"](https://arxiv.org/abs/1909.11942). |
|
ALBERT is a more efficient variant of BERT, and uses parameter reduction |
|
techniques such as cross-layer parameter sharing and factorized embedding |
|
parameterization. This model class includes the embedding lookups and |
|
transformer layers, but not the masked language model or sentence order |
|
prediction heads. |
|
|
|
The default constructor gives a fully customizable, randomly initialized |
|
ALBERT encoder with any number of layers, heads, and embedding dimensions. |
|
To load preset architectures and weights, use the `from_preset` constructor. |
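
For example, a minimal sketch of loading a preset encoder (using the same preset name as the classifier examples below) with the `from_preset` constructor:

```python
import keras_hub

# Load a pre-trained ALBERT encoder (architecture + weights) by preset name.
backbone = keras_hub.models.AlbertBackbone.from_preset(
    "albert_extra_large_en_uncased"
)
```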
|
|
|
Disclaimer: Pre-trained models are provided on an "as is" basis, without |
|
warranties or conditions of any kind. |
|
|
|
|
|
__Arguments__ |
|
|
|
|
|
- __vocabulary_size__: int. The size of the token vocabulary.

- __num_layers__: int, must be divisible by `num_groups`. The number of "virtual" layers, i.e., the total number of times the input sequence will be fed through the groups in one forward pass. The input will be routed to the correct group based on the layer index.

- __num_heads__: int. The number of attention heads for each transformer. The hidden size must be divisible by the number of attention heads.

- __embedding_dim__: int. The size of the embeddings.

- __hidden_dim__: int. The size of the transformer encoding and pooler layers.

- __intermediate_dim__: int. The output dimension of the first Dense layer in a two-layer feedforward network for each transformer.

- __num_groups__: int. The number of groups, with each group containing `num_inner_repetitions` `TransformerEncoder` layers.

- __num_inner_repetitions__: int. The number of `TransformerEncoder` layers per group.

- __dropout__: float. Dropout probability for the Transformer encoder.

- __max_sequence_length__: int. The maximum sequence length this encoder can consume. If None, `max_sequence_length` defaults to the length of the input sequence. This determines the variable shape for positional embeddings.

- __num_segments__: int. The number of types that the `segment_ids` input can take.
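
As a rough illustration of the arguments above, here is a minimal sketch of building a randomly initialized encoder and running a forward pass (the dimensions below are arbitrary placeholders, not a preset configuration):

```python
import numpy as np
import keras_hub

# Dummy preprocessed inputs for a batch of one sequence of length 12.
input_data = {
    "token_ids": np.ones(shape=(1, 12), dtype="int32"),
    "segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]]),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]),
}

# Randomly initialized ALBERT encoder with a custom config.
model = keras_hub.models.AlbertBackbone(
    vocabulary_size=30000,
    num_layers=12,
    num_heads=12,
    num_groups=1,
    num_inner_repetitions=1,
    embedding_dim=128,
    hidden_dim=768,
    intermediate_dim=3072,
    max_sequence_length=12,
)
output = model(input_data)
```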
|
|
|
## Example Usage |
|
```python |
|
import keras |
|
import keras_hub |
|
import numpy as np |
|
``` |
|
|
|
Raw string data. |
|
```python |
|
features = ["The quick brown fox jumped.", "I forgot my homework."] |
|
labels = [0, 3] |
|
|
|
# Pretrained classifier. |
|
classifier = keras_hub.models.AlbertClassifier.from_preset( |
|
"albert_extra_large_en_uncased", |
|
num_classes=4, |
|
) |
|
classifier.fit(x=features, y=labels, batch_size=2) |
|
classifier.predict(x=features, batch_size=2) |
|
|
|
# Re-compile (e.g., with a new learning rate). |
|
classifier.compile( |
|
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), |
|
optimizer=keras.optimizers.Adam(5e-5), |
|
jit_compile=True, |
|
) |
|
# Access backbone programmatically (e.g., to change `trainable`). |
|
classifier.backbone.trainable = False |
|
# Fit again. |
|
classifier.fit(x=features, y=labels, batch_size=2) |
|
``` |
|
|
|
Preprocessed integer data. |
|
```python |
|
features = { |
|
"token_ids": np.ones(shape=(2, 12), dtype="int32"), |
|
"segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2), |
|
"padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2), |
|
} |
|
labels = [0, 3] |
|
|
|
# Pretrained classifier without preprocessing. |
|
classifier = keras_hub.models.AlbertClassifier.from_preset( |
|
"albert_extra_large_en_uncased", |
|
num_classes=4, |
|
preprocessor=None, |
|
) |
|
classifier.fit(x=features, y=labels, batch_size=2) |
|
``` |
|
|
|
## Example Usage with Hugging Face URI |
|
|
|
```python |
|
import keras |
|
import keras_hub |
|
import numpy as np |
|
``` |
|
|
|
Raw string data. |
|
```python |
|
features = ["The quick brown fox jumped.", "I forgot my homework."] |
|
labels = [0, 3] |
|
|
|
# Pretrained classifier. |
|
classifier = keras_hub.models.AlbertClassifier.from_preset( |
|
"hf://keras/albert_extra_large_en_uncased", |
|
num_classes=4, |
|
) |
|
classifier.fit(x=features, y=labels, batch_size=2) |
|
classifier.predict(x=features, batch_size=2) |
|
|
|
# Re-compile (e.g., with a new learning rate). |
|
classifier.compile( |
|
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), |
|
optimizer=keras.optimizers.Adam(5e-5), |
|
jit_compile=True, |
|
) |
|
# Access backbone programmatically (e.g., to change `trainable`). |
|
classifier.backbone.trainable = False |
|
# Fit again. |
|
classifier.fit(x=features, y=labels, batch_size=2) |
|
``` |
|
|
|
Preprocessed integer data. |
|
```python |
|
features = { |
|
"token_ids": np.ones(shape=(2, 12), dtype="int32"), |
|
"segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2), |
|
"padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2), |
|
} |
|
labels = [0, 3] |
|
|
|
# Pretrained classifier without preprocessing. |
|
classifier = keras_hub.models.AlbertClassifier.from_preset( |
|
"hf://keras/albert_extra_large_en_uncased", |
|
num_classes=4, |
|
preprocessor=None, |
|
) |
|
classifier.fit(x=features, y=labels, batch_size=2) |
|
``` |
|
|