keras/albert_extra_large_en_uncased

Text Classification

KerasHub

Keras

English


Divyasreepat committed 19 days ago

Commit 86cb024 • 1 Parent(s): ad0f89b

Update README.md with new model card content


Files changed (1)

README.md +146 -17


@@ -1,20 +1,149 @@
 ---
 library_name: keras-hub
 ---
-This is a [`Albert` model](https://keras.io/api/keras_hub/models/albert) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
-Model config:
-* **name:** albert_backbone
-* **trainable:** True
-* **vocabulary_size:** 30000
-* **num_layers:** 24
-* **num_heads:** 16
-* **num_groups:** 1
-* **num_inner_repetitions:** 1
-* **embedding_dim:** 128
-* **hidden_dim:** 2048
-* **intermediate_dim:** 8192
-* **dropout:** 0
-* **max_sequence_length:** 512
-* **num_segments:** 2
-This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.

+### Model Overview
+ALBERT encoder network.
+This class implements a bi-directional Transformer-based encoder as
+described in
+["ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"](https://arxiv.org/abs/1909.11942).
+ALBERT is a more efficient variant of BERT, and uses parameter reduction
+techniques such as cross-layer parameter sharing and factorized embedding
+parameterization. This model class includes the embedding lookups and
+transformer layers, but not the masked language model or sentence order
+prediction heads.
+The default constructor gives a fully customizable, randomly initialized
+ALBERT encoder with any number of layers, heads, and embedding dimensions.
+To load preset architectures and weights, use the `from_preset` constructor.
+Disclaimer: Pre-trained models are provided on an "as is" basis, without
+warranties or conditions of any kind.
+__Arguments__
+- __vocabulary_size__: int. The size of the token vocabulary.
+- __num_layers__: int, must be divisible by `num_groups`. The number of
+    "virtual" layers, i.e., the total number of times the input sequence
+    will be fed through the groups in one forward pass. The input will
+    be routed to the correct group based on the layer index.
+- __num_heads__: int. The number of attention heads for each transformer.
+    The hidden size must be divisible by the number of attention heads.
+- __embedding_dim__: int. The size of the embeddings.
+- __hidden_dim__: int. The size of the transformer encoding and pooler layers.
+- __intermediate_dim__: int. The output dimension of the first Dense layer in
+    a two-layer feedforward network for each transformer.
+- __num_groups__: int. Number of groups, with each group having
+    `num_inner_repetitions` number of `TransformerEncoder` layers.
+- __num_inner_repetitions__: int. Number of `TransformerEncoder` layers per
+    group.
+- __dropout__: float. Dropout probability for the Transformer encoder.
+- __max_sequence_length__: int. The maximum sequence length that this encoder
+    can consume. If `None`, `max_sequence_length` defaults to the input
+    sequence length. This determines the variable shape for positional
+    embeddings.
+- __num_segments__: int. The number of types that the `segment_ids` input can
+    take.
+### Example Usage
+```python
+import keras
+import keras_hub
+import numpy as np
+```
+Raw string data.
+```python
+features = ["The quick brown fox jumped.", "I forgot my homework."]
+labels = [0, 3]
+# Pretrained classifier.
+classifier = keras_hub.models.AlbertClassifier.from_preset(
+    "albert_extra_large_en_uncased",
+    num_classes=4,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+classifier.predict(x=features, batch_size=2)
+# Re-compile (e.g., with a new learning rate).
+classifier.compile(
+    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+    optimizer=keras.optimizers.Adam(5e-5),
+    jit_compile=True,
+)
+# Access backbone programmatically (e.g., to change `trainable`).
+classifier.backbone.trainable = False
+# Fit again.
+classifier.fit(x=features, y=labels, batch_size=2)
+```
+Preprocessed integer data.
+```python
+features = {
+    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
+    "segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+}
+labels = [0, 3]
+# Pretrained classifier without preprocessing.
+classifier = keras_hub.models.AlbertClassifier.from_preset(
+    "albert_extra_large_en_uncased",
+    num_classes=4,
+    preprocessor=None,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+```
+### Example Usage with Hugging Face URI
+```python
+import keras
+import keras_hub
+import numpy as np
+```
+Raw string data.
+```python
+features = ["The quick brown fox jumped.", "I forgot my homework."]
+labels = [0, 3]
+# Pretrained classifier.
+classifier = keras_hub.models.AlbertClassifier.from_preset(
+    "hf://keras/albert_extra_large_en_uncased",
+    num_classes=4,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+classifier.predict(x=features, batch_size=2)
+# Re-compile (e.g., with a new learning rate).
+classifier.compile(
+    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+    optimizer=keras.optimizers.Adam(5e-5),
+    jit_compile=True,
+)
+# Access backbone programmatically (e.g., to change `trainable`).
+classifier.backbone.trainable = False
+# Fit again.
+classifier.fit(x=features, y=labels, batch_size=2)
+```
+Preprocessed integer data.
+```python
+features = {
+    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
+    "segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+}
+labels = [0, 3]
+# Pretrained classifier without preprocessing.
+classifier = keras_hub.models.AlbertClassifier.from_preset(
+    "hf://keras/albert_extra_large_en_uncased",
+    num_classes=4,
+    preprocessor=None,
+)
+classifier.fit(x=features, y=labels, batch_size=2)
+```
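
Note on the model overview in this diff: the card mentions factorized embedding parameterization and cross-layer parameter sharing, and says that `num_layers` "virtual" layers are routed to groups by layer index. The sketch below is a rough, plain-Python illustration of both ideas, using the config values from the removed lines (`vocabulary_size=30000`, `embedding_dim=128`, `hidden_dim=2048`, `num_layers=24`, `num_groups=1`). The helper names `embedding_weights` and `group_schedule` are hypothetical and not part of KerasHub, and the routing rule (consecutive layers share a group) is an assumption about how the layer index maps to a group, not a statement of the library's implementation.

```python
# Illustrative sketch only; not the KerasHub implementation.
# Config values are taken from the model config listed for this preset.
VOCABULARY_SIZE = 30000
EMBEDDING_DIM = 128
HIDDEN_DIM = 2048
NUM_LAYERS = 24
NUM_GROUPS = 1


def embedding_weights(vocabulary_size, embedding_dim, hidden_dim):
    """Return (factorized, unfactorized) token-embedding weight counts.

    Factorized: a vocabulary_size x embedding_dim lookup table followed by
    an embedding_dim x hidden_dim projection (biases ignored).
    Unfactorized: a single vocabulary_size x hidden_dim lookup table.
    """
    factorized = vocabulary_size * embedding_dim + embedding_dim * hidden_dim
    unfactorized = vocabulary_size * hidden_dim
    return factorized, unfactorized


def group_schedule(num_layers, num_groups):
    """Return the group index applied at each of the virtual layers.

    Assumption: consecutive layers share a group, so 8 layers over
    4 groups are routed as [0, 0, 1, 1, 2, 2, 3, 3].
    """
    if num_layers % num_groups != 0:
        raise ValueError("num_layers must be divisible by num_groups")
    calls_per_group = num_layers // num_groups
    return [layer_idx // calls_per_group for layer_idx in range(num_layers)]


factorized, unfactorized = embedding_weights(
    VOCABULARY_SIZE, EMBEDDING_DIM, HIDDEN_DIM
)
print(f"factorized embedding weights:   {factorized:,}")
print(f"unfactorized embedding weights: {unfactorized:,}")
print("group index per virtual layer:", group_schedule(NUM_LAYERS, NUM_GROUPS))
```

With `num_groups=1`, every virtual layer maps to group 0, which is the sense in which one set of transformer weights is reused across all 24 passes.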