Divyasreepat committed on
Commit
86cb024
1 Parent(s): ad0f89b

Update README.md with new model card content

Files changed (1): README.md (+146, -17)
README.md CHANGED
@@ -1,20 +1,149 @@
  ---
  library_name: keras-hub
  ---
- This is a [`Albert` model](https://keras.io/api/keras_hub/models/albert) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- Model config:
- * **name:** albert_backbone
- * **trainable:** True
- * **vocabulary_size:** 30000
- * **num_layers:** 24
- * **num_heads:** 16
- * **num_groups:** 1
- * **num_inner_repetitions:** 1
- * **embedding_dim:** 128
- * **hidden_dim:** 2048
- * **intermediate_dim:** 8192
- * **dropout:** 0
- * **max_sequence_length:** 512
- * **num_segments:** 2
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
  ---
  library_name: keras-hub
  ---
+ ### Model Overview
+ ALBERT encoder network.
+
+ This class implements a bi-directional Transformer-based encoder as
+ described in
+ ["ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"](https://arxiv.org/abs/1909.11942).
+ ALBERT is a more efficient variant of BERT that uses parameter-reduction
+ techniques such as cross-layer parameter sharing and factorized embedding
+ parameterization. This model class includes the embedding lookups and
+ transformer layers, but not the masked language model or sentence order
+ prediction heads.
+
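To make the factorized embedding parameterization concrete, here is a quick back-of-the-envelope comparison, using the dimensions from this checkpoint's config (`vocabulary_size=30000`, `embedding_dim=128`, `hidden_dim=2048`). This is an illustrative sketch of the parameter-count argument from the ALBERT paper, not library code:

```python
# Factorized embedding parameterization: instead of a single V x H embedding
# matrix (BERT-style), ALBERT uses a V x E lookup followed by an E x H
# projection, which is much smaller whenever E << H.
vocabulary_size = 30_000  # V, from this checkpoint's config
embedding_dim = 128       # E
hidden_dim = 2_048        # H

bert_style_params = vocabulary_size * hidden_dim
albert_style_params = vocabulary_size * embedding_dim + embedding_dim * hidden_dim

print(bert_style_params)    # 61,440,000 embedding parameters
print(albert_style_params)  # 4,102,144 embedding parameters
```

With these dimensions the factorization shrinks the embedding block by roughly 15x, which is where much of ALBERT's "lite" footprint comes from.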
16
+ The default constructor gives a fully customizable, randomly initialized
+ ALBERT encoder with any number of layers, heads, and embedding dimensions.
+ To load preset architectures and weights, use the `from_preset` constructor.
+
+ Disclaimer: Pre-trained models are provided on an "as is" basis, without
+ warranties or conditions of any kind.
+
24
+ __Arguments__
+
+ - __vocabulary_size__: int. The size of the token vocabulary.
+ - __num_layers__: int, must be divisible by `num_groups`. The number of
+     "virtual" layers, i.e., the total number of times the input sequence
+     will be fed through the groups in one forward pass. The input will
+     be routed to the correct group based on the layer index.
+ - __num_heads__: int. The number of attention heads for each transformer.
+     The hidden size must be divisible by the number of attention heads.
+ - __embedding_dim__: int. The size of the embeddings.
+ - __hidden_dim__: int. The size of the transformer encoding and pooler layers.
+ - __intermediate_dim__: int. The output dimension of the first Dense layer in
+     a two-layer feedforward network for each transformer.
+ - __num_groups__: int. The number of groups, with each group containing
+     `num_inner_repetitions` `TransformerEncoder` layers.
+ - __num_inner_repetitions__: int. The number of `TransformerEncoder` layers
+     per group.
+ - __dropout__: float. Dropout probability for the Transformer encoder.
+ - __max_sequence_length__: int. The maximum sequence length that this encoder
+     can consume. If `None`, `max_sequence_length` defaults to the length of
+     the input sequence. This determines the variable shape for positional
+     embeddings.
+ - __num_segments__: int. The number of types that the `segment_ids` input can
+     take.
+
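The relationship between `num_layers`, `num_groups`, and layer routing can be sketched in plain Python. The routing rule below is illustrative only, assuming contiguous, equally sized blocks of virtual layer indices per group; it is not the library's exact implementation:

```python
def group_for_layer(layer_idx: int, num_layers: int, num_groups: int) -> int:
    """Map a "virtual" layer index to the group that processes it.

    Illustrative sketch: assumes num_layers is divisible by num_groups and
    that consecutive virtual layer indices share a group.
    """
    if num_layers % num_groups != 0:
        raise ValueError("num_layers must be divisible by num_groups")
    layers_per_group = num_layers // num_groups
    return layer_idx // layers_per_group

# With 24 virtual layers and 1 group (this checkpoint's old config), every
# layer index routes to group 0, i.e. full cross-layer parameter sharing.
routing = [group_for_layer(i, num_layers=24, num_groups=1) for i in range(24)]
print(routing)  # 24 zeros: all virtual layers reuse group 0's weights

# With 2 groups, the first 12 virtual layers would use group 0, the rest group 1.
print([group_for_layer(i, 24, 2) for i in range(24)])
```

The key point is that only the groups hold weights; the virtual layer count just controls how many times the input passes through them.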
50
+ ### Example Usage
+ ```python
+ import keras
+ import keras_hub
+ import numpy as np
+ ```
+
+ Raw string data.
+ ```python
+ features = ["The quick brown fox jumped.", "I forgot my homework."]
+ labels = [0, 3]
+
+ # Pretrained classifier.
+ classifier = keras_hub.models.AlbertClassifier.from_preset(
+     "albert_extra_large_en_uncased",
+     num_classes=4,
+ )
+ classifier.fit(x=features, y=labels, batch_size=2)
+ classifier.predict(x=features, batch_size=2)
+
+ # Re-compile (e.g., with a new learning rate).
+ classifier.compile(
+     loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+     optimizer=keras.optimizers.Adam(5e-5),
+     jit_compile=True,
+ )
+ # Access backbone programmatically (e.g., to change `trainable`).
+ classifier.backbone.trainable = False
+ # Fit again.
+ classifier.fit(x=features, y=labels, batch_size=2)
+ ```
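The `from_logits=True` flag in the compile step above tells the loss that the model outputs raw scores, so the loss applies the softmax itself. A minimal NumPy sketch of what sparse categorical crossentropy computes for a single example (illustrative, not Keras's implementation):

```python
import numpy as np

def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example from raw logits (softmax not yet applied)."""
    # Numerically stable log-softmax.
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    # Negative log-probability of the integer class label.
    return -log_probs[label]

logits = np.array([2.0, 1.0, 0.1, -1.0])  # raw model outputs for 4 classes
loss = sparse_categorical_crossentropy(logits, label=0)
print(round(float(loss), 4))
```

Passing already-softmaxed probabilities with `from_logits=True` (or raw logits with `from_logits=False`) silently trains on the wrong quantity, which is why the flag must match the model head.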
81
+
+ Preprocessed integer data.
+ ```python
+ features = {
+     "token_ids": np.ones(shape=(2, 12), dtype="int32"),
+     "segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2),
+     "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+ }
+ labels = [0, 3]
+
+ # Pretrained classifier without preprocessing.
+ classifier = keras_hub.models.AlbertClassifier.from_preset(
+     "albert_extra_large_en_uncased",
+     num_classes=4,
+     preprocessor=None,
+ )
+ classifier.fit(x=features, y=labels, batch_size=2)
+ ```
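The arrays in the `features` dict above encode two 5-token segments followed by 2 padding positions in a length-12 sequence. A small NumPy sketch of that layout (illustrative; segment lengths are read off the example arrays):

```python
import numpy as np

seq_len = 12
segment_a_len, segment_b_len = 5, 5                 # real tokens per segment
pad_len = seq_len - segment_a_len - segment_b_len   # trailing padding

# 0s mark the first segment, 1s the second; padding positions fall back to 0.
segment_ids = np.array([0] * segment_a_len + [1] * segment_b_len + [0] * pad_len)
# 1s mark real tokens; 0s mark padding the encoder should ignore.
padding_mask = np.array([1] * (segment_a_len + segment_b_len) + [0] * pad_len)

print(segment_ids.tolist())   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
print(padding_mask.tolist())  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

These reproduce the rows used in the example, which simply repeats one such sequence for a batch of 2.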
99
+
+ ### Example Usage with Hugging Face URI
+
+ ```python
+ import keras
+ import keras_hub
+ import numpy as np
+ ```
+
+ Raw string data.
+ ```python
+ features = ["The quick brown fox jumped.", "I forgot my homework."]
+ labels = [0, 3]
+
+ # Pretrained classifier.
+ classifier = keras_hub.models.AlbertClassifier.from_preset(
+     "hf://keras/albert_extra_large_en_uncased",
+     num_classes=4,
+ )
+ classifier.fit(x=features, y=labels, batch_size=2)
+ classifier.predict(x=features, batch_size=2)
+
+ # Re-compile (e.g., with a new learning rate).
+ classifier.compile(
+     loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+     optimizer=keras.optimizers.Adam(5e-5),
+     jit_compile=True,
+ )
+ # Access backbone programmatically (e.g., to change `trainable`).
+ classifier.backbone.trainable = False
+ # Fit again.
+ classifier.fit(x=features, y=labels, batch_size=2)
+ ```
+
+ Preprocessed integer data.
+ ```python
+ features = {
+     "token_ids": np.ones(shape=(2, 12), dtype="int32"),
+     "segment_ids": np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]] * 2),
+     "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+ }
+ labels = [0, 3]
+
+ # Pretrained classifier without preprocessing.
+ classifier = keras_hub.models.AlbertClassifier.from_preset(
+     "hf://keras/albert_extra_large_en_uncased",
+     num_classes=4,
+     preprocessor=None,
+ )
+ classifier.fit(x=features, y=labels, batch_size=2)
+ ```