Divyasreepat committed
Commit 4735bec
1 Parent(s): cfa7ba2

Update README.md with new model card content

Files changed (1):
  1. README.md +83 -16

README.md CHANGED
@@ -1,16 +1,83 @@
- ---
- library_name: keras-hub
- ---
- This is a [`Roberta` model](https://keras.io/api/keras_hub/models/roberta) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- Model config:
- * **name:** roberta_backbone
- * **trainable:** True
- * **vocabulary_size:** 50265
- * **num_layers:** 12
- * **num_heads:** 12
- * **hidden_dim:** 768
- * **intermediate_dim:** 3072
- * **dropout:** 0.1
- * **max_sequence_length:** 512
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
+ ### Model Overview
+ A RoBERTa encoder network.
+
+ This network implements a bi-directional Transformer-based encoder as
+ described in ["RoBERTa: A Robustly Optimized BERT Pretraining Approach"](https://arxiv.org/abs/1907.11692).
+ It includes the embedding lookups and transformer layers, but does not
+ include the masked language model head used during pretraining.
+
+ The default constructor gives a fully customizable, randomly initialized
+ RoBERTa encoder with any number of layers, heads, and embedding
+ dimensions. To load preset architectures and weights, use the `from_preset()`
+ constructor (see the construction sketch after the argument list below).
+
+ Disclaimer: Pre-trained models are provided on an "as is" basis, without
+ warranties or conditions of any kind. The underlying model is provided by a
+ third party and subject to a separate license, available
+ [here](https://github.com/facebookresearch/fairseq).
+
+ __Arguments__
+
+ - __vocabulary_size__: int. The size of the token vocabulary.
+ - __num_layers__: int. The number of transformer layers.
+ - __num_heads__: int. The number of attention heads for each transformer.
+   The hidden size must be divisible by the number of attention heads.
+ - __hidden_dim__: int. The size of the transformer encoding layer.
+ - __intermediate_dim__: int. The output dimension of the first Dense layer in
+   a two-layer feedforward network for each transformer.
+ - __dropout__: float. Dropout probability for the Transformer encoder.
+ - __max_sequence_length__: int. The maximum sequence length this encoder can
+   consume. Inputs must be no longer than `max_sequence_length` tokens. This
+   determines the variable shape for positional embeddings.
+
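The arguments above map one-to-one onto the backbone constructor. A minimal sketch of building a small, randomly initialized encoder this way; the layer sizes here are arbitrary illustration values, and `keras_nlp` plus `numpy` are assumed to be installed as in the usage examples below:

```python
import numpy as np
import keras_nlp

# A batch of 2 pre-tokenized sequences of length 12.
input_data = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}

# Randomly initialized RoBERTa encoder with a deliberately small config.
backbone = keras_nlp.models.RobertaBackbone(
    vocabulary_size=50265,
    num_layers=4,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    max_sequence_length=128,
)
sequence_output = backbone(input_data)  # shape: (2, 12, 256)
```

Pretrained weights for the full-size architecture are loaded with `keras_nlp.models.RobertaBackbone.from_preset()` instead, as in the classifier examples below.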
+ ### Example Usage
+ ```python
+ import keras
+ import keras_nlp
+ import numpy as np
+ ```
+
+ Raw string data.
+ ```python
+ features = ["The quick brown fox jumped.", "I forgot my homework."]
+ labels = [0, 3]
+
+ # Pretrained classifier.
+ classifier = keras_nlp.models.RobertaClassifier.from_preset(
+     "${VARIATION_SLUG}",
+     num_classes=4,
+ )
+ classifier.fit(x=features, y=labels, batch_size=2)
+ classifier.predict(x=features, batch_size=2)
+
+ # Re-compile (e.g., with a new learning rate).
+ classifier.compile(
+     loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+     optimizer=keras.optimizers.Adam(5e-5),
+     jit_compile=True,
+ )
+ # Access backbone programmatically (e.g., to change `trainable`).
+ classifier.backbone.trainable = False
+ # Fit again.
+ classifier.fit(x=features, y=labels, batch_size=2)
+ ```
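The classifier above tokenizes raw strings internally. To inspect the tensors it actually consumes, the matching preprocessor can be run on its own; a minimal sketch, reusing the `${VARIATION_SLUG}` preset placeholder from the examples:

```python
# Standalone preprocessing: raw strings -> token ids and padding mask.
preprocessor = keras_nlp.models.RobertaPreprocessor.from_preset(
    "${VARIATION_SLUG}",
    sequence_length=12,
)
x = preprocessor(["The quick brown fox jumped."])
print(x["token_ids"].shape)     # (1, 12)
print(x["padding_mask"].shape)  # (1, 12)
```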
+
+ Preprocessed integer data.
+ ```python
+ features = {
+     "token_ids": np.ones(shape=(2, 12), dtype="int32"),
+     "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
+ }
+ labels = [0, 3]
+
+ # Pretrained classifier without preprocessing.
+ classifier = keras_nlp.models.RobertaClassifier.from_preset(
+     "${VARIATION_SLUG}",
+     num_classes=4,
+     preprocessor=None,
+ )
+ classifier.fit(x=features, y=labels, batch_size=2)
+ ```
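For feature extraction rather than classification, the same preprocessed inputs can go straight through the encoder. A minimal sketch under the same preset-placeholder assumption; the last output dimension equals the preset's `hidden_dim`:

```python
import numpy as np
import keras_nlp

features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}

# Encoder-only forward pass: contextual token embeddings, no task head.
backbone = keras_nlp.models.RobertaBackbone.from_preset("${VARIATION_SLUG}")
sequence_output = backbone(features)  # shape: (2, 12, hidden_dim)
```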