File size: 4,188 Bytes
2369f6d
 
 
4735bec
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29a177a
4735bec
 
 
 
 
 
 
 
 
29a177a
2369f6d
4735bec
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29a177a
2369f6d
4735bec
 
 
 
93c0217
 
6241bc8
93c0217
 
 
29a177a
93c0217
 
 
 
 
 
 
 
 
29a177a
6241bc8
93c0217
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29a177a
6241bc8
93c0217
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
library_name: keras-hub
---
### Model Overview
A RoBERTa encoder network.

This network implements a bi-directional Transformer-based encoder as
described in ["RoBERTa: A Robustly Optimized BERT Pretraining Approach"](https://arxiv.org/abs/1907.11692).
It includes the embedding lookups and transformer layers, but does not
include the masked language model head used during pretraining.

The default constructor gives a fully customizable, randomly initialized
RoBERTa encoder with any number of layers, heads, and embedding
dimensions. To load preset architectures and weights, use the `from_preset()`
constructor.

Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind. The underlying model is provided by a
third party and subject to a separate license, available
[here](https://github.com/facebookresearch/fairseq).


__Arguments__


- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of transformer layers.
- __num_heads__: int. The number of attention heads for each transformer.
    The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The size of the transformer encoding layer.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
    a two-layer feedforward network for each transformer.
- __dropout__: float. Dropout probability for the Transformer encoder.
- __max_sequence_length__: int. The maximum sequence length this encoder can
    consume. The sequence length of the input must be less than
    `max_sequence_length` default value. This determines the variable
    shape for positional embeddings.

### Example Usage
```python
import keras
import keras_hub
import numpy as np
```

Raw string data.
```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "roberta_base_en",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)
```

Preprocessed integer data.
```python
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "roberta_base_en",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)
```

## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np
```

Raw string data.
```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "hf://keras/roberta_base_en",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)
```

Preprocessed integer data.
```python
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_hub.models.RobertaClassifier.from_preset(
    "hf://keras/roberta_base_en",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)
```