Divyasreepat committed on
Commit
0fc9457
1 Parent(s): 4b85afc

Update README.md with new model card content

Files changed (1)
  1. README.md +104 -15
README.md CHANGED
@@ -1,18 +1,107 @@
  ---
  library_name: keras-hub
  ---
- This is a [`Whisper` model](https://keras.io/api/keras_hub/models/whisper) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- Model config:
- * **name:** whisper_backbone
- * **trainable:** True
- * **vocabulary_size:** 51865
- * **num_layers:** 4
- * **num_heads:** 6
- * **hidden_dim:** 384
- * **intermediate_dim:** 1536
- * **num_mels:** 80
- * **dropout:** 0.0
- * **max_encoder_sequence_length:** 3000
- * **max_decoder_sequence_length:** 448
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
+ ### Model Overview
+ ⚠️ Whisper is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.
+
+ A Whisper encoder-decoder network for speech.
+
+ This class implements a Transformer-based encoder-decoder model as
+ described in
+ ["Robust Speech Recognition via Large-Scale Weak Supervision"](https://arxiv.org/abs/2212.04356).
+ It includes the embedding lookups and transformer layers, but not the head
+ for predicting the next token.
+
+ The default constructor gives a fully customizable, randomly initialized Whisper
+ model with any number of layers, heads, and embedding dimensions. To load
+ preset architectures and weights, use the `from_preset()` constructor.
+
+ Disclaimer: Pre-trained models are provided on an "as is" basis, without
+ warranties or conditions of any kind. The underlying model is provided by a
+ third party and subject to a separate license, available
+ [here](https://github.com/openai/whisper).
+
+ __Arguments__
+
+ - __vocabulary_size__: int. The size of the token vocabulary.
+ - __num_layers__: int. The number of transformer encoder layers and
+   transformer decoder layers.
+ - __num_heads__: int. The number of attention heads for each transformer.
+   The hidden size must be divisible by the number of attention heads.
+ - __hidden_dim__: int. The size of the transformer hidden state, shared by
+   the encoder and decoder layers.
+ - __intermediate_dim__: int. The output dimension of the first Dense layer in
+   a two-layer feedforward network for each transformer.
+ - __num_mels__: int. The number of mel-frequency filters. Defaults to `80`.
+ - __dropout__: float. Dropout probability for the Transformer encoder.
+ - __max_encoder_sequence_length__: int. The maximum sequence length that the
+   audio encoder can consume. Since the second convolutional layer in
+   the encoder reduces the sequence length by half (stride of 2), we
+   use `max_encoder_sequence_length // 2` as the sequence length for the
+   positional embedding layer.
+ - __max_decoder_sequence_length__: int. The maximum sequence length that the
+   text decoder can consume.
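
The two constraints stated in the argument list (head divisibility and the halved encoder length) can be checked by hand before building a model. A minimal sketch in plain Python, using the values from the model config that this commit removes (`hidden_dim=384`, `num_heads=6`, `max_encoder_sequence_length=3000`). The helper name `derived_sizes` is hypothetical and not part of KerasHub.

```python
def derived_sizes(hidden_dim, num_heads, max_encoder_sequence_length):
    """Return (per-head size, encoder positional-embedding length).

    Hypothetical helper for illustration only; not a KerasHub API.
    """
    # The hidden size must divide evenly across the attention heads.
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim={hidden_dim} is not divisible by num_heads={num_heads}"
        )
    head_dim = hidden_dim // num_heads
    # The stride-2 convolution in the encoder halves the sequence length,
    # so the positional embedding covers half the maximum input length.
    encoder_positions = max_encoder_sequence_length // 2
    return head_dim, encoder_positions


head_dim, encoder_positions = derived_sizes(384, 6, 3000)
print(head_dim, encoder_positions)  # 64 1500
```

With the smaller config used in the usage example (`hidden_dim=256`, `num_heads=4`, `max_encoder_sequence_length=128`), the same arithmetic gives a head size of 64 and 64 encoder positions.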
+
+ ### Example Usage
+ ```python
+ import keras_hub
+ import keras
+ import numpy as np
+ ```
+
+ ```python
+ input_data = {
+     # Audio features shaped (batch, frames, num_mels).
+     "encoder_features": np.ones(shape=(1, 12, 80), dtype="float32"),
+     "decoder_token_ids": np.ones(shape=(1, 12), dtype="int32"),
+     # 1 marks a real token, 0 marks padding.
+     "decoder_padding_mask": np.array(
+         [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]
+     ),
+ }
+
+ # Randomly initialized Whisper encoder-decoder model with a custom config.
+ model = keras_hub.models.WhisperBackbone(
+     vocabulary_size=51864,
+     num_layers=4,
+     num_heads=4,
+     hidden_dim=256,
+     intermediate_dim=512,
+     max_encoder_sequence_length=128,
+     max_decoder_sequence_length=128,
+ )
+ model(input_data)
+ ```
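
The hard-coded `decoder_padding_mask` above marks the first 10 of 12 decoder positions as real tokens. A sketch of how such a mask could be derived from per-example token counts with NumPy. `padding_mask_from_lengths` is a hypothetical helper for illustration, not a KerasHub function.

```python
import numpy as np


def padding_mask_from_lengths(lengths, sequence_length):
    """Build a (batch, sequence_length) mask: 1 for real tokens, 0 for padding.

    Hypothetical helper for illustration only; not a KerasHub API.
    """
    # Row vector of positions 0..sequence_length-1, shape (1, sequence_length).
    positions = np.arange(sequence_length)[np.newaxis, :]
    # Column vector of token counts, shape (batch, 1).
    lengths = np.asarray(lengths)[:, np.newaxis]
    # Broadcasting compares every position against every example's length.
    return (positions < lengths).astype("int32")


mask = padding_mask_from_lengths([10], sequence_length=12)
print(mask.tolist())  # [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]
```

The result has the same shape as `decoder_token_ids`, so it could be passed as the `decoder_padding_mask` entry of the input dictionary.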
+
+ ### Example Usage with Hugging Face URI
+
+ ```python
+ import keras_hub
+ import keras
+ import numpy as np
+ ```
+
+ ```python
+ input_data = {
+     # Audio features shaped (batch, frames, num_mels).
+     "encoder_features": np.ones(shape=(1, 12, 80), dtype="float32"),
+     "decoder_token_ids": np.ones(shape=(1, 12), dtype="int32"),
+     # 1 marks a real token, 0 marks padding.
+     "decoder_padding_mask": np.array(
+         [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]
+     ),
+ }
+
+ # Randomly initialized Whisper encoder-decoder model with a custom config.
+ model = keras_hub.models.WhisperBackbone(
+     vocabulary_size=51864,
+     num_layers=4,
+     num_heads=4,
+     hidden_dim=256,
+     intermediate_dim=512,
+     max_encoder_sequence_length=128,
+     max_decoder_sequence_length=128,
+ )
+ model(input_data)
+ ```