ylacombe committed
Commit 1dbd7a1 • 1 Parent(s): 50f0cdb

Update README.md

Files changed (1): README.md +45 -0
README.md CHANGED
@@ -67,6 +67,8 @@ Try out Bark yourself!
 </a>
 
 
+ ## 🤗 Transformers Usage
+
 You can run Bark locally with the 🤗 Transformers library from version 4.31.0 onwards.
 
 1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) and scipy:
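Between the two hunks, the diff elides the README's unchanged install and generation steps (the next hunk's header quotes the last line of that code). For orientation, here is a minimal end-to-end sketch of that flow; the `suno/bark-small` checkpoint and the `v2/en_speaker_6` voice preset are illustrative choices, and the variable names simply mirror the hunk header below:

```python
# pip install transformers scipy
from transformers import AutoProcessor, BarkModel
from scipy.io import wavfile

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

# Tokenize the prompt; the optional voice preset selects a speaker embedding.
inputs = processor("Hello, my dog is cute", voice_preset="v2/en_speaker_6")
speech_values = model.generate(**inputs)

# Write the waveform to disk at the model's native sampling rate.
sampling_rate = model.generation_config.sample_rate
wavfile.write("bark_out.wav", rate=sampling_rate, data=speech_values.cpu().numpy().squeeze())
```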
@@ -125,6 +127,49 @@ scipy.io.wavfile.write("bark_out.wav", rate=sampling_rate, data=speech_values.cp
 
 For more details on using the Bark model for inference using the 🤗 Transformers library, refer to the [Bark docs](https://huggingface.co/docs/transformers/model_doc/bark).
 
+ ### Optimization tips
+
+ Refer to this [blog post](https://huggingface.co/blog/optimizing-bark#benchmark-results) to find out more about the following methods and a benchmark of their benefits.
+
+ #### Get significant speed-ups:
+
+ **Using 🤗 Better Transformer**
+
+ Better Transformer is an 🤗 Optimum feature that performs kernel fusion under the hood. You can gain 20% to 30% in speed with zero performance degradation. It only requires one line of code to export the model to 🤗 Better Transformer:
+ ```python
+ model = model.to_bettertransformer()
+ ```
+ Note that 🤗 Optimum must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/optimum/installation)
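In context, the export slots in right after loading. A short sketch, assuming the `suno/bark-small` checkpoint; note that `to_bettertransformer()` returns the converted model, hence the reassignment:

```python
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained("suno/bark-small")

# Swap in 🤗 Optimum's fused attention kernels; generation is unchanged afterwards.
model = model.to_bettertransformer()

inputs = processor("Hello, my dog is cute")
speech_values = model.generate(**inputs)
```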
+
+ **Using Flash Attention 2**
+
+ Flash Attention 2 is an even faster, optimized version of the previous method.
+ ```python
+ model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16, use_flash_attention_2=True).to(device)
+ ```
+ Make sure to load your model in half-precision (e.g. `torch.float16`) and to [install](https://github.com/Dao-AILab/flash-attention#installation-and-features) the latest version of Flash Attention 2.
+
+ **Note:** Flash Attention 2 is only available on newer GPUs; refer to 🤗 Better Transformer in case your GPU doesn't support it.
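Since support is only discovered at load time, one pragmatic pattern (a sketch, not from the README; the exact exception raised can vary across Transformers versions) is to attempt the Flash Attention 2 load and fall back to Better Transformer:

```python
import torch
from transformers import BarkModel

device = "cuda"

try:
    # Needs a recent GPU and the flash-attn package installed.
    model = BarkModel.from_pretrained(
        "suno/bark-small", torch_dtype=torch.float16, use_flash_attention_2=True
    ).to(device)
except (ImportError, ValueError):
    # Fall back to 🤗 Better Transformer's fused kernels (requires 🤗 Optimum).
    model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
    model = model.to_bettertransformer()
```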
+
+ #### Reduce memory footprint:
+
+ **Using half-precision**
+
+ You can speed up inference and reduce the memory footprint by 50% simply by loading the model in half-precision (e.g. `torch.float16`).
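A minimal sketch of the half-precision load on its own, assuming a CUDA device (half-precision is intended for GPU execution):

```python
import torch
from transformers import BarkModel

device = "cuda"

# Loading directly in fp16 roughly halves the memory footprint versus fp32.
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
```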
+
+ **Using CPU offload**
+
+ Bark is made up of 4 sub-models, which are called sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models are idle.
+
+ If you're using a CUDA device, a simple solution to benefit from an 80% reduction in memory footprint is to offload idle sub-models from the GPU to the CPU. This operation is called CPU offloading. You can use it with one line of code.
+
+ ```python
+ model.enable_cpu_offload()
+ ```
+ Note that 🤗 Accelerate must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/accelerate/basic_tutorials/install)
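These optimizations compose. A sketch stacking half-precision and CPU offloading end-to-end; the checkpoint and prompt are illustrative, and the final cast back to float32 is an assumption made because scipy's WAV writer does not accept fp16 arrays:

```python
import torch
from transformers import AutoProcessor, BarkModel
from scipy.io import wavfile

device = "cuda"
processor = AutoProcessor.from_pretrained("suno/bark-small")

# Half-precision weights, plus idle sub-models offloaded to the CPU.
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
model.enable_cpu_offload()  # requires 🤗 Accelerate

inputs = processor("Hello, my dog is cute").to(device)
speech_values = model.generate(**inputs)

# Cast back to float32 before writing: scipy's WAV writer rejects fp16 data.
sampling_rate = model.generation_config.sample_rate
wavfile.write("bark_out.wav", rate=sampling_rate, data=speech_values.cpu().float().numpy().squeeze())
```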
+
+
+
  ## Suno Usage
 
 You can also run Bark locally through the original [Bark library](https://github.com/suno-ai/bark):