TimeRobber committed b954a57 (1 parent: 6f9f453): Update README.md

README.md CHANGED
@@ -175,8 +175,6 @@ Multilingual model capable of following user instructions in a variety of langua
 <details>
 <summary>Click to expand</summary>

-Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
-
 ### Model Architecture and Objective

 * Same architecture as [mt5](https://arxiv.org/abs/2010.11934)
@@ -187,11 +185,14 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi

 ### Compute infrastructure

-
-
-
-
-
+Models were finetuned on [TPUv4](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_v4):
+- `mt0-small` was finetuned on TPUv4-64
+- `mt0-base` was finetuned on TPUv4-64
+- `mt0-large` was finetuned on TPUv4-64
+- `mt0-xl` was finetuned on TPUv4-128
+- `mt0-xxl` was finetuned on TPUv4-256
+- `mt0-mt-xxl` was finetuned on TPUv4-256
+- `mt0-p3-xxl` was finetuned on TPUv4-256

 #### Software

@@ -220,7 +221,7 @@ It was pretrained on mC4 and then finetuned on xP3, P3 or xP3mt.
 ## Speeds, Sizes, Times

 // TODO @adarob: Maybe we can push tensorboard on this repo as well
-Training logs:
+Training logs:

 - Checkpoint size:

@@ -228,9 +229,6 @@ Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/

 - Number of epochs: 1

-// TODO @adarob: Can you share where the server is?
-- Server training location:
-

 ## Environmental Impact

@@ -269,7 +267,7 @@ print(tokenizer.decode(outputs[0]))

 ## Intended Use

-This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further
+This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further finetuned for specific tasks. Use cases below are not exhaustive.

 ### Direct Use

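The Intended Use paragraph added in the last hunk names two uses: language generation and serving as a pretrained base for further finetuning. As a minimal sketch of the first use with the Hugging Face `transformers` seq2seq API, assuming one of the mt0 checkpoints listed in the Compute infrastructure hunk (the `bigscience/mt0-small` repository name and the example prompt are assumptions, not taken from this diff):

```python
# Minimal generation sketch for an mt0-style seq2seq checkpoint.
# The checkpoint name below is an assumption; substitute the repository
# this README actually belongs to.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/mt0-small"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Encode an instruction-style prompt and generate a continuation.
inputs = tokenizer.encode("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The second use, finetuning the checkpoint as a pretrained base, follows the same loading pattern with any standard seq2seq training loop; it is not shown here.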