TimeRobber committed · Commit b954a57 · 1 parent: 6f9f453

Update README.md

Files changed (1): README.md (+10, -12)
README.md CHANGED
@@ -175,8 +175,6 @@ Multilingual model capable of following user instructions in a variety of languages.
  <details>
  <summary>Click to expand</summary>

- Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
-
  ### Model Architecture and Objective

  * Same architecture as [mt5](https://arxiv.org/abs/2010.11934)
@@ -187,11 +185,14 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme)

  ### Compute infrastructure

- // TODO @adarob: Can you describe where you trained it?
-
- #### Hardware
-
- // TODO @adarob: Can you describe what was the hardware used?
+ Models were finetuned on [TPUv4](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_v4):
+ - `mt0-small` was finetuned on TPUv4-64
+ - `mt0-base` was finetuned on TPUv4-64
+ - `mt0-large` was finetuned on TPUv4-64
+ - `mt0-xl` was finetuned on TPUv4-128
+ - `mt0-xxl` was finetuned on TPUv4-256
+ - `mt0-mt-xxl` was finetuned on TPUv4-256
+ - `mt0-p3-xxl` was finetuned on TPUv4-256

  #### Software

@@ -220,7 +221,7 @@ It was pretrained on mC4 and then finetuned on xP3, P3 or xP3mt.
  ## Speeds, Sizes, Times

  // TODO @adarob: Maybe we can push tensorboard on this repo as well
- Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11-176B-ml-logs/)
+ Training logs:

  - Checkpoint size:

@@ -228,9 +229,6 @@

  - Number of epochs: 1

- // TODO @adarob: Can you share where the server is?
- - Server training location:
-

  ## Environmental Impact

@@ -269,7 +267,7 @@ print(tokenizer.decode(outputs[0]))

  ## Intended Use

- This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. Use cases below are not exhaustive.
+ This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further finetuned for specific tasks. Use cases below are not exhaustive.

  ### Direct Use

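The "Intended Use" paragraph touched by this commit describes the two supported uses: direct language generation and further finetuning from the pretrained checkpoint. As a rough illustration only (not part of this commit or the model card itself), a minimal generation sketch with the `transformers` library might look like the snippet below; the checkpoint name `bigscience/mt0-small` and the prompt are assumptions for the example, and any of the mt0 checkpoints listed in the compute-infrastructure section should load the same way.

```python
# Minimal sketch of the "language generation" use described above.
# Assumption: the checkpoint id "bigscience/mt0-small" and the example prompt
# are illustrative; swap in whichever mt0 checkpoint you intend to use.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "bigscience/mt0-small"  # assumed checkpoint id for illustration

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)  # mt5-style encoder-decoder

# Encode an instruction-style prompt and generate a short completion.
inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```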