stas committed
Commit ce9d189
Parent(s): 9ad1d39

small fixes

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -56,7 +56,7 @@ Training a multilingual 176 billion parameters model in the open
 
  [BigScience](https://bigscience.huggingface.co) is an open and collaborative workshop around the study and creation of very large language models, gathering more than 1000 researchers around the world. You can find more information on the main website at https://bigscience.huggingface.co.
 
- The training of BigScience’s main model started on **March 11, 2022 11:42am PST** and will last 3-4 months on the 416 A100 GPUs of the Jean Zay public supercomputer
+ The training of BigScience’s main model started on **March 11, 2022 11:42am PST** and will continue for 3-4 months on 384 A100 80GB GPUs of the Jean Zay public supercomputer
 
  You can follow the training at [https://twitter.com/BigScienceLLM](https://twitter.com/BigScienceLLM)
 
@@ -75,16 +75,16 @@ You can follow the training at [https://twitter.com/BigScienceLLM](https://twitt
 
  - Multilingual: 46 languages; the full list is here: [https://bigscience.huggingface.co/blog/building-a-tb-scale-multilingual-dataset-for-language-modeling](https://bigscience.huggingface.co/blog/building-a-tb-scale-multilingual-dataset-for-language-modeling)
  - 341.6 billion tokens (1.5 TB of text data)
- - Tokenizer vocabulary: 250 680 tokens
+ - Tokenizer vocabulary: 250,680 tokens
  - More information:
    - Blog post detailing the design choices during the dataset creation: [https://bigscience.huggingface.co/blog/building-a-tb-scale-multilingual-dataset-for-language-modeling](https://bigscience.huggingface.co/blog/building-a-tb-scale-multilingual-dataset-for-language-modeling)
 
  ### **The engineering side**
 
- - number of GPUs used for the training: 384 A100 GPUs with 80 Gb of memory each
+ - number of GPUs used for the training: 384 A100 GPUs with 80 GB of memory each
  - one copy of the model takes 48 GPUs (using 60 GB of memory on each GPU)
- - checkpoint size: only the bf16 weights are 329GB, the full checkpoint with optimizer states is 2.3TB
+ - checkpoint size: the bf16 weights are 329GB, the full checkpoint with optimizer states is 2.3TB
- - training throughput: about 150 TFLOPs
+ - training throughput: ~150 TFLOPs
  - estimated training time: 3-4 months depending on throughput and unexpected events
  - **More information**:
    - Blog post on the hardware/engineering side: [https://bigscience.huggingface.co/blog/which-hardware-to-train-a-176b-parameters-model](https://bigscience.huggingface.co/blog/which-hardware-to-train-a-176b-parameters-model)
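A quick back-of-envelope check of the dataset bullets in the diff above: 341.6 billion tokens in 1.5 TB of text works out to roughly 4.4 bytes per token. The sketch below assumes "1.5 TB" means decimal terabytes, which the README does not specify:

```python
# Sanity check of the dataset figures quoted in the diff above.
# Assumes "1.5 TB" means decimal terabytes (10**12 bytes); the README does not say.
tokens = 341.6e9        # 341.6 billion tokens
text_bytes = 1.5e12     # 1.5 TB of text data

print(f"average bytes per token: {text_bytes / tokens:.2f}")  # ~4.39
```

Roughly 4-5 bytes per token is a plausible figure for a byte-level BPE tokenizer with a large multilingual vocabulary.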
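The engineering bullets can be cross-checked with similar rough arithmetic. This is a minimal sketch, assuming a common mixed-precision Adam checkpoint layout (bf16 weights plus fp32 master weights, momentum and variance) and reading the ~150 TFLOPs throughput figure as per GPU; neither assumption is spelled out in the README:

```python
# Rough arithmetic behind the engineering bullets in the diff above.
# Back-of-envelope estimates only; the optimizer layout and the per-GPU
# reading of the throughput number are assumptions, not README facts.

params = 176e9                 # 176B parameters
GiB, TiB = 2**30, 2**40

# bf16 weights: 2 bytes per parameter
print(f"bf16 weights: {params * 2 / GiB:.0f} GiB")        # ~328 GiB -> the quoted 329GB

# Full checkpoint, assuming bf16 weights (2 B) + fp32 master weights (4 B)
# + fp32 Adam momentum (4 B) + fp32 Adam variance (4 B) per parameter
print(f"full checkpoint: {params * 14 / TiB:.1f} TiB")    # ~2.2 TiB -> the quoted 2.3TB

# Data parallelism: 384 GPUs total, 48 GPUs per model copy
print(f"model replicas: {384 // 48}")                     # 8

# Training time from the common ~6*N*D FLOPs estimate for transformer
# training, assuming the ~150 TFLOPs figure is per GPU
total_flops = 6 * params * 341.6e9                        # N params, D tokens
cluster_flops_per_s = 384 * 150e12
print(f"compute-only time: {total_flops / cluster_flops_per_s / 86400:.0f} days")  # ~72 days
```

The ~72 days of uninterrupted compute is consistent with the quoted 3-4 month estimate once restarts, hardware failures and other "unexpected events" are accounted for.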