Committed by tokyo-electron-device-ai
Commit 7277056 (1 parent: bd5bcbd)
Update README.md
README.md
CHANGED
@@ -75,7 +75,7 @@ Note: This released model was trained exclusively on open-source datasets. We al
 * **weight_decay**: 0.1
 * **annealing_steps**: 500
 
-Note: We created another model name, llama3-tedllm-8b-v0-annealing as the model with the annealing_step applied.
+Note: We created another model name, llama3-tedllm-8b-v0-annealing as the model with the annealing_step applied. If you are interested, please check [here](https://huggingface.co/tokyo-electron-device-ai/llama3-tedllm-8b-v0-annealing).
 
 ### Training Infrastructure
 The model was trained on a Cerebras Wafer-Scale Cluster, using from 4 to 16 CS-3 systems during different phases of training. Training on the Cerebras Wafer-Scale Clusters leverages Cerebras' Weight Streaming execution paradigm, which simplifies the training of LLMs by disaggregating compute from memory used for model weights. This enables efficient scaling of training across multiple nodes using simple data parallelism. You can learn more about Cerebras Wafer-Scale clusters and Weight Streaming execution [here](https://8968533.fs1.hubspotusercontent-na1.net/hubfs/8968533/Virtual%20Booth%20Docs/CS%20Weight%20Streaming%20White%20Paper.pdf).