tokyo-electron-device-ai committed on
Commit
7277056
1 Parent(s): bd5bcbd

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -75,7 +75,7 @@ Note: This released model was trained exclusively on open-source datasets. We al
  * **weight_decay**: 0.1
  * **annealing_steps**: 500
 
- Note: We created another model name, llama3-tedllm-8b-v0-annealing as the model with the annealing_step applied. Please check [Here](https://huggingface.co/tokyo-electron-device-ai/llama3-tedllm-8b-v0-annealing) if you can use.
+ Note: We created another model name, llama3-tedllm-8b-v0-annealing as the model with the annealing_step applied. If you are interested, please check [here](https://huggingface.co/tokyo-electron-device-ai/llama3-tedllm-8b-v0-annealing).
 
  ### Training Infrastructure
  The model was trained on a Cerebras Wafer-Scale Cluster, using from 4 to 16 CS-3 systems during different phases of training. Training on the Cerebras Wafer-Scale Clusters leverages Cerebras' Weight Streaming execution paradigm, which simplifies the training of LLMs by disaggregating compute from memory used for model weights. This enables efficient scaling of training across multiple nodes using simple data parallelism. You can learn more about Cerebras Wafer-Scale clusters and Weight Streaming execution [here](https://8968533.fs1.hubspotusercontent-na1.net/hubfs/8968533/Virtual%20Booth%20Docs/CS%20Weight%20Streaming%20White%20Paper.pdf).