rpand002 committed on
Commit bbd90f1 · verified · 1 Parent(s): dd9ce5a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
  # Granite-3.1-1B-A400M-Base
 
  **Model Summary:**
- Granite-3.1-1B-A400M-Base extends the context length of Granite-3.0-1B-A400M-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K. We trained on approximately xxB tokens total for all stages, which is only 0.xx% of total pre-training data.
+ Granite-3.1-1B-A400M-Base extends the context length of Granite-3.0-1B-A400M-Base from 4K to 128K using a progressive training strategy by increasing the supported context length in increments while adjusting RoPE theta until the model has successfully adapted to desired length of 128K. This long-context pre-training stage was performed using approximately 500B tokens.
 
  - **Developers:** Granite Team, IBM
  - **GitHub Repository:** [ibm-granite/granite-3.1-language-models](https://github.com/ibm-granite/granite-3.1-language-models)
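
Note on the summary: the changed line describes raising RoPE theta as the supported context length grows. The sketch below illustrates why that pairing is plausible, using the standard RoPE inverse-frequency formula. The head dimension (64) and the (context length, theta) pairs are hypothetical values chosen for illustration only; the README does not state the schedule actually used for this model.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) per dimension pair."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2.0 * math.pi / rope_inv_freq(head_dim, theta)[-1]


# Hypothetical progressive schedule: (context length, RoPE theta).
# These numbers are assumptions for illustration, not values from the README.
STAGES = [(4_096, 1.0e4), (32_768, 5.0e5), (131_072, 1.0e7)]
HEAD_DIM = 64  # assumed head dimension

for context_length, theta in STAGES:
    wavelength = longest_wavelength(HEAD_DIM, theta)
    print(f"context={context_length:>7}  theta={theta:>12.0f}  "
          f"longest wavelength ~ {wavelength:,.0f} tokens")
```

A larger theta stretches the slowest rotary wavelength, so positions far apart in a longer window still map to distinguishable rotation angles. That is the general motivation for adjusting theta at each stage of a context-length increase.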