amezasor committed
Commit
93b9d17
1 Parent(s): a7f86af

update after review

Files changed (1)
1. README.md (+20 −25)
README.md CHANGED
@@ -2,9 +2,6 @@
 pipeline_tag: text-generation
 inference: false
 license: apache-2.0
-# datasets:
-# metrics:
-# - code_eval
 library_name: transformers
 tags:
 - language
@@ -204,29 +201,28 @@ model-index:
   verified: false
 ---
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
+<!-- ![image/png](granite-3_0-language-models_Group_1.png) -->
 
 # Granite-3.0-8B-Base
 
-## Model Summary
-**Granite-3.0-8B-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-8B-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 10 trillion tokens sourced from diverse domains. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
+**Model Summary:**
+Granite-3.0-8B-Base is a decoder-only language model that supports a variety of text-to-text generation tasks. It is trained from scratch following a two-stage training strategy. In the first stage, it is trained on 10 trillion tokens sourced from diverse domains. During the second stage, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
 - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
-- **Paper:** [Granite 3.0 Language Models]()
+- **Paper:** [Granite 3.0 Language Models](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/granite-3-language-models.pdf)
 - **Release Date**: October 21st, 2024
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-<!-- de/es/fr/ja/pt/ar/cs/it/ko/nl/zh -->
-## Supported Languages
-English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese
+**Supported Languages:**
+English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 3.0 models for languages beyond these 12.
 
-## Usage
-### Intended use
+**Intended use:**
 Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question-answering, and more. All Granite Base models are able to handle these tasks as they were trained on a large amount of data from various domains. Moreover, they can serve as a baseline to create specialized models for specific application scenarios.
 
-### Generation
-This is a simple example of how to use **Granite-3.0-8B-Base** model.
+**Generation:**
+This is a simple example of how to use the Granite-3.0-8B-Base model.
 
 Install the following libraries:
@@ -258,8 +254,8 @@ output = tokenizer.batch_decode(output)
 print(output)
 ```
 
-## Model Architeture
-**Granite-3.0-8B-Base** is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embbeddings.
+**Model Architecture:**
+Granite-3.0-8B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
 
 | Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
 | :-------- | :--------| :-------- | :------| :------|
@@ -279,19 +275,18 @@ print(output)
 | # Active Parameters | 2.5B | **8.1B** | 400M | 800M |
 | # Training tokens | 12T | **12T** | 10T | 10T |
 
-<!-- TO DO: To be completed once the paper is ready -->
-## Training Data
-This model is trained on a mix of open-source and proprietary data following a two-phase training strategy.
-* Phase 1 data: The data for phase 1 is sourced from diverse domains, such as: web, code, academic sources, books, and math data.
-* Phase 2 data: The data for phase 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training phase is to enhance the model’s performance on specific tasks.
+**Training Data:**
+This model is trained on a mix of open-source and proprietary data following a two-stage training strategy.
+* Stage 1 data: The data for stage 1 is sourced from diverse domains, such as web, code, academic sources, books, and math data.
+* Stage 2 data: The data for stage 2 comprises a curated mix of high-quality data from the same domains, plus multilingual and instruction data. The goal of this second training stage is to enhance the model’s performance on specific tasks.
 
-## Infrastructure
-We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
+**Infrastructure:**
+We train the Granite 3.0 Language Models using IBM's supercomputing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 
-## Ethical Considerations and Limitations
+**Ethical Considerations and Limitations:**
 The use of Large Language Models involves risks and ethical considerations people must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. The **Granite-3.0-8B-Base** model is no exception in this regard. Even though this model is suited for multiple generative AI tasks, it has not undergone any safety alignment, so it may produce problematic outputs. Additionally, it remains uncertain whether smaller models might exhibit increased susceptibility to hallucination in generation scenarios by copying text verbatim from the training dataset due to their reduced sizes and memorization capacities. This aspect is currently an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigations in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious utilization. We urge the community to use the **Granite-3.0-8B-Base** model with ethical intentions and in a responsible way.
 
-## Citation
+<!-- ## Citation
 ```
 @misc{granite-models,
 author = {author 1, author2, ...},
 
@@ -301,4 +296,4 @@ The use of Large Language Models involves risks and ethical considerations peopl
 year = {2024},
 url = {https://arxiv.org/abs/0000.00000},
 }
-```
+``` -->
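The README's generation snippet is only partially visible in the diff context above (its `output = tokenizer.batch_decode(output)` / `print(output)` tail). As a minimal sketch of that flow, assuming torch, accelerate, and transformers are installed and that the checkpoint is published under the repo id `ibm-granite/granite-3.0-8b-base` (an assumption based on the model name, not stated in the diff):

```python
# Illustrative sketch only: the README's exact snippet is elided by the diff above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-8b-base"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" shards the ~8B weights across available devices (needs accelerate)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# A base model does plain text completion, so the prompt is a prefix, not an instruction
input_text = "The Thomas J. Watson Research Center is located in"
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**input_tokens, max_new_tokens=64)
# Same decode/print tail that is visible in the diff context above
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```

Because the model has no instruction tuning or safety alignment (as the Ethical Considerations section notes), completion-style prompts like the one above are the intended usage pattern.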
 
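The components named in the Model Architecture section (GQA, RoPE, shared input/output embeddings) can typically be read off the checkpoint's published config. A hedged sketch, assuming the same repo id as above and Llama-style config field names (an assumption, not something the README states, hence the `getattr` guards):

```python
# Illustrative sketch only: inspects the published config to confirm the
# architecture notes; field names are assumed, not guaranteed, for this checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ibm-granite/granite-3.0-8b-base")  # assumed repo id

# GQA: fewer key/value heads than query heads
print("attention heads:", config.num_attention_heads)
print("key/value heads:", getattr(config, "num_key_value_heads", None))

# RoPE: rotary position embeddings parameterized by a base frequency
print("rope_theta:", getattr(config, "rope_theta", None))

# Shared input/output embeddings
print("tied embeddings:", getattr(config, "tie_word_embeddings", None))
```

If a field prints `None`, the config simply uses a different name for that component; the `getattr` guards keep the sketch from failing on such mismatches.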