amezasor committed
Commit 3f8e541
Parent: 464a031

eval results correction

Files changed (1): README.md (+23 -16)
README.md CHANGED
```diff
@@ -95,7 +95,7 @@ model-index:
   - task:
       type: text-generation
     dataset:
-      type: reading-comprehension
+      type: commonsense
       name: TruthfulQA
     metrics:
     - name: pass@1
@@ -106,6 +106,26 @@ model-index:
       type: text-generation
     dataset:
       type: reading-comprehension
+      name: BoolQ
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 65.44
+      veriefied: false
+  - task:
+      type: text-generation
+    dataset:
+      type: reading-comprehension
+      name: SQuAD v2
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 17.78
+      veriefied: false
+  - task:
+      type: text-generation
+    dataset:
+      type: reasoning
       name: ARC-C
     metrics:
     - name: pass@1
@@ -115,7 +135,7 @@ model-index:
   - task:
       type: text-generation
     dataset:
-      type: reading-comprehension
+      type: reasoning
       name: GPQA
     metrics:
     - name: pass@1
@@ -125,7 +145,7 @@ model-index:
   - task:
       type: text-generation
     dataset:
-      type: reading-comprehension
+      type: reasoning
       name: BBH
     metrics:
     - name: pass@1
@@ -184,24 +204,12 @@ model-index:
       veriefied: false
 ---
 <!-- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png) -->
-<!-- ![image/png](granite-3_0-language-models_Group_1.png) -->
 
 # Granite-3.0-1B-A400M-Base
 
 ## Model Summary
 **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains, including natural language, math, code, and safety. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks.
 
-<!-- **Granite-3.0-1B-A400M-Base** is an open-source decoder-only language model from IBM Research that supports a variety of text-to-text generation tasks (e.g., question-answering, text-completion). The particular characteristics of this model, includig a Mixture of Experts(MoE) architecture, small size, and open-source nature, make it an ideal baseline for finetuning other models that require large model capacity while maintaining computational efficiency. **Granite-3.0-1B-A400M-Base** is trained from scratch and follows a two-phase training strategy. In the first phase, it is trained on 8 trillion tokens sourced from diverse domains, including natural language, math, code, and safety. During the second phase, it is further trained on 2 trillion tokens using a carefully curated mix of high-quality data, aiming to enhance its performance on specific tasks. -->
-
-<!-- Use Cases:
-Dense LLMs: Suitable for scenarios where fast inference with a smaller model size is prioritized, such as real-time applications or deployment on resource-constrained devices.
-MoE LLMs: Ideal for situations where large model capacity is needed while maintaining computational efficiency, like handling complex tasks or large datasets with high computational demands -->
-
-<!-- ====Features==== -->
-<!-- MoE will be faster
-Demployment resources (memory): same -->
-
-
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-language-models](https://github.com/ibm-granite/granite-language-models)
 - **Paper:** [Granite Language Models](https://) <!-- TO DO: Update github repo ling whe it is ready -->
@@ -273,7 +281,6 @@ print(output)
 ## Training Data
 This model is trained on a mix of open-source and proprietary datasets.
 
-<!-- CHECK: removed Vela, only talk about blue-vela-->
 ## Infrastructure
 We train the Granite Language models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
 
```
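For context, the corrected `type`/`name` fields follow the Hugging Face `model-index` front-matter schema, so the evaluation results can be read back programmatically. Below is a minimal sketch (not part of this commit) that assumes a local copy of the updated README.md and the PyYAML package:

```python
import yaml  # PyYAML; assumed installed (pip install pyyaml)

# The eval results live in the YAML front matter of README.md, delimited by
# the first two "---" lines. Read a local copy of the card and print each
# benchmark's dataset type, dataset name, and reported score.
with open("README.md", encoding="utf-8") as f:
    _, front_matter, _ = f.read().split("---", 2)

card = yaml.safe_load(front_matter)

for entry in card["model-index"]:
    for result in entry["results"]:
        dataset = result["dataset"]
        for metric in result["metrics"]:
            print(f"{dataset['type']:<22} {dataset['name']:<12} "
                  f"{metric['name']}: {metric['value']}")
```

With the corrected metadata, TruthfulQA is listed under `commonsense`, BoolQ and SQuAD v2 under `reading-comprehension`, and ARC-C, GPQA, and BBH under `reasoning`.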