Tijmen2 committed on
Commit 93d15e4
1 Parent(s): e6aba66

Update README.md

Files changed (1)
  1. README.md +16 -28
README.md CHANGED
@@ -106,25 +106,6 @@ When using one of the quantized versions, make sure to pass the quantization con
  }
  ```
 
- ## Standard evaluations
-
- cosmosage can be compared to OpenHermes-2.5-Mistral-7B using standard evaluation metrics.
-
- | Test Category | cosmosage_v2 | OpenHermes-2.5-Mistral-7B |
- |---------------|-------------------------|------------------------------------|
- | Overall | 0.595 | 0.632 |
- | ARC Challenge | 0.565 | 0.613 |
- | Hellaswag | 0.619 | 0.652 |
- | TruthfulQA:mc1 | 0.348 | 0.361 |
- | TruthfulQA:mc2 | 0.510 | 0.522 |
- | Winogrande | 0.759 | 0.781 |
- | GSM8k | 0.368 | 0.261 |
-
- cosmosage_v2 performs only slightly below OpenHermes-2.5-Mistral-7B on most metrics, indicating that the
- heavy specialization in cosmology has not come at much of a cost on general-purpose abilities. The exception
- is GSM8k, which is a collection of grade school math problems. Here, cosmosage performs significantly better
- than OpenHermes-2.5-Mistral-7B.
-
  ## Instruction format
 
  cosmosage_v2 was trained with the "inst" chat template as implemented in axolotl v0.4.0. This resulted in an
@@ -184,15 +165,22 @@ unusual instruction format:
  > In summary, the time of matter-radiation equality affects the damping tail of the CMB power spectrum by influencing the amount of time that photons spend in the diffusive state before they are able to decouple from the matter and travel freely through the universe. The longer the photons spend in the diffusive state, the more damping occurs, and the earlier matter-radiation equality occurs, the less damping occurs.>
 
  # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+
  Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Tijmen2__cosmosage_v2)
 
- | Metric |Value|
- |---------------------------------|----:|
- |Avg. |60.66|
- |AI2 Reasoning Challenge (25-Shot)|59.73|
- |HellaSwag (10-Shot) |80.90|
- |MMLU (5-Shot) |59.57|
- |TruthfulQA (0-shot) |50.98|
- |Winogrande (5-shot) |75.93|
- |GSM8k (5-shot) |36.85|
+ | Metric |cosmosage_v2|OpenHermes-2.5-Mistral-7B|
+ |---------------------------------|----:|----------------------:|
+ |Avg. |60.66|61.52|
+ |AI2 Reasoning Challenge (25-Shot)|59.73|64.93|
+ |HellaSwag (10-Shot) |80.90|84.18|
+ |MMLU (5-Shot) |59.57|63.64|
+ |TruthfulQA (0-shot) |50.98|52.24|
+ |Winogrande (5-shot) |75.93|78.06|
+ |GSM8k (5-shot) |36.85|26.08|
+
+ cosmosage_v2 can be compared directly to OpenHermes-2.5-Mistral-7B because both started from the same base model and both were trained on the OpenHermes-2.5 dataset.
 
+ cosmosage_v2 scores slightly below OpenHermes-2.5-Mistral-7B on most benchmarks, indicating that the
+ heavy specialization in cosmology has not come at much cost to general-purpose abilities. The exception
+ is GSM8k, a collection of grade-school math problems, where cosmosage_v2 scores markedly higher
+ than OpenHermes-2.5-Mistral-7B (36.85 vs. 26.08).
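The Avg. row in the updated leaderboard table is the unweighted mean of the six benchmark scores, and the comparison paragraph rests on the per-benchmark gaps between the two models. The short Python sketch below recomputes both from the table values. The `SCORES` dictionary and the `averages`/`deltas` helpers are hypothetical names chosen for illustration, not code from the cosmosage repository.

```python
from statistics import mean

# Scores copied from the Open LLM Leaderboard table in this commit (percent).
# Hypothetical layout: benchmark -> (cosmosage_v2, OpenHermes-2.5-Mistral-7B).
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": (59.73, 64.93),
    "HellaSwag (10-Shot)": (80.90, 84.18),
    "MMLU (5-Shot)": (59.57, 63.64),
    "TruthfulQA (0-shot)": (50.98, 52.24),
    "Winogrande (5-shot)": (75.93, 78.06),
    "GSM8k (5-shot)": (36.85, 26.08),
}


def averages(scores):
    """Return each model's unweighted mean score, rounded to two decimals like the table."""
    cosmo = round(mean(a for a, _ in scores.values()), 2)
    hermes = round(mean(b for _, b in scores.values()), 2)
    return cosmo, hermes


def deltas(scores):
    """Return cosmosage_v2 minus OpenHermes-2.5-Mistral-7B per benchmark, in points."""
    return {name: round(a - b, 2) for name, (a, b) in scores.items()}


if __name__ == "__main__":
    print("Avg.:", averages(SCORES))
    for name, d in deltas(SCORES).items():
        # Negative values mean cosmosage_v2 trails OpenHermes on that benchmark.
        print(f"{name:35s} {d:+6.2f}")
```

Run as a script, it reproduces the Avg. values (60.66 and 61.52) and shows cosmosage_v2 trailing by 1.26 to 5.20 points on five benchmarks while leading by 10.77 points on GSM8k.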