LLM360/Crystal

@@ -23,12 +23,20 @@ By comparing CrystalCoder with other similar work, CrystalCoder is quite balance
 |        Model        | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. |  ARC  | HellaSwag | MMLU (5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
 |:-------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:----------:|:------------------:|:-------------:|
 | Mistral 7B          | -              | 48.68        | 62.40         | 33.95       | 59.98 | 83.31     | 64.16         | 42.15      | 29.12              | 38.78         |
-| **CrystalCoder 7B** | 1.4T           | 41.65        | 50.92         | 32.38       | 47.01 | 71.97     | 48.78         | 35.91      | 28.38              | 36.38         |
-| CodeLlaMA 7B        | 2.5T           | 39.94        | 42.42         | 37.45       | 39.93 | 60.80     | 31.12         | 37.82      | 33.50              | 41.40         |
 | OpenLLaMA v2 7B     | 1T             | 38.10        | 48.18         | 28.01       | 43.60 | 72.20     | 41.29         | 35.54      | 15.32              | 12.69         |
 | LLaMA 2 7B          | 2T             | 34.98        | 53.39         | 16.57       | 53.07 | 77.74     | 43.80         | 38.98      | 13.05              | 20.09         |
 | StarCoder-15B       | 1.03T          | -            | -             | 38.46       | -     | -         | -             | -          | 33.63              | 43.28         |
 ## About LLM360
 LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
 where all training details, model checkpoints, intermediate results, and

 |        Model        | Trained Tokens | Avg. of Avg. | Language Avg. | Coding Avg. |  ARC  | HellaSwag | MMLU (5-shot) | TruthfulQA | HumanEval (pass@1) | MBPP (pass@1) |
 |:-------------------:|:--------------:|:------------:|:-------------:|:-----------:|:-----:|:---------:|:-------------:|:----------:|:------------------:|:-------------:|
 | Mistral 7B          | -              | 48.68        | 62.40         | 33.95       | 59.98 | 83.31     | 64.16         | 42.15      | 29.12              | 38.78         |
+| **CrystalCoder 7B** | 1.27T           | 41.65        | 50.92         | 32.38       | 47.44 | 74.38     | 48.42         | 36.46      | 23.90 | 30.988  |
+| **CrystalCoder 7B Python/Web** | 1.4T           | 41.65        | 50.92         | 32.38       | 47.01 | 71.97     | 48.78         | 35.91      | 28.38  | 36.38  |
+| CodeLlaMA 7B Base        | 2.5T           | 40.24        | 46.16         | 34.32       | 42.75 | 64.74     | 39.98         | 37.19      | 30.06     | 38.573         |
+| CodeLlaMA 7B - Python | 2.6T           | 40.09        | 42.42         | 37.76       | 39.93 | 60.80     | 31.12         | 37.82      | 34.12              | 41.40         |
 | OpenLLaMA v2 7B     | 1T             | 38.10        | 48.18         | 28.01       | 43.60 | 72.20     | 41.29         | 35.54      | 15.32              | 12.69         |
 | LLaMA 2 7B          | 2T             | 34.98        | 53.39         | 16.57       | 53.07 | 77.74     | 43.80         | 38.98      | 13.05              | 20.09         |
 | StarCoder-15B       | 1.03T          | -            | -             | 38.46       | -     | -         | -             | -          | 33.63              | 43.28         |
+**Notes**
+- For a detailed token breakdown of the CrystalCoder dataset, refer to the [CrystalCoder dataset repository](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets).
+- Scores for HumanEval are computed with a temperature of 0.2.
+- Scores for MBPP are computed with a temperature of 0.1.
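
The table does not state how its aggregate columns are derived. The sketch below assumes that Language Avg. is the mean of ARC, HellaSwag, MMLU, and TruthfulQA, that Coding Avg. is the mean of HumanEval and MBPP, and that Avg. of Avg. is the mean of those two. Under this assumption it reproduces the aggregates listed for the CodeLlaMA 7B - Python and CrystalCoder 7B Python/Web rows; other rows may have been computed differently. The `aggregate` helper and the dictionary keys are illustrative names, not part of any LLM360 tooling.

```python
from statistics import mean

# Assumed grouping of benchmarks into the two aggregate columns.
LANGUAGE_BENCHMARKS = ("ARC", "HellaSwag", "MMLU", "TruthfulQA")
CODING_BENCHMARKS = ("HumanEval", "MBPP")


def aggregate(scores):
    """Return the three aggregate columns for one table row (hypothetical helper)."""
    language_avg = mean(scores[name] for name in LANGUAGE_BENCHMARKS)
    coding_avg = mean(scores[name] for name in CODING_BENCHMARKS)
    return {
        "Language Avg.": round(language_avg, 2),
        "Coding Avg.": round(coding_avg, 2),
        # Average the unrounded group means, then round once at the end.
        "Avg. of Avg.": round((language_avg + coding_avg) / 2, 2),
    }


# Per-benchmark scores copied from the "CodeLlaMA 7B - Python" row.
codellama_python = {
    "ARC": 39.93,
    "HellaSwag": 60.80,
    "MMLU": 31.12,
    "TruthfulQA": 37.82,
    "HumanEval": 34.12,
    "MBPP": 41.40,
}

# Per-benchmark scores copied from the "CrystalCoder 7B Python/Web" row.
crystalcoder_python_web = {
    "ARC": 47.01,
    "HellaSwag": 71.97,
    "MMLU": 48.78,
    "TruthfulQA": 35.91,
    "HumanEval": 28.38,
    "MBPP": 36.38,
}

for name, row in [
    ("CodeLlaMA 7B - Python", codellama_python),
    ("CrystalCoder 7B Python/Web", crystalcoder_python_web),
]:
    print(name, aggregate(row))
```

Averaging the two group means, rather than all six benchmarks at once, gives the language and coding groups equal weight even though the language group contains twice as many benchmarks.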
 ## About LLM360
 LLM360 is an initiative for comprehensive and fully open-sourced LLMs,
 where all training details, model checkpoints, intermediate results, and