Files changed (1)
  1. README.md +55 -2
README.md CHANGED
@@ -67,8 +67,61 @@ python run_inference.py -m models/Falcon3-3B-Base-1.58bit/ggml-model-i2_s.gguf -
  ```

  # Evaluation
-
- Coming soon ..
+ We report our internal evaluation pipeline benchmarks in the following table:
+
+ **Note: evaluation results are normalized scores from the v2 leaderboard tasks; the results reported for the original models in the blog post are raw scores.**
+
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
+ <colgroup>
+ <col style="width: 10%;">
+ <col style="width: 10%;">
+ <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
+ </colgroup>
+ <thead>
+ <tr>
+ <th>Benchmark</th>
+ <th>Llama3-8B-1.58-100B-tokens</th>
+ <th>Falcon3-7B-Instruct-1.58bit</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>IFEval</td>
+ <td>17.91</td>
+ <td><b>27.49</b></td>
+ </tr>
+ <tr>
+ <td>MUSR</td>
+ <td><b>4.87</b></td>
+ <td>4.64</td>
+ </tr>
+ <tr>
+ <td>GPQA</td>
+ <td><b>1.83</b></td>
+ <td>0.00</td>
+ </tr>
+ <tr>
+ <td>BBH</td>
+ <td><b>5.36</b></td>
+ <td>2.97</td>
+ </tr>
+ <tr>
+ <td>MMLU-PRO</td>
+ <td><b>2.78</b></td>
+ <td>1.47</td>
+ </tr>
+ <tr>
+ <td>MATH</td>
+ <td>0.26</td>
+ <td><b>0.43</b></td>
+ </tr>
+ <tr>
+ <td>Average</td>
+ <td>5.50</td>
+ <td><b>6.17</b></td>
+ </tr>
+ </tbody>
+ </table>
 
  # Citation