Files changed (1)
  1. README.md +23 -2
README.md CHANGED
@@ -81,5 +81,26 @@ print(outputs[0]["generated_text"][len(prompt):])
 
  | Model | Code Generation | Code Execution |Test Output Prediction |
  |---------------------------|-----------------|----------------|-----------------------|
- | **Dracarys-72B-Instruct** | 33.86 | 54.30 | 53.26 |
- | Qwen2-72B-Instruct | 30.10 | TBD | TBD |
+ | **Dracarys-72B-Instruct** | **33.57** | **62.96** | **58.93** |
+ | Qwen2-72B-Instruct | 32.92 | 58.95 | 55.88 |
+
+ ## Breakdown of LiveCodeBench CodeGeneration
+
+ | Model | Easy | Medium | Hard |
+ |---------------------------|-----------------|----------------|-----------------------|
+ | **Dracarys-72B-Instruct** | 64.16 | **25.06** | **3.64** |
+ | Qwen2-72B-Instruct | 65.83 | 22.28 | 3.11 |
+
+ ## Breakdown of LiveCodeBench TestOutputPrediction
+
+ | Model | Easy | Medium | Hard |
+ |---------------------------|-----------------|----------------|-----------------------|
+ | **Dracarys-72B-Instruct** | **65.37** | **58.74** | **46.38** |
+ | Qwen2-72B-Instruct | 63.19 | 54.08 | 46.52 |
+
+ ## LiveBench
+
+ | Model | Global Average | Coding Average | Language Average | Mathematics Average | Data Analysis Average | Reasoning Average | IF Average |
+ |---------------------------|----------------|----------------|------------------|---------------------|-----------------------|------------------|-------------|
+ | **Dracarys-72B-Instruct** | **41.20** | **38.95** | **31.17** | 42.77 | 26.24 | 40 | 68.08 |
+ | Qwen2-72B-Instruct | 40.15 | 32.38 | 29.21 | 43.44 | 26.24 | 41.33 | 68.27 |