csabakecskemeti committed on
Commit
d7a761a
1 Parent(s): 6214cf9

Update README.md

Files changed (1)
  1. README.md +105 -1
README.md CHANGED
@@ -6,4 +6,108 @@ pipeline_tag: text-generation
  library_name: transformers
  base_model:
  - DevQuasar/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit
- ---
+ model-index:
+ - name: analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-evaluation-harness
+       name: bbh
+     metrics:
+     - name: acc_norm
+       type: acc_norm
+       value: 0.4168
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-evaluation-harness
+       name: gpqa
+     metrics:
+     - name: acc_norm
+       type: acc_norm
+       value: 0.2691
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-evaluation-harness
+       name: math
+     metrics:
+     - name: exact_match
+       type: exact_match
+       value: 0.0867
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-evaluation-harness
+       name: mmlu
+     metrics:
+     - name: acc_norm
+       type: acc_norm
+       value: 0.2822
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-evaluation-harness
+       name: musr
+     metrics:
+     - name: acc_norm
+       type: acc_norm
+       value: 0.3648
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-evaluation-harness
+       name: hellaswag
+     metrics:
+     - name: acc
+       type: acc
+       value: 0.5141
+       verified: false
+     - name: acc_norm
+       type: acc_norm
+       value: 0.6793
+       verified: false
+
+ ---
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e6d37e02dee9bcb9d9fa18/X4WG8AnMFqJuWkRvA0CrW.png)
+
+ ### eval
+
+ The fine-tuned model (DevQuasar/analytical_reasoning_r16a32_unsloth-Llama-3.2-3B-Instruct-bnb-4bit)
+ has gained performance over the base model (unsloth/Llama-3.2-3B-Instruct-bnb-4bit)
+ on the following tasks.
+
+ | Test | Base Model | Fine-Tuned Model | Performance Gain |
+ |---|---|---|---|
+ | leaderboard_bbh_logical_deduction_seven_objects | 0.2520 | 0.4360 | 0.1840 |
+ | leaderboard_bbh_logical_deduction_five_objects | 0.3560 | 0.4560 | 0.1000 |
+ | leaderboard_musr_team_allocation | 0.2200 | 0.3200 | 0.1000 |
+ | leaderboard_bbh_disambiguation_qa | 0.3040 | 0.3760 | 0.0720 |
+ | leaderboard_gpqa_diamond | 0.2222 | 0.2727 | 0.0505 |
+ | leaderboard_bbh_movie_recommendation | 0.5960 | 0.6360 | 0.0400 |
+ | leaderboard_bbh_formal_fallacies | 0.5080 | 0.5400 | 0.0320 |
+ | leaderboard_bbh_tracking_shuffled_objects_three_objects | 0.3160 | 0.3440 | 0.0280 |
+ | leaderboard_bbh_causal_judgement | 0.5455 | 0.5668 | 0.0214 |
+ | leaderboard_bbh_web_of_lies | 0.4960 | 0.5160 | 0.0200 |
+ | leaderboard_math_geometry_hard | 0.0455 | 0.0606 | 0.0152 |
+ | leaderboard_math_num_theory_hard | 0.0519 | 0.0649 | 0.0130 |
+ | leaderboard_musr_murder_mysteries | 0.5280 | 0.5400 | 0.0120 |
+ | leaderboard_gpqa_extended | 0.2711 | 0.2802 | 0.0092 |
+ | leaderboard_bbh_sports_understanding | 0.5960 | 0.6040 | 0.0080 |
+ | leaderboard_math_intermediate_algebra_hard | 0.0107 | 0.0143 | 0.0036 |
+
+
+ ### Framework versions
+
+ - unsloth 2024.11.5
+ - trl 0.12.0
+
+ ### Training HW
+ - V100
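
The Performance Gain column in the eval table reads as the fine-tuned score minus the base score. The following is a minimal Python sketch that recomputes it for a few rows copied from the table. The names `rows`, `gain`, and `TOLERANCE` are illustrative assumptions and do not appear in the README; the tolerance is an assumed allowance for the table's 4-decimal rounding.

```python
# Hypothetical check: recompute "Performance Gain" for a few rows of the eval table.
# Scores are copied from the table; `rows`, `gain`, and `TOLERANCE` are illustrative names.
rows = [
    # (test, base, fine_tuned, reported_gain)
    ("leaderboard_bbh_logical_deduction_seven_objects", 0.2520, 0.4360, 0.1840),
    ("leaderboard_musr_team_allocation", 0.2200, 0.3200, 0.1000),
    ("leaderboard_gpqa_diamond", 0.2222, 0.2727, 0.0505),
    ("leaderboard_bbh_causal_judgement", 0.5455, 0.5668, 0.0214),
]


def gain(base: float, fine_tuned: float) -> float:
    """Absolute gain: fine-tuned score minus base score."""
    return fine_tuned - base


# The table shows 4-decimal values, so allow for small rounding differences.
TOLERANCE = 1.5e-4

for test, base, fine_tuned, reported in rows:
    computed = gain(base, fine_tuned)
    status = "ok" if abs(computed - reported) <= TOLERANCE else "mismatch"
    print(f"{test}: computed={computed:.4f} reported={reported:.4f} {status}")
```

For `leaderboard_bbh_causal_judgement`, the displayed scores differ by about 0.0213 while the table reports 0.0214. One plausible explanation, which is an assumption here, is that the reported gains were computed from unrounded scores before formatting.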