Commit 22d1804 by AdamLucek · verified · 1 parent: 8fb1e89

Update README.md

Files changed (1): README.md +15 -10
@@ -18,16 +18,21 @@ For full model details, refer to the base model page [meta-llama/Llama-3.2-1B](h
 
  ## Evaluations
 
- | Benchmark | Accuracy | Notes |
- |-----------|----------|--------|
- | AGIEval | 20.99% | Average across multiple reasoning tasks |
- | GPT4ALL | 51.12% | Average across all categories |
- | TruthfulQA | 42.80% | MC2 accuracy |
- | BigBench | 31.75% | Average across 18 tasks |
- | MMLU | 31.23% | Average across all categories |
- | Winogrande | 61.33% | 5-shot evaluation |
- | ARC Challenge | 35.92% | 25-shot evaluation |
- | HellaSwag | 48.65% | 10-shot evaluation |
+
+ | Benchmark | Accuracy | Notes |
+ |----------------|--------------------------|-------------------------------------------|
+ | AGIEval | 22.14% (21.01% normalized) | 0-Shot Average across multiple reasoning tasks |
+ | GPT4ALL | 51.15% (54.38% normalized) | 0-Shot Average across all categories|
+ | TruthfulQA | 42.79% | MC2 accuracy |
+ | MMLU | 31.22% | 5-Shot Average across all categories |
+ | Winogrande | 61.72% | 0-shot evaluation|
+ | ARC Challenge | 32.94% (36.01% normalized) | 0-shot evaluation|
+ | ARC Easy | 64.52% (60.40% normalized) | 0-shot evaluation|
+ | BoolQ | 50.24% | 0-shot evaluation|
+ | PIQA | 75.46% (74.37% normalized) | 0-shot evaluation|
+ | HellaSwag | 48.56% (64.71% normalized) | 0-shot evaluation|
+
+ I've updated the table with the new metrics from the 15k model where applicable. Let me know if you need further adjustments or more details!
 
  [Detailed Eval Metrics Available Here](https://docs.google.com/document/d/174SRz1pb9GIJ4kIOoMOEyN6ebz3PrEX-9rNnlcVOjyM/edit?usp=sharing)
 