crumb committed on
Commit c023524 · 1 Parent(s): 5c487bd

Update README.md

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -9,7 +9,7 @@ language:
 
  *by GPT-4 & Crumb*
 
- ***Note***: *this version of the model was not trained with a constant-length dataset. it is in the process of being retrained right now.*
+ ***Note***: *this model is in the process of being re-evaluated because it was retrained.*
 
  ### Introduction
 
@@ -30,12 +30,13 @@ A learning rate of 1e-4 was used in this study, with no learning rate schedule.
 
  [Sanh et al. (2019)](https://arxiv.org/abs/1910.01108) suggests a student around 40% of the size of it's teacher can achieve similar performance in encoder models when training from scratch with suprivision. We warm-start our model from a smaller checkpoint than the teacher that maintains a similar ratio with a student that is 43.75% the size of it's teacher.
 
- | model | piqa acc | winogrande acc | lambada ppl | lambada acc | arc acc | sciq acc | wsc acc |
- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | model | piqa acc | winogrande acc | lambada ppl | lambada acc | arc acc | sciq acc | wsc acc | notes |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | pythia-70m (student base) | 59.85 | 51.22 | 140.81 | 21.40 | 17.15 | 65.00 | 36.53 |
  | pythia-160m (teacher) | 62.68 | 51.07 | 30.03 | 36.76 | 19.62 | 76.20 | 36.58 |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- | distilpythia (student) | 59.74 | **51.62** | 420.70 | 15.82 | **17.15** | 61.30 | **36.54** |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | distilpythia (student) | 59.74 | **51.62** | 420.70 | 15.82 | **17.15** | 61.30 | **36.54** | trained on padded/truncated examples
+ | distilpythia-cl (student) | 59.30 | 50.75 | 403.78 | 15.16 | 16.98 | 59.20 | **36.54** | trained on a constant-length dataset
 
  <center> <i>Table 1.</i> The student before finetuning, teacher, and student after finetuning and their results on various benchmarks. Numbers in bold are where the student after finetuning matches or outperforms the student before finetuning. </center>
 
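The new `notes` column distinguishes a model "trained on padded/truncated examples" from one "trained on a constant-length dataset". The README does not say how either dataset was built, so the following is only a minimal sketch of one common reading of those two terms. The function names, the `pad_id` and `eos_id` values, and the block length are all hypothetical and are not taken from the repository.

```python
from typing import Iterable, Iterator, List


def pad_or_truncate(example: List[int], length: int, pad_id: int) -> List[int]:
    """Force one tokenized example to exactly `length` tokens (hypothetical helper)."""
    clipped = example[:length]  # drop tokens past the limit
    return clipped + [pad_id] * (length - len(clipped))  # fill the rest with padding


def pack_constant_length(
    examples: Iterable[List[int]], length: int, eos_id: int
) -> Iterator[List[int]]:
    """Concatenate examples, separated by `eos_id`, and cut the stream into full blocks.

    Assumed interpretation of "constant-length dataset": every block holds
    `length` real tokens and no padding.
    """
    buffer: List[int] = []
    for example in examples:
        buffer.extend(example)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= length:
            yield buffer[:length]
            buffer = buffer[length:]
    # Leftover tokens shorter than one block are dropped in this sketch.


# Toy token ids, for illustration only.
docs = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10]]
padded = [pad_or_truncate(d, length=4, pad_id=0) for d in docs]
packed = list(pack_constant_length(docs, length=4, eos_id=99))
```

Under this reading, padding wastes positions on short examples and truncation discards the tail of long ones, while packing keeps every block full at the cost of mixing document boundaries inside a block.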