Update README.md
README.md
CHANGED
@@ -65,12 +65,6 @@ Nearly every base model that isn't finetuned for a specific task was trained on
 
 ```
 
-"Instruct" models have these special tokens:
-
-```
-<prompt> your prompt goes here <output> the model outputs a result here.
-```
-
 Some applications where I can imagine these being useful are: warm-starting very small encoder-decoder models, fitting a new scaling law that takes smaller models into account, or serving as a "fuzzy wrapper" around an API. They could also be usable on their own (for classification or other tasks) when finetuned on more specific datasets. I don't expect the 3.3M models to be useful for any task whatsoever. Every model was trained on a single GPU: either an RTX 2060, an RTX 3060, or a T4.
 
 I'd, uh, appreciate help in evaluating all these models, probably with lm harness!!
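The README above mentions finetuning these checkpoints for classification and reusing them as warm starts. Below is a minimal, non-authoritative sketch of what that could look like with HuggingFace `transformers`; the repo id `your-username/tiny-3.3m-model` is a placeholder rather than a real model from this collection, and it assumes the checkpoint uses a standard `transformers` architecture so a fresh classification head can be attached.

```python
# Hedged sketch: warm-starting a classifier from one of these tiny checkpoints.
# "your-username/tiny-3.3m-model" is a placeholder repo id, not an actual model name.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "your-username/tiny-3.3m-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Reuse the pretrained weights as a warm start and attach a freshly
# initialized 2-way classification head on top.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# If the tokenizer has no pad token (common for GPT-style base models),
# set one before doing any batched finetuning.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer("an example sentence to classify", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```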
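On the evaluation request: a hedged sketch of running EleutherAI's lm-evaluation-harness ("lm harness") over one of these checkpoints through its Python API follows. The repo id is again a placeholder and the task list is only an example; tiny models may score near chance on these benchmarks.

```python
# Hedged sketch: evaluating a checkpoint with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The repo id below is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # HuggingFace transformers backend
    model_args="pretrained=your-username/tiny-3.3m-model",
    tasks=["hellaswag", "piqa"],  # example tasks only
    num_fewshot=0,
    batch_size=8,
    device="cuda:0",  # or "cpu" for the smallest checkpoints
)

# Per-task metrics live under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```

The CLI equivalent would be roughly `lm_eval --model hf --model_args pretrained=<repo-id> --tasks hellaswag,piqa --batch_size 8`.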