PEFT
flan
opt
crumb committed on
Commit c8c2acf
1 Parent(s): 1a8a920
Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -13,11 +13,11 @@ tags:
 
 OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.
 
- This model is [facebook/opt-6.7b](https://hf.co/facebook/opt-6.7b) finetuned with low-rank adapters (https://arxiv.org/abs/2106.09685) on the FLAN datasets (https://arxiv.org/pdf/2210.11416.pdf).
+ This model is [facebook/opt-1.3b](https://hf.co/facebook/opt-1.3b) finetuned with low-rank adapters (https://arxiv.org/abs/2106.09685) on the FLAN datasets (https://arxiv.org/pdf/2210.11416.pdf).
 
- Low-rank adapters (r=16) finetuned over 1.8m new tokens of a FLAN task mixture, with the start of each example cut off if it was too large to fit within a 256 token context.
+ Low-rank adapters (r=16) finetuned over 4.2m new tokens of a FLAN task mixture, with the start of each example cut off if it was too large to fit within a 256 token context.
 
- The model reaches a train ppl of 5.92 and an eval ppl of 5.24.
+ The model reaches a train ppl of 4.77 and an eval ppl of 4.19.
 
 ### Inference Example (Chain-of-Thought prompt):
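
For context on what the card describes: the r=16 low-rank adapters and the 256-token left-truncation map onto PEFT and tokenizer settings roughly as sketched below. This is a minimal sketch under assumptions, not the commit's actual training code; `lora_alpha`, `lora_dropout`, and `target_modules` are assumptions not taken from the card.

```python
# Sketch of the setup described in the card: LoRA with r=16 on facebook/opt-1.3b,
# examples truncated from the START if they exceed a 256-token context.
# lora_alpha, lora_dropout, and target_modules are assumptions, not from the commit.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
lora_config = LoraConfig(
    r=16,                                 # rank stated in the card
    lora_alpha=32,                        # assumed
    lora_dropout=0.05,                    # assumed
    target_modules=["q_proj", "v_proj"],  # assumed; a typical choice for OPT attention
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
tokenizer.truncation_side = "left"  # cut off the start of over-long examples
batch = tokenizer(["<a FLAN task example>"], truncation=True, max_length=256)
```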
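
The figures changed by the commit (train ppl 5.92 → 4.77, eval ppl 5.24 → 4.19) are perplexities, i.e. the exponential of the mean cross-entropy loss:

```python
import math

# Perplexity is exp(mean cross-entropy loss). For example, the new eval
# ppl of 4.19 corresponds to a mean eval loss of about 1.43 nats per token,
# since math.exp(1.43) ≈ 4.18.
def perplexity(mean_loss: float) -> float:
    return math.exp(mean_loss)
```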
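
Since the hunk ends at the card's "Inference Example (Chain-of-Thought prompt)" heading, here is a minimal loading sketch for trying the adapter with PEFT. The `ADAPTER_ID` below is a placeholder, not from the commit; substitute the repo id this commit belongs to.

```python
# Minimal inference sketch: load facebook/opt-1.3b and apply the LoRA adapter.
# ADAPTER_ID is a placeholder -- use the actual adapter repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "facebook/opt-1.3b"
ADAPTER_ID = "your-username/your-flan-opt-adapter"  # placeholder, not from the source

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

# Chain-of-thought style prompt, as in the card's inference example heading.
prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\nA: Let's think step by step."
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```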