PEFT
flan
opt
crumb committed on
Commit c8c2acf
1 Parent(s): 1a8a920
Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -13,11 +13,11 @@ tags:
 
 OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.
 
- This model is [facebook/opt-6.7b](https://hf.co/facebook/opt-6.7b) finetuned with low-rank adapters (https://arxiv.org/abs/2106.09685) on the FLAN datasets (https://arxiv.org/pdf/2210.11416.pdf).
+ This model is [facebook/opt-1.3b](https://hf.co/facebook/opt-1.3b) finetuned with low-rank adapters (https://arxiv.org/abs/2106.09685) on the FLAN datasets (https://arxiv.org/pdf/2210.11416.pdf).
 
- Low-rank adapters (r=16) finetuned over 1.8m new tokens of a FLAN task mixture, with the start of each example cut off if it was too large to fit within a 256 token context.
+ Low-rank adapters (r=16) finetuned over 4.2m new tokens of a FLAN task mixture, with the start of each example cut off if it was too large to fit within a 256 token context.
 
- The model reaches a train ppl of 5.92 and an eval ppl of 5.24.
+ The model reaches a train ppl of 4.77 and an eval ppl of 4.19.
 
 ### Inference Example (Chain-of-Thought prompt):
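
For context on what the card describes: the r=16 low-rank adapters and the 256-token left-truncation map onto PEFT and tokenizer settings roughly as sketched below. This is a minimal sketch under assumptions, not the commit's actual training code; `lora_alpha`, `lora_dropout`, and `target_modules` are assumptions not taken from the card.

```python
# Sketch of the setup described in the card: LoRA with r=16 on facebook/opt-1.3b,
# examples truncated from the START if they exceed a 256-token context.
# lora_alpha, lora_dropout, and target_modules are assumptions, not from the commit.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
lora_config = LoraConfig(
    r=16,                                 # rank stated in the card
    lora_alpha=32,                        # assumed
    lora_dropout=0.05,                    # assumed
    target_modules=["q_proj", "v_proj"],  # assumed; a typical choice for OPT attention
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
tokenizer.truncation_side = "left"  # cut off the start of over-long examples
batch = tokenizer(["<a FLAN task example>"], truncation=True, max_length=256)
```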
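
The figures changed by the commit (train ppl 5.92 → 4.77, eval ppl 5.24 → 4.19) are perplexities, i.e. the exponential of the mean cross-entropy loss:

```python
import math

# Perplexity is exp(mean cross-entropy loss). For example, the new eval
# ppl of 4.19 corresponds to a mean eval loss of about 1.43 nats per token,
# since math.exp(1.43) ≈ 4.18.
def perplexity(mean_loss: float) -> float:
    return math.exp(mean_loss)
```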
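
Since the hunk ends at the card's "Inference Example (Chain-of-Thought prompt)" heading, here is a minimal loading sketch for trying the adapter with PEFT. The `ADAPTER_ID` below is a placeholder, not from the commit; substitute the repo id this commit belongs to.

```python
# Minimal inference sketch: load facebook/opt-1.3b and apply the LoRA adapter.
# ADAPTER_ID is a placeholder -- use the actual adapter repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "facebook/opt-1.3b"
ADAPTER_ID = "your-username/your-flan-opt-adapter"  # placeholder, not from the source

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

# Chain-of-thought style prompt, as in the card's inference example heading.
prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\nA: Let's think step by step."
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```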