Update README.md
## Why prune?

Even though [Falcon-11B](https://huggingface.co/tiiuae/falcon-11B) is trained on 5T tokens, it is still undertrained, as can be seen by this graph:



This is why the choice was made to prune 50% of the layers.
Note that \~1B tokens of continued pre-training (\~1M rows of 1k tokens) is still required to restore the perplexity of this model in the desired language.
I'm planning on doing that for certain languages, depending on how much compute will be available.
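As an illustration of the pruning step, below is a minimal sketch of dropping half the decoder blocks with `transformers`. Keeping the first half of `model.transformer.h` and the output path `falcon-pruned` are assumptions made for this example, not necessarily the exact recipe used for this model.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Load the full Falcon-11B checkpoint.
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-11B", torch_dtype="auto")

# Falcon's decoder blocks live in model.transformer.h.
# Assumption for this sketch: keep the first half of the blocks and drop the rest;
# the actual layer-selection strategy may differ.
blocks = model.transformer.h
n_keep = len(blocks) // 2
model.transformer.h = nn.ModuleList(blocks[:n_keep])
model.config.num_hidden_layers = n_keep

# Save the pruned checkpoint; it still needs continued pre-training to recover perplexity.
model.save_pretrained("falcon-pruned")
```

The checkpoint produced this way is what the continued pre-training described above would then be run on.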