Update README.md
README.md
CHANGED
@@ -11,14 +11,14 @@ pinned: false
|
|
11 |
</p>
|
12 |
|
13 |
<p>This organization is dedicated to language models for code generation. In particular CodeParrot is a GPT-2 model trained to generate Python code.</p>
|
14 |
-
<b
|
15 |
<br>
|
16 |
|
17 |
<ul>
|
18 |
|
19 |
<li>
|
20 |
<p>
|
21 |
-
Interactive blog
|
22 |
href="https://huggingface.co/spaces/loubnabnl/code-generation-models"
|
23 |
class="underline">Code generation with 🤗</a
|
24 |
>
|
@@ -27,14 +27,13 @@ pinned: false
|
|
27 |
<br>
|
28 |
<li>
|
29 |
<p>
|
30 |
-
Spaces
|
31 |
</p>
|
32 |
</li>
|
33 |
<br>
|
34 |
-
<li>Models
|
35 |
<br>
|
36 |
-
<li>Datasets
|
37 |
-
<li><a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
|
38 |
<li>A more filtered version of codeparrot-clean under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering" class="underline">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering" class="underline">codeparrot-valid-more-filtering</a>.</li>
|
39 |
<li>The CodeParrot dataset after near-deduplication (initially only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication" class="underline">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication" class="underline">codeparrot-valid-near-deduplication</a>.</li>
|
40 |
<li><a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of GitHub files in 32 programming languages with 60 extensions.</li>
|
|
|
11 |
</p>
|
12 |
|
13 |
<p>This organization is dedicated to language models for code generation. In particular CodeParrot is a GPT-2 model trained to generate Python code.</p>
|
14 |
+
<b>Table of contents:</b>
|
15 |
<br>
|
16 |
|
17 |
<ul>
|
18 |
|
19 |
<li>
|
20 |
<p>
|
21 |
+
<b>Interactive blog:</b> where we compare different code models and explain how they are trained and evaluated <a
|
22 |
href="https://huggingface.co/spaces/loubnabnl/code-generation-models"
|
23 |
class="underline">Code generation with 🤗</a
|
24 |
>
|
|
|
27 |
<br>
|
28 |
<li>
|
29 |
<p>
|
30 |
+
<b>Spaces:</b> code generation with: <a href="https://huggingface.co/codeparrot/codeparrot" class="underline">CodeParrot (1.5B)</a>, <a href="https://huggingface.co/facebook/incoder-6B" class="underline">InCoder</a> (6B) and <a href="https://github.com/salesforce/CodeGen" class="underline">CodeGen</a> (6B)
|
31 |
</p>
|
32 |
</li>
|
33 |
<br>
|
34 |
+
<li><b>Models:</b> CodeParrot (1.5B) and CodeParrot-small (110M); each repo hosts different ongoing experiments in its branches.</li>
|
35 |
<br>
|
36 |
+
<li><b>Datasets:</b><ul><li><a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
|
|
|
37 |
<li>A more filtered version of codeparrot-clean under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering" class="underline">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering" class="underline">codeparrot-valid-more-filtering</a>.</li>
|
38 |
<li>The CodeParrot dataset after near-deduplication (initially only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication" class="underline">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication" class="underline">codeparrot-valid-near-deduplication</a>.</li>
|
39 |
<li><a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of GitHub files in 32 programming languages with 60 extensions.</li>
|