loubnabnl HF staff committed on
Commit 2ae6447 · 1 Parent(s): 0f3bc73

Update README.md

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -11,14 +11,14 @@ pinned: false
11
  </p>
12
 
13
  <p>This organization is dedicated to language models for code generation. In particular, CodeParrot is a GPT-2 model trained to generate Python code.</p>
14
- <b><h2>Table of contents:</h2></b>
15
  <br>
16
 
17
  <ul>
18
 
19
  <li>
20
  <p>
21
- Interactive blog: where we compare different code models and explain how they are trained and evaluated <a
22
  href="https://huggingface.co/spaces/loubnabnl/code-generation-models"
23
  class="underline">Code generation with 🤗</a
24
  >
@@ -27,14 +27,13 @@ pinned: false
27
  <br>
28
  <li>
29
  <p>
30
- Spaces: code generation with <a href="https://huggingface.co/codeparrot/codeparrot" class="underline">CodeParrot (1.5B)</a>, <a href="https://huggingface.co/facebook/incoder-6B" class="underline">InCoder</a> (6B) and <a href="https://github.com/salesforce/CodeGen" class="underline">CodeGen</a> (6B)
31
  </p>
32
  </li>
33
  <br>
34
- <li>Models: CodeParrot (1.5B) and CodeParrot-small (110M); each repo has different ongoing experiments in its branches.</li>
35
  <br>
36
- <li>Datasets:<ul>
37
- <li><a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
38
  <li>A more heavily filtered version of codeparrot-clean, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering" class="underline">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering" class="underline">codeparrot-valid-more-filtering</a>.</li>
39
  <li>The CodeParrot dataset after near-deduplication (initially only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication" class="underline">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication" class="underline">codeparrot-valid-near-deduplication</a>.</li>
40
  <li><a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of GitHub files in 32 programming languages with 60 extensions.</li>
 
11
  </p>
12
 
13
  <p>This organization is dedicated to language models for code generation. In particular, CodeParrot is a GPT-2 model trained to generate Python code.</p>
14
+ <b>Table of contents:</b>
15
  <br>
16
 
17
  <ul>
18
 
19
  <li>
20
  <p>
21
+ <b>Interactive blog:</b> where we compare different code models and explain how they are trained and evaluated <a
22
  href="https://huggingface.co/spaces/loubnabnl/code-generation-models"
23
  class="underline">Code generation with 🤗</a
24
  >
 
27
  <br>
28
  <li>
29
  <p>
30
+ <b>Spaces:</b> code generation with <a href="https://huggingface.co/codeparrot/codeparrot" class="underline">CodeParrot (1.5B)</a>, <a href="https://huggingface.co/facebook/incoder-6B" class="underline">InCoder</a> (6B) and <a href="https://github.com/salesforce/CodeGen" class="underline">CodeGen</a> (6B)
31
  </p>
32
  </li>
33
  <br>
34
+ <li><b>Models:</b> CodeParrot (1.5B) and CodeParrot-small (110M); each repo has different ongoing experiments in its branches.</li>
35
  <br>
36
+ <li><b>Datasets:</b><ul><li><a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
 
37
  <li>A more heavily filtered version of codeparrot-clean, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering" class="underline">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering" class="underline">codeparrot-valid-more-filtering</a>.</li>
38
  <li>The CodeParrot dataset after near-deduplication (initially only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication" class="underline">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication" class="underline">codeparrot-valid-near-deduplication</a>.</li>
39
  <li><a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of GitHub files in 32 programming languages with 60 extensions.</li>