Space: codeparrot/README

loubnabnl (HF staff) committed on Aug 18, 2022

Commit 5d5b548 (parent: e3ffb64)

Update README.md

Files changed (1)

README.md CHANGED

@@ -51,10 +51,10 @@ pinned: false
 <li>5- <a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of 32 programming languages from GitHub files.</li>
 <li>6- <a href="https://huggingface.co/datasets/codeparrot/github-code-clean" class="underline">GitHub-Code-Clean</a>, a cleaner version of GitHub-Code dataset.</li>
 <li>7- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter" class="underline">GitHub-Jupyter</a>, a 16.3GB dataset of Jupyter Notebooks  from BigQuery GitHub.</li>
-<li>8- <a href="https://huggingface.co/datasets/codeparrot/apps" class="underline">APPS</a>, a benchmark for code generation with 10000 problems.</li>
-<li>9- <a href="https://huggingface.co/datasets/codeparrot/codecomplex" class="underline">CodeComplex</a>, an annotated dataset of 4,200 Java codes and their time complexity.</li>
-<li>10- <a href="https://huggingface.co/datasets/codeparrot/xlcost-text-to-code" class="underline">XLCOST-text-to-code</a>, a subset of XLCoST benchmark, for text-to-code generation at snippet level and program level for 7 programming languages: Python, C, C#, C++, Java, Javascript and PHP.</li>
-<li>10- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter-text-code-pairs" class="underline">github-jupyter-text-code-pairs</a>, a dataset of text and code pairs extracted from Jupyter notebooks.</li>
+<li>8- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter-text-code-pairs" class="underline">github-jupyter-text-code-pairs</a>, a dataset of text and code pairs extracted from Jupyter notebooks, it is a parsed version of github-jupyter dataset.</li>
+<li>9- <a href="https://huggingface.co/datasets/codeparrot/apps" class="underline">APPS</a>, a benchmark for code generation with 10000 problems.</li>
+<li>10- <a href="https://huggingface.co/datasets/codeparrot/codecomplex" class="underline">CodeComplex</a>, an annotated dataset of 4,200 Java codes and their time complexity.</li>
+<li>11- <a href="https://huggingface.co/datasets/codeparrot/xlcost-text-to-code" class="underline">XLCOST-text-to-code</a>, a subset of XLCoST benchmark, for text-to-code generation at snippet level and program level for 7 programming languages: Python, C, C#, C++, Java, Javascript and PHP.</li>
 </ul>
 </li>