---
title: README
emoji: 👀
colorFrom: yellow
colorTo: purple
sdk: static
pinned: false
---
<p>
<img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/codeparrot_logo.png" alt="drawing" width="440"/>
</p>
<p>This organization is dedicated to language models for code generation. In particular, CodeParrot is a GPT-2 model trained to generate Python code.</p>
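<p>As a quick illustration, the model can be used through the 🤗 Transformers <code>text-generation</code> pipeline. The snippet below is a minimal sketch; the prompt and sampling settings are illustrative, not tuned recommendations.</p>

```python
# Minimal sketch: generating Python code with CodeParrot via the
# Transformers text-generation pipeline. The prompt and sampling
# settings are illustrative examples only.
from transformers import pipeline

generator = pipeline("text-generation", model="codeparrot/codeparrot")

prompt = "def fibonacci(n):"
outputs = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.2)
print(outputs[0]["generated_text"])
```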
<h2>Table of contents:</h2>
<ul>
<li>
<p>
Interactive blog where we compare different code models and explain how they are trained and evaluated: <a
href="https://huggingface.co/spaces/loubnabnl/code-generation-models"
class="underline">Code generation with πŸ€—</a
>
</p>
</li>
<li>
<p>
Spaces: code generation demos with <a href="https://huggingface.co/codeparrot/codeparrot">CodeParrot</a> (1.5B), <a href="https://huggingface.co/facebook/incoder-6B">InCoder</a> (6B), and <a href="https://github.com/salesforce/CodeGen">CodeGen</a> (6B)
</p>
</li>
<li>Models: CodeParrot (1.5B) and CodeParrot-small (110M); each repository hosts different ongoing experiments in its branches.</li>
<li>Datasets:<ul>
<li><a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean">codeparrot-clean</a>, dataset on which we trained and evaluated CodeParrot, the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid">codeparrot-clean-valid</a>.</li>
<li>A more heavily filtered version of codeparrot-clean, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering">codeparrot-valid-more-filtering</a>.</li>
<li>The CodeParrot dataset after near-deduplication (initially only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication">codeparrot-valid-near-deduplication</a>.</li>
<li><a href="https://huggingface.co/datasets/codeparrot/github-code">GitHub-Code</a>, a 1TB dataset of 32 programming languages with 60 from GitHub files.</li>
<li><a href="https://huggingface.co/datasets/codeparrot/github-jupyter">GitHub-Jupyter</a>, a 16.3GB dataset of Jupyter Notebooks from BigQuery GitHub.</li>
<li><a href="https://huggingface.co/datasets/codeparrot/apps">APPS</a>, a benchmark for code generation with 10000 problems.</li>
</ul>
</li>
</ul>
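<p>The datasets above can be streamed with 🤗 Datasets instead of being downloaded in full. The snippet below is a minimal sketch using codeparrot-clean-train; the split name and column names are taken from the dataset card, so check the card if they differ.</p>

```python
# Minimal sketch: streaming codeparrot-clean-train with 🤗 Datasets,
# which avoids downloading the full dataset up front.
from datasets import load_dataset

ds = load_dataset("codeparrot/codeparrot-clean-train", split="train", streaming=True)

# Inspect the first example; see the dataset card for the exact schema.
example = next(iter(ds))
print(example.keys())
```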