---
language:
  - zh
  - en
tags:
  - code
  - autocomplete
  - pytorch
  - en
license: apache-2.0
---

# GPT2 for Code AutoComplete Model

code-autocomplete is a code completion plugin for Python.

It can automatically complete code at both line and block granularity.
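
Assuming the package is published on PyPI under the same name as the repo (check the repo's README to confirm), installation would look like:

```shell
# assumed PyPI package name; verify against the repo before relying on it
pip install -U code-autocomplete
```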

## Usage

Open-source repo: [code-autocomplete](https://github.com/shibing624/code-autocomplete). It supports the GPT2 model; usage:

```python
from autocomplete.gpt2 import Infer

# Load the GPT2 completion model (CPU inference)
m = Infer(model_name="gpt2", model_dir="shibing624/code-autocomplete-gpt2-base", use_cuda=False)
i = m.predict('import torch.nn as')
print(i)
```

Alternatively, load the model with huggingface/transformers directly.

Note: use the GPT2-related classes (`GPT2Tokenizer`, `GPT2LMHeadModel`) to load this model.

```python
import os
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model.to(device)
prompts = [
    """from torch import nn
    class LSTM(Module):
        def __init__(self, *,
                     n_tokens: int,
                     embedding_size: int,
                     hidden_size: int,
                     n_layers: int):""",
    """import numpy as np
    import torch
    import torch.nn as""",
    "import java.util.ArrayList",
    "def factorial(n):",
]
for prompt in prompts:
    input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt').to(device)
    # max_length counts tokens, not characters: allow up to 64 new tokens
    # beyond the tokenized prompt
    outputs = model.generate(input_ids=input_ids,
                             max_length=input_ids.shape[1] + 64,
                             temperature=1.0,
                             top_k=50,
                             top_p=0.95,
                             repetition_penalty=1.0,
                             do_sample=True,
                             num_return_sequences=1,
                             length_penalty=2.0,
                             early_stopping=True)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(decoded)
    print("=" * 20)
```

output:

```shell
from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):
        self.embedding_size = embedding_size
====================
import numpy as np
import torch
import torch.nn as np
from onmt import nnumpy as np

class PredicterDNN(nn.Module):
    @classmethod
    @parameterized.expand([0.5, 2.5] + (10, 10))
    @classmethod
    @static
    def add(self, sample_rate, max_iters=self.max_iters, mask_fre
====================
import java.util.ArrayList[Tuple[Int]],
====================
def factorial(n):
    number of elements per dimension,
    assert len(n) > 1
    n.append(self.n_iters)
    n = n_iter(self.n_norm)

    def _score(
====================

Process finished with exit code 0
```
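
The same generation can also go through transformers' high-level `pipeline` API. This is a sketch of an equivalent call, not part of the original card:

```python
from transformers import pipeline

# text-generation pipeline wraps tokenization, generate(), and decoding
generator = pipeline("text-generation", model="shibing624/code-autocomplete-gpt2-base")
print(generator("def factorial(n):", max_length=64, do_sample=True, top_p=0.95))
```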


Model files:

```
code-autocomplete-gpt2-base
├── config.json
├── merges.txt
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
└── vocab.json
```
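
To work offline, the files above can be fetched once with huggingface_hub; a minimal sketch:

```python
from huggingface_hub import snapshot_download

# downloads config.json, vocab.json, pytorch_model.bin, etc. into the local cache
local_dir = snapshot_download(repo_id="shibing624/code-autocomplete-gpt2-base")
print(local_dir)
```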


### Train data
#### pytorch_awesome projects source code

Download [code-autocomplete](https://github.com/shibing624/code-autocomplete), then build the dataset:
```shell
cd autocomplete
python create_dataset.py
```

If you want to train the code-autocomplete GPT2 model yourself, refer to https://github.com/shibing624/code-autocomplete/blob/main/autocomplete/gpt2.py; a minimal fine-tuning sketch follows below.
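
For orientation only, here is a minimal continued-fine-tuning sketch using transformers' `Trainer`; the repo's gpt2.py is the authoritative script, and `train.txt` is a hypothetical dump produced by create_dataset.py:

```python
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2Tokenizer, TextDataset, Trainer, TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# train.txt: hypothetical file holding the source code collected by create_dataset.py
train_dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)
# mlm=False -> causal language modeling (next-token prediction)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="outputs", num_train_epochs=3,
                           per_device_train_batch_size=8),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()
```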

## About GPT2

Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large

Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in the paper *Language Models are Unsupervised Multitask Learners* (Radford et al., 2019) and first released on OpenAI's blog.
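
To make the CLM objective concrete, a small sketch (mine, not from the card): passing the inputs as labels makes the model score next-token prediction, with the label shift handled internally by transformers.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-gpt2-base")

input_ids = tokenizer("import torch.nn as nn", return_tensors="pt").input_ids
with torch.no_grad():
    outputs = model(input_ids, labels=input_ids)  # labels shifted internally
print(outputs.loss)  # mean cross-entropy over next-token predictions
```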

Disclaimer: The team releasing GPT-2 also wrote a model card for their model. Content from this model card has been written by the Hugging Face team to complete the information they provided and give specific examples of bias.

## Citation

```bibtex
@misc{code-autocomplete,
  author = {Xu Ming},
  title = {code-autocomplete: Code AutoComplete with GPT model},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/shibing624/code-autocomplete},
}
```