Nbeau/grammarBERT

Nbeau committed 12 days ago

Commit 6adec45 (1 parent: 0fa71a2)

Update README.md

Files changed (1)

README.md CHANGED

@@ -4,3 +4,27 @@ datasets:
 base_model:
 - microsoft/codebert-base
 ---
+# grammarBERT
+`grammarBERT` fine-tunes the `codeBERT` model with a Masked Language Modeling (MLM) objective on derivation sequences of Python 3.8 code. It builds on `codeBERT`’s pretraining on both natural language and code tokens to obtain a more specialized model that can effectively represent and retrieve derivation sequences. This is intended for grammar-based programming tasks, where it aims to improve parsing accuracy and to support downstream models.
+## Usage
+```python
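+from transformers import RobertaForMaskedLM, RobertaTokenizer
+# ast2seq is not part of transformers; its implementation is available at
+# https://github.com/NathanaelBeau/grammarBERT/ (the module path below is an assumption)
+from ast2seq import ast2seq
+# Load the pre-trained codeBERT model and tokenizer
+model = RobertaForMaskedLM.from_pretrained("microsoft/codebert-base")
+tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")
+# Convert a complete, parseable code snippet into its derivation sequence
+code_snippet = "def enumerate_items(items):\n    return list(enumerate(items))"
+derivation_sequence = ast2seq(code_snippet)
+# Tokenize the derivation sequence (not the raw code), assuming ast2seq returns a string
+input_ids = tokenizer.encode(derivation_sequence, return_tensors="pt")
+# Predict masked tokens or fine-tune the model as needed
+outputs = model(input_ids)
+logits = outputs.logits  # shape: (batch_size, sequence_length, vocab_size)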
+```
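The usage snippet relies on `ast2seq` from the grammarBERT repository, whose exact output format is not shown in this README. As a rough, hypothetical illustration of what an MLM training example on derivation sequences could look like, the sketch below builds a simplified sequence with Python's standard `ast` module (one `NodeType -> ChildType ...` production per node, in depth-first pre-order, omitting identifiers and literals) and then applies the standard BERT-style masking recipe (select about 15% of positions; replace 80% of those with the mask token, 10% with a random token, keep 10% unchanged). The function names `derivation_sequence` and `mask_sequence`, the production format, and the masking scheme are all assumptions, not the repository's `ast2seq` or its actual training setup. For simplicity it masks whole productions rather than subword tokens and draws random replacements from the sequence itself instead of the tokenizer vocabulary; `<mask>` is the RoBERTa/CodeBERT mask token.

```python
import ast
import random
from typing import List, Tuple

# Mask token used by RoBERTa-style tokenizers such as microsoft/codebert-base
MASK_TOKEN = "<mask>"


def derivation_sequence(source: str) -> List[str]:
    """Hypothetical, simplified derivation sequence: one production per AST node,
    visited depth-first in pre-order, written as 'NodeType -> ChildType ...'."""
    tree = ast.parse(source)
    productions: List[str] = []

    def visit(node: ast.AST) -> None:
        # Skip Load/Store/Del context nodes to keep the sequence readable
        children = [
            child
            for child in ast.iter_child_nodes(node)
            if not isinstance(child, ast.expr_context)
        ]
        rhs = " ".join(type(child).__name__ for child in children)
        productions.append(f"{type(node).__name__} -> {rhs}".rstrip())
        for child in children:
            visit(child)

    visit(tree)
    return productions


def mask_sequence(
    tokens: List[str], rng: random.Random, mask_prob: float = 0.15
) -> Tuple[List[str], List[int]]:
    """BERT-style masking sketch. Returns the corrupted sequence and the
    positions selected as MLM prediction targets."""
    corrupted = list(tokens)
    targets: List[int] = []
    for i in range(len(tokens)):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        targets.append(i)
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN  # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(tokens)  # 10%: random token (from this sequence)
        # remaining 10%: keep the original token
    return corrupted, targets


if __name__ == "__main__":
    snippet = "def enumerate_items(items):\n    return list(enumerate(items))"
    seq = derivation_sequence(snippet)
    print(seq)
    corrupted, targets = mask_sequence(seq, random.Random(0), mask_prob=0.3)
    print(corrupted, targets)
```

Note that `ast.parse` needs a syntactically complete snippet, so a bare `def` header without a body would raise a `SyntaxError`; the same constraint likely applies to any AST-based `ast2seq`.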