Nbeau committed on
Commit 6adec45
1 parent: 0fa71a2

Update README.md

Files changed (1): README.md (+24, -0)
README.md CHANGED
@@ -4,3 +4,27 @@ datasets:
base_model:
- microsoft/codebert-base
---

# grammarBERT

`grammarBERT` fine-tunes the `codeBERT` model with a Masked Language Modeling (MLM) objective on derivation sequences for Python 3.8. This combines `codeBERT`'s strengths on both natural-language and code-token tasks into a more specialized model that can effectively represent and retrieve derivation sequences, with applications to grammar-based programming tasks, improving both parsing accuracy and downstream applications.
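
The exact training pipeline is not part of this card, but the objective described above can be sketched with the standard `transformers` MLM utilities. Everything below is illustrative: the derivation sequences are made-up placeholders (real ones would come from running `ast2seq` over a Python 3.8 corpus), and the hyperparameters are assumptions, not the recipe actually used.

```python
# Minimal MLM fine-tuning sketch; sequences and hyperparameters are
# illustrative assumptions, not the exact recipe used for grammarBERT.
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")
model = RobertaForMaskedLM.from_pretrained("microsoft/codebert-base")

# Hypothetical serialized derivation sequences; real data would come from
# running ast2seq over a Python 3.8 corpus.
train = Dataset.from_dict({
    "text": [
        "Module FunctionDef arguments Return",
        "Module Assign Name Constant",
    ]
})
train = train.map(
    lambda ex: tokenizer(ex["text"], truncation=True),
    remove_columns=["text"],
)

# Randomly masks 15% of input tokens, the standard MLM setting.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="grammarbert-mlm"),
    data_collator=collator,
    train_dataset=train,
)
trainer.train()
```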

## Usage

```python
from transformers import RobertaForMaskedLM, RobertaTokenizer

# Load the pre-trained codeBERT model and tokenizer
model = RobertaForMaskedLM.from_pretrained("microsoft/codebert-base")
tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")

# Convert a code snippet into its derivation sequence.
# The ast2seq implementation is available at https://github.com/NathanaelBeau/grammarBERT/
code_snippet = "def enumerate_items(items): return list(enumerate(items))"
derivation_sequence = ast2seq(code_snippet)

# Tokenize the derivation sequence (assuming ast2seq returns a serialized string)
input_ids = tokenizer.encode(derivation_sequence, return_tensors='pt')

# Predict masked tokens or fine-tune the model as needed
outputs = model(input_ids)
```
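
Continuing from the snippet above, here is a minimal sketch of the masked-token prediction it mentions; the masked sequence is a hypothetical placeholder rather than a real `ast2seq` output.

```python
import torch

# Mask one position in a (made-up) derivation sequence and reconstruct it.
masked_sequence = f"Module FunctionDef arguments {tokenizer.mask_token}"
inputs = tokenizer(masked_sequence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position and list the model's top-5 candidates for it.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```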