# grammarBERT
This repository contains the implementation and fine-tuning of the `codeBERT` model on derivation sequences, producing the `grammarBERT` model. The work leverages `codeBERT`'s strengths in both natural language processing (NLP) and code token prediction for a specialized application: representing and retrieving derivation sequences, a key component in constructing a hybrid code and grammar database.
## Overview
`grammarBERT` fine-tunes the `codeBERT` model with a Masked Language Modeling (MLM) objective on derivation sequences. This specializes the model for representing and retrieving derivation sequences while retaining `codeBERT`'s pretrained knowledge of natural language and code tokens. The resulting model supports grammar-based programming tasks, improving parsing accuracy and serving as a backbone for downstream applications.
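The sketch below illustrates what this MLM fine-tuning loop might look like with the HuggingFace `transformers` library, starting from the public `microsoft/codebert-base` checkpoint (codeBERT is RoBERTa-based). The corpus file `derivation_sequences.txt`, the one-sequence-per-line format, and all hyperparameters are illustrative assumptions, not the repository's actual configuration.

```python
# Minimal sketch of MLM fine-tuning on derivation sequences.
# Assumes a hypothetical corpus file with one whitespace-separated
# derivation sequence per line; paths and hyperparameters are
# placeholders, not this repo's actual settings.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# codeBERT is RoBERTa-based, so the RoBERTa classes apply.
tokenizer = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")
model = RobertaForMaskedLM.from_pretrained("microsoft/codebert-base")

# Hypothetical corpus: one derivation sequence per line.
dataset = load_dataset("text", data_files={"train": "derivation_sequences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of input tokens.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="grammarBERT",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

After training, the fine-tuned encoder can be reloaded with `RobertaModel.from_pretrained("grammarBERT")` to produce embeddings of derivation sequences for the retrieval use case described above.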