# grammarBERT
This repository contains the implementation and fine-tuning of the `codeBERT` model on derivation sequences, producing the `grammarBERT` model. The work leverages `codeBERT`'s strengths in both natural language processing (NLP) and code token prediction for a specialized application: representing and retrieving derivation sequences, a key component in constructing a hybrid code and grammar database.
## Overview
`grammarBERT` fine-tunes the `codeBERT` model with a Masked Language Modeling (MLM) objective on derivation sequences. This specializes the model for representing and retrieving derivation sequences while retaining `codeBERT`'s pretrained knowledge of natural language and code tokens. The resulting model supports grammar-based programming tasks, improving parsing accuracy and serving as a backbone for downstream applications.
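The sketch below illustrates what this MLM fine-tuning loop might look like with the HuggingFace `transformers` library, starting from the public `microsoft/codebert-base` checkpoint (codeBERT is RoBERTa-based). The corpus file `derivation_sequences.txt`, the one-sequence-per-line format, and all hyperparameters are illustrative assumptions, not the repository's actual configuration.

```python
# Minimal sketch of MLM fine-tuning on derivation sequences.
# Assumes a hypothetical corpus file with one whitespace-separated
# derivation sequence per line; paths and hyperparameters are
# placeholders, not this repo's actual settings.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# codeBERT is RoBERTa-based, so the RoBERTa classes apply.
tokenizer = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")
model = RobertaForMaskedLM.from_pretrained("microsoft/codebert-base")

# Hypothetical corpus: one derivation sequence per line.
dataset = load_dataset("text", data_files={"train": "derivation_sequences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of input tokens.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="grammarBERT",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

After training, the fine-tuned encoder can be reloaded with `RobertaModel.from_pretrained("grammarBERT")` to produce embeddings of derivation sequences for the retrieval use case described above.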