---
license: bigcode-openrail-m
pipeline_tag: text-generation
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
widget:
  - example_title: "Proof: n = n + 0"
    text: "Lemma plus_n_O : forall n:nat, n = n + 0."
  - example_title: "Proof: 7 + 3 = 10"
    text: "Lemma ex1: 7 + 3 = 10."
  - example_title: "Wrong proposition"
    text: "Lemma mult_S : forall m n : nat, S (m * n) = m * n + n."
---

# Model Card for CoqLLM-FineTuned-Experiment-Gen0

This model is an experiment in formal theorem proving, specifically tailored to generating and interpreting Coq code. By leveraging a dataset derived from over 10,000 Coq source files, *CoqLLM-FineTuned-Experiment-Gen0* exhibits an enhanced proficiency in the syntax and semantics unique to Coq, supporting progress in automated theorem proving.

## Model Details

### Model Description

- **Developed by:** Andreas Florath
- **Model type:** Fine-tuned Large Language Model
- **Finetuned from model:** Mistral-7b (`mistralai/Mistral-7B-v0.1`)
- **Language(s) (NLP):** Coq (only)
- **License:** bigcode-openrail-m

## Model Sources

- **Paper:** [Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code](https://arxiv.org/abs/2403.12627)
- **Dataset:** [florath/coq-facts-props-proofs-gen0-v1](https://huggingface.co/datasets/florath/coq-facts-props-proofs-gen0-v1)

## Uses

### Prompt Format

No special prompt format is needed. The model was fine-tuned on raw Coq source code, so providing just the proposition lets the model generate a proof:

```coq
Lemma plus_n_O : forall n:nat, n = n + 0.
```

No special characters or delimiters are needed.

### Direct Use

*CoqLLM-FineTuned-Experiment-Gen0* is an experiment to demonstrate the usefulness of the dataset used for fine-tuning. The model can be used to check whether short proofs can be generated automatically. Another possible use case is curating existing Coq source code and generating new Coq source code.
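Because the model continues raw Coq source, completions often run past the end of the proof. A typical post-processing step is to truncate the output at the first proof terminator and, if Coq is installed locally, check the candidate with `coqc`. The sketch below is illustrative only: the helper names (`extract_proof`, `check_with_coqc`) and the use of `coqc` as a batch checker are assumptions, not part of this model card.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def extract_proof(generated: str) -> Optional[str]:
    """Truncate a raw model completion at the first proof terminator.

    Coq proofs end with 'Qed.' (or 'Defined.' for transparent proofs);
    anything the model emits after that point is discarded.
    """
    match = re.search(r"\b(Qed|Defined)\s*\.", generated)
    if match is None:
        return None  # no complete proof in this sample
    return generated[:match.end()]


def check_with_coqc(coq_source: str) -> bool:
    """Compile a candidate proof with coqc; True iff it type-checks.

    Requires a local Coq installation. This is one way to realize the
    automatic verification this card recommends.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.v"
        path.write_text(coq_source)
        result = subprocess.run(["coqc", str(path)],
                                capture_output=True, cwd=tmp)
        return result.returncode == 0


sample = ("Lemma plus_n_O : forall n:nat, n = n + 0.\n"
          "Proof. induction n; simpl; congruence. Qed.\n"
          "Lemma junk : True.")
proof = extract_proof(sample)
print(proof)
```

Samples without a terminator (e.g. when `max_new_tokens` cuts the proof off) are simply dropped, so several generations per proposition, as in the snippet below, improve the chance of obtaining a checkable proof.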
### Out-of-Scope Use

The model is not intended for general-purpose language tasks outside the domain of theorem proving and formal verification. Misuse includes, but is not limited to, non-Coq programming tasks, natural-language processing outside technical documentation, and any deployment in critical systems without adequate supervision and validation.

### Bias, Risks, and Limitations

The model inherits biases from its base model and training data, potentially reflecting the diversity, or lack thereof, of the collected Coq files. Users should be aware of these limitations, particularly when applying the model in new or underserved areas of theorem proving.

### Recommendations

To mitigate risks and biases, supplement model use with human oversight or an environment in which the generated Coq source code can be verified automatically. Continuous monitoring for unexpected behavior or outputs is advised, alongside efforts to diversify and expand the training dataset to cover a broader spectrum of Coq use cases.

## How to Get Started with the Model

Here is a code snippet using the fine-tuned model. The shown setup should work on GPUs with 24 GB of memory or less. You might want to adapt and experiment with different settings, such as different temperatures.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"

# Load the base model with 4-bit quantization to fit on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id, add_bos_token=True, trust_remote_code=True)

# Apply the fine-tuned LoRA adapter on top of the base model.
ft_model = PeftModel.from_pretrained(
    base_model, "florath/CoqLLM-FineTuned-Experiment-Gen0")

eval_prompt = "Lemma plus_n_O : forall n:nat, n = n + 0."
model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    # Sample several candidate proofs for the same proposition.
    for idx in range(10):
        res = eval_tokenizer.decode(
            ft_model.generate(
                **model_input,
                max_new_tokens=50,
                do_sample=True,
                temperature=0.4,
                pad_token_id=eval_tokenizer.eos_token_id,
                repetition_penalty=1.15)[0],
            skip_special_tokens=False)
        print("Result [%2d] [%s]" % (idx, res))
```

## Training Details

The model was fine-tuned with the [florath/coq-facts-props-proofs-gen0-v1](https://huggingface.co/datasets/florath/coq-facts-props-proofs-gen0-v1) dataset. Only entries with permissive licenses were used during the fine-tuning process.

## Cite

```
@misc{florath2024enhancing,
      title={Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code},
      author={Andreas Florath},
      year={2024},
      eprint={2403.12627},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```