Create README.md

9cb19fb verified 3 months ago

3.82 kB

	---
	pipeline_tag: text-generation
	license: other
	language:
	- en
	tags:
	- math
	datasets:
	- internlm/Lean-Workbook
	- internlm/Lean-Github
	---

	# InternLM2.5-Step-Prover

	<div align="center">

	<img src="https://raw.githubusercontent.com/InternLM/InternLM/main/assets/logo.svg" width="200"/>
	<div> </div>
	<div align="center">
	<b><font size="5">InternLM-Math</font></b>
	<sup>
	<a href="https://internlm.intern-ai.org.cn/">
	<i><font size="4">HOT</font></i>
	</a>
	</sup>
	<div> </div>
	</div>

	A state-of-the-art LEAN4 step prover.

	[💻 Github](https://github.com/InternLM/InternLM-Math) [📊Dataset](https://huggingface.co/datasets/internlm/Lean-Github) [📖 Paper](https://arxiv.org/abs/2410.15700)
	</div>

	InternLM2.5-Step-Prover is a 7B language model which achieves state-of-the-art performances on MiniF2F, ProofNet, and Putnam math benchmarks, showing its formal math proving ability in multiple domains.

	# Dialogue Example
	```
	### Input template
	f"---\nNAME: {theorem.full_name}\n\n"
	f"---\nPROOF_BEFORE: {proof_before}\n\n"
	f"---\nSTATE_BEFORE: {state}\n\n"
	f"---\nTACTIC: "
	### Input example
	---
	NAME: square_sub_one_divisible_eight
	---
	PROOF_BEFORE: rw [h, pow_two]
	---
	STATE_BEFORE: m n : N
	h : n = 2 * m + 1
	⊢ 8 \| (2 * m + 1) * (2 * m + 1) - 1
	---
	TACTIC:
	### Output example
	rw [← Nat.mod_add_div (2 * m + 1) 8]
	```
	If you want to use critic model, please refer critic's model page.

	# Performance

	## MiniF2F
	\| Method \| Model size \| Pass \| miniF2F-valid \| miniF2F-test \|
	\|--------\|------------\|------\|---------------\|--------------\|
	\| Whole-Proof Generation Methods \|
	\| GPT-4-turbo 0409 \| - \| 64 \| 25.4% \| 23.0% \|
	\| DeepSeekMath-Base \| 7B \| 128 \| 25.4% \| 27.5% \|
	\| DeepSeek-Prover \| 7B \| 1 \| - \| 30.0% \|
	\| \| \| 64 \| - \| 46.3% \|
	\| \| \| 128 \| - \| 46.3% \|
	\| \| \| 8192 \| - \| 48.8% \|
	\| \| \| 65536 \| - \| 50.0% \|
	\| \| \| cumulative \| 60.2% \| 52.0% \|
	\| DeepSeek-Prover-1.5 \| 7B \| 32 \| - \| 63.5% \|
	\| TheoremLlama \| - \| cumulative \| 36.5% \| 33.6% \|
	\| Tree Search Methods \|
	\| COPRA (GPT-3.5) \| - \| 1 \| - \| 9.0% \|
	\| COPRA (GPT-4) \| - \| 1 \| - \| 26.6% \|
	\| DSP(Isabelle) \| 540B \| 100 \| 42.6% \| 38.9% \|
	\| Proof Artifact Co-Training \| 837M \| 1 \| 23.9% \| 24.6% \|
	\| \| \| 8 \| 29.3% \| 29.2% \|
	\| ReProver \| 229M \| 1 \| - \| 25.0% \|
	\| Llemma \| 7B \| 1 \| 26.2% \| 26.2% \|
	\| Llemma \| 34B \| 1 \| 27.9% \| 25.8% \|
	\| Curriculum Learning \| 837M \| 1 \| 33.6% \| 29.6% \|
	\| \| \| 8 \| 41.2% \| 34.5% \|
	\| \| \| 64 \| 47.3% \| 36.6% \|
	\| Hypertree Proof Search \| 600M \| cumulative \| 58.6% \| - \|
	\| \| \| 64 \| - \| 41.0% \|
	\| Lean-STaR \| 7B \| 64 \| - \| 46.3% \|
	\| InternLM2-Math \| 7B \| 1 \| 29.9% \| 30.3% \|
	\| InternLM2-Math-Plus \| 7B \| 1 \| - \| 43.4% \|
	\| InternLM2-Step-Prover \| 7B \| 1 \| 59.8% \| 48.8% \|
	\| InternLM2.5-Step-Prover \| 7B \| 1 \| 55.4% \| 47.3% \|
	\| InternLM2.5-Step-Prover+Critic \| 7B \| 256 \| 69.6% \| 65.9% \|


	## Proofnet & Putnam
	\| Method \| Model size \| Pass \| result \|
	\|--------\|------------\|------\|--------\|
	\| ProofNet benchmark \|
	\| ReProver \| 229M \| 1 \| 13.8% \|
	\| InternLM2-Step-Prover \| 7B \| 1 \| 18.1% \|
	\| InternLM2.5-Step-Prover \| 7B \| 256 \| 27.0% \|
	\| Putnam benchmark \|
	\| GPT-4 \| - \| 10 \| 1/640 \|
	\| COPRA (GPT-4) \| - \| 10 \| 1/640 \|
	\| DSP(Isabelle) \| 540B \| 10 \| 4/640 \|
	\| ReProver \| 229M \| 1 \| 0/640 \|
	\| InternLM2-Step-Prover \| 7B \| 1 \| 5/640 \|
	\| InternLM2.5-Step-Prover \| 7B \| 1 \| 6/640 \|


	# Citation and Tech Report
	```
	@misc{wu2024internlm25stepproveradvancingautomatedtheorem,
	title={InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems},
	author={Zijian Wu and Suozhi Huang and Zhejian Zhou and Huaiyuan Ying and Jiayu Wang and Dahua Lin and Kai Chen},
	year={2024},
	eprint={2410.15700},
	archivePrefix={arXiv},
	primaryClass={cs.AI},
	url={https://arxiv.org/abs/2410.15700},
	}
	```