IMProofBench
Informal Mathematical Proof Benchmark
IMProofBench evaluates the ability of AI systems to create research-level mathematical proofs. We maintain a curated, private repository of PhD-level problems across pure mathematics to measure genuine mathematical reasoning capabilities while preventing data contamination and benchmark overfitting.
Model Performance
Top models by percentage of benchmark questions with complete and correct solutions.
Become a Contributor!
Create a question and see what state-of-the-art models can do in your field of mathematics. If your question is included in the benchmark, you will receive co-authorship on future papers. You retain full rights to your question and can retract it at any time.