IMProofBench
Informal Mathematical Proof Benchmark
IMProofBench evaluates the ability of AI systems to create research-level mathematical proofs. We maintain a curated, private repository of PhD-level problems across pure mathematics to measure genuine mathematical reasoning capabilities while preventing data contamination and benchmark overfitting.
Benchmark Status

Questions: 147
Participants: 156
Models: 10

Question Submission Pipeline

Draft: 30
Under Review: 42
Accepted: 39
Graded
Questions: Create and review mathematical proof problems to test frontier AI models.
Community: Connect with mathematical researchers and track contributions.
Dashboard: Real-time statistics and benchmark performance metrics.
Model performance is reported as the percentage of questions with a complete and correct solution.
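As a minimal sketch of how that headline metric could be computed, the following assumes each graded question yields a single verdict string such as "correct", "partial", or "incorrect"; the verdict labels and function name are illustrative assumptions, not the benchmark's actual schema.

```python
def solve_rate(verdicts):
    """Percentage of questions with a complete and correct solution.

    verdicts: list of per-question grade labels, e.g.
    "correct", "partial", or "incorrect" (assumed labels).
    """
    if not verdicts:
        return 0.0
    # Only fully correct solutions count toward the headline metric;
    # partial credit is excluded under this assumption.
    solved = sum(1 for v in verdicts if v == "correct")
    return 100.0 * solved / len(verdicts)


if __name__ == "__main__":
    grades = ["correct", "partial", "correct", "incorrect"]
    print(f"solve rate: {solve_rate(grades):.1f}%")  # 2 of 4 -> 50.0%
```

An alternative design would award fractional credit for partial solutions, but the metric as stated counts only complete and correct proofs.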