Problem Guidelines

Creating high-quality benchmark problems for mathematical AI evaluation

Quick Start

Effective benchmark problems combine PhD-level difficulty, genuine mathematical insight, and 2-3 auto-gradable subquestions. For inspiration, think about a recent calculation from your research that required a clever insight or a non-obvious proof technique.

Required Characteristics

  • PhD-level difficulty: Suitable for qualifying exams, research papers, or advanced seminars
  • Requires genuine insight: Not solvable by routine application of known algorithms
  • Clear proof-based main question: Answer should be a complete mathematical argument, not just a number
  • 2-3 unique-answer subquestions: Enable automated evaluation (e.g., "Is the statement true for n=5?", "What is the rank of this group?")
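To illustrate why each subquestion needs a single unambiguous answer, here is a minimal sketch of how such answers might be compared automatically. The function names `normalize` and `grade` and the normalization rules are assumptions made for illustration; they do not describe the grader this platform actually uses.

```python
from fractions import Fraction

def normalize(answer: str):
    """Map a raw answer string to a tagged canonical value (hypothetical rules)."""
    text = answer.strip().lower().rstrip(".")
    if text in {"yes", "true"}:
        return ("bool", True)
    if text in {"no", "false"}:
        return ("bool", False)
    try:
        # Exact rational arithmetic, so "6/4" and "1.5" compare equal.
        return ("num", Fraction(text))
    except (ValueError, ZeroDivisionError):
        # Anything else is compared as a case-insensitive string.
        return ("text", text)

def grade(submitted: str, reference: str) -> bool:
    """Return True when the submitted answer matches the reference exactly."""
    return normalize(submitted) == normalize(reference)
```

Under these assumed rules, a yes/no answer or an exact integer or rational can be checked mechanically, whereas a free-form answer such as "it depends on $n$" falls through to plain string comparison. This is why subquestions with a single numeric or yes/no answer are preferred.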

What to Avoid

  • Problems solvable by pattern matching or lucky guessing
  • Standard textbook exercises (even from graduate texts)
  • Purely computational problems that Mathematica/SageMath can solve directly
  • Problems without clear subquestions for automated evaluation

Problem Templates

Intersection Theory

Main: Let $X$ be [variety]. Compute the class of [specific cycle] in the Chow ring $A^*(X)$.
Subquestions: What is the degree of this class? Does it vanish in $A^2(X)$?

Main: For the moduli space $M$ of [objects], compute a closed formula for the intersection number $\int_M \alpha_1 \cup \alpha_2 \cup \ldots \cup \alpha_n$.
Subquestions: What is this number for specific parameter values?

Classification Problems

Main: Classify all [objects] with [property]. Give explicit representatives for each isomorphism class.
Subquestions: How many classes are there? Which have additional property $P$?

Main: What is the rank of the cohomology group $H^k(M)$ for [variety/moduli space $M$]?
Subquestions: What is $\dim H^0(M)$? Is $H^n(M) = 0$ for $n > d$?

Example Problems

Example 1: Stable Graphs

Main question: Find a closed formula for the number $N(g)$ of stable graphs of genus $g$ with no legs and precisely $3$ edges, for all $g \geq 2$.

Subquestions:

  • What is $N(3)$?
  • What is $N(8)$?
  • What is $N(1000)$?

Example 2: Permutation Representations

Main question: Let $G$ be a finite group. Is the functor $\mathrm{Perm}: G\text{-sets} \to \mathrm{Rep}_{\mathbb{C}}(G)$ sending $X$ to its permutation representation fully faithful? Prove or provide a counterexample.

Subquestions:

  • Is the statement true for all finite groups?
  • Is the statement true for all finite cyclic groups?
  • Is the statement true for all finite abelian groups?
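As a sanity check on Example 2, the following sketch compares the two Hom sets for a cyclic group by brute force. It is an illustration under stated assumptions, not part of the intended proof: a $C_n$-set is encoded as a list `s` where `s[x]` is the image of point `x` under a fixed generator, the function names are hypothetical, and the dimension count uses the standard fact that $\dim \mathrm{Hom}_G(\mathbb{C}[X], \mathbb{C}[Y])$ equals the number of $G$-orbits on $X \times Y$.

```python
from itertools import product

def count_equivariant_maps(sx, sy):
    """Count maps f: X -> Y that commute with the generator's action."""
    nx, ny = len(sx), len(sy)
    return sum(
        all(f[sx[x]] == sy[f[x]] for x in range(nx))
        for f in product(range(ny), repeat=nx)
    )

def hom_dimension(sx, sy):
    """Number of orbits of the generator on X x Y, i.e. dim Hom_G(C[X], C[Y])."""
    seen, orbits = set(), 0
    for start in product(range(len(sx)), range(len(sy))):
        if start in seen:
            continue
        orbits += 1
        x, y = start
        # Walk the orbit of (x, y) under the diagonal action of the generator.
        while (x, y) not in seen:
            seen.add((x, y))
            x, y = sx[x], sy[y]
    return orbits

# C_3 acting on itself by rotation (assumed example input).
rot = [1, 2, 0]
maps = count_equivariant_maps(rot, rot)   # finitely many G-set morphisms
dim = hom_dimension(rot, rot)             # dimension of the space of intertwiners
print(maps, dim)
```

Whenever `hom_dimension` is positive, the space of intertwiners is an infinite set, while `count_equivariant_maps` is always finite. The map on Hom sets therefore cannot be surjective in such a case, which is the kind of observation the subquestions are meant to test.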

Brainstorming Tips

  • A tricky calculation from your recent work that required a clever insight
  • An "obvious" statement that actually needs a non-trivial proof
  • A self-contained lemma that came up in a research project
  • An oral exam question for an advanced course

Ready to Contribute?

Start creating your problem using our editor with LaTeX support and AI testing.
