FormalMATH: A Large-Scale Benchmark for Formal Mathematical Reasoning in Lean4

Introduction

Dataset	FormalMATH
Modalities	Text
Formats	json
Languages	English, Chinese
Size	5.26MB
Release Date	2025-05-05
Domain	Mathematics
License	mit

FormalMATH is a large-scale benchmark designed to evaluate and advance the capabilities of Large Language Models in the challenging domain of formal mathematical reasoning. It contains 5,560 formally verified problems, all expressed as theorem statements in the Lean4 proof assistant.

The dataset is comprehensive in scope, featuring problems that range from high-school Olympiad challenges to undergraduate-level theorems. It covers diverse mathematical fields, including algebra, applied mathematics, calculus, number theory, and discrete mathematics. Each entry consists of a natural language problem paired with its corresponding formal Lean4 statement, along with solutions and other metadata.
Created using a novel human-in-the-loop autoformalization pipeline, FormalMATH serves as a robust and difficult testbed for current and future LLM-based theorem provers. Evaluations show that even state-of-the-art models struggle with this benchmark, highlighting significant room for improvement in automated reasoning.

About

Mission

Events

News

Opportunities

Partnerships

Research

Datasets

Projects

EVA

Campus Program

Challenges

Ventures

FormalMATH

A Large-Scale Benchmark for Formal Mathematical Reasoning in Lean4

FormalMATH: A Large-Scale Benchmark for Formal Mathematical Reasoning in Lean4

Introduction

Data Sample