FormalMATH: A Large-Scale Benchmark for Formal Mathematical Reasoning in Lean4

Introduction

Dataset

FormalMATH

Modalities

Text

Formats

json

Languages

English, Chinese

Size

5.26MB

Release Date

2025-05-05

Domain

Mathematics

License

mit

FormalMATH is a large-scale benchmark designed to evaluate and advance the capabilities of Large Language Models in the challenging domain of formal mathematical reasoning. It contains 5,560 formally verified problems, all expressed as theorem statements in the Lean4 proof assistant.

  • The dataset is comprehensive in scope, featuring problems that range from high-school Olympiad challenges to undergraduate-level theorems. It covers diverse mathematical fields, including algebra, applied mathematics, calculus, number theory, and discrete mathematics. Each entry consists of a natural language problem paired with its corresponding formal Lean4 statement, along with solutions and other metadata.

  • Created using a novel human-in-the-loop autoformalization pipeline, FormalMATH serves as a robust and difficult testbed for current and future LLM-based theorem provers. Evaluations show that even state-of-the-art models struggle with this benchmark, highlighting significant room for improvement in automated reasoning.

Data Sample

Designed by 2077AI Team