Introduction

OmniDocBench is a comprehensive benchmark for evaluating AI models on document parsing and content extraction. It addresses the limitations of existing benchmarks, namely their narrow coverage and simplistic evaluation, by providing high-quality annotations across 9 diverse sources, including academic papers, handwritten notes, and densely typeset newspapers. OmniDocBench reveals weaknesses in top-performing models when they process complex layouts and content structures, which shows how challenging the benchmark is and its potential to drive future progress in Document AI.

Dataset Overview

OmniDocBench contains 1355 pages across 9 distinct document types, with over 100,000 fine-grained annotations.

  • Academic Papers
  • Slides
  • Books
  • Textbooks
  • Exam Papers
  • Notes
  • Magazines
  • Financial Reports
  • Newspapers

Data Samples

Leaderboard

Models are ranked by Overall score. Text Edit and Read Order Edit are normalized edit distances (lower is better); Formula CDM, Table TEDS, and Table TEDS-S are scores out of 100 (higher is better). A dash in the Size column means no parameter count is given.

Rank  Overall  Model                        Model Type        Size  Text Edit  Formula CDM  Table TEDS  Table TEDS-S  Read Order Edit
1     91.93    PaddleOCR-VL                 Specialized VLMs  0.9B  0.039      88.67        91.01       94.85         0.048
2     90.67    MinerU2.5                    Specialized VLMs  1.2B  0.047      88.46        88.22       92.38         0.044
3     89.15    Qwen3-VL-235B-A22B-Instruct  General VLMs      235B  0.069      88.14        86.21       90.55         0.068
4     88.85    MonkeyOCR-pro-3B             Specialized VLMs  3B    0.075      87.25        86.78       90.63         0.128
5     88.41    dots.ocr                     Specialized VLMs  3B    0.048      83.22        86.78       90.62         0.053
6     88.03    Gemini-2.5 Pro               General VLMs      -     0.075      85.82        85.71       90.29         0.097
7     87.13    MonkeyOCR-3B                 Specialized VLMs  3B    0.075      87.45        81.39       85.92         0.129
8     87.02    Qwen2.5-VL                   General VLMs      72B   0.094      88.27        82.15       86.22         0.102
9     87.01    Deepseek-OCR                 Specialized VLMs  3B    0.073      83.37        84.97       88.8          0.086
10    86.96    MonkeyOCR-pro-1.2B           Specialized VLMs  1.2B  0.084      85.02        84.24       89.02         0.13
11    86.73    PP-StructureV3               Pipeline Tools    -     0.073      85.79        81.68       89.48         0.073
12    85.59    Nanonets-OCR-s               Specialized VLMs  3B    0.093      85.9         80.14       85.57         0.108
13    85.56    MinerU2-VLM                  Specialized VLMs  0.9B  0.078      80.95        83.54       87.66         0.086
14    83.21    Dolphin-1.5                  Specialized VLMs  0.3B  0.092      80.78        78.06       84.1          0.08
15    82.67    InternVL3.5                  General VLMs      241B  0.142      87.23        75          81.28         0.125
16    81.79    olmOCR                       Specialized VLMs  7B    0.096      86.04        68.92       74.77         0.121
17    80.98    POINTS-Reader                Specialized VLMs  3B    0.134      79.2         77.13       81.66         0.145
18    80.33    InternVL3                    General VLMs      78B   0.131      83.42        70.64       77.74         0.113
19    78.83    Mistral OCR                  Specialized VLMs  -     0.164      82.84        70.03       78.04         0.144
20    75.51    Mineru2-pipeline             Pipeline Tools    -     0.209      76.55        70.9        79.11         0.225
21    75.02    GPT-4o                       General VLMs      -     0.217      79.7         67.07       76.09         0.148
22    74.82    OCRFlux                      Specialized VLMs  3B    0.193      68.03        75.75       80.23         0.202
23    74.67    Dolphin                      Specialized VLMs  0.3B  0.125      67.85        68.7        77.77         0.124
24    71.3     Marker-1.8.2                 Pipeline Tools    -     0.206      76.66        57.88       71.17         0.25
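
The Overall column is consistent with a simple average of three components, after converting the text edit distance into a 0-100 similarity: Overall = ((1 - Text Edit) × 100 + Formula CDM + Table TEDS) / 3. This page does not state the formula, so treat it as an assumption; it reproduces the published rows to within about 0.02 points, which rounding of the component scores would explain. The Python sketch below recomputes a few rows. The names `LeaderboardRow` and `overall_score` are hypothetical and not part of any OmniDocBench tooling.

```python
# Sketch: recompute the leaderboard's Overall score from its component metrics.
# ASSUMPTION: Overall = ((1 - text_edit) * 100 + formula_cdm + table_teds) / 3.
# The formula is inferred from the published numbers, not stated on this page.
# Component values are rounded, so recomputed scores can differ by ~0.01-0.02.

from dataclasses import dataclass


@dataclass
class LeaderboardRow:
    model: str
    overall: float       # published Overall score
    text_edit: float     # normalized edit distance, lower is better
    formula_cdm: float   # 0-100, higher is better
    table_teds: float    # 0-100, higher is better


def overall_score(text_edit: float, formula_cdm: float, table_teds: float) -> float:
    """Average three components after flipping text edit distance into a 0-100 similarity."""
    text_score = (1.0 - text_edit) * 100.0
    return (text_score + formula_cdm + table_teds) / 3.0


# A few rows copied from the leaderboard.
ROWS = [
    LeaderboardRow("PaddleOCR-VL", 91.93, 0.039, 88.67, 91.01),
    LeaderboardRow("MinerU2.5", 90.67, 0.047, 88.46, 88.22),
    LeaderboardRow("Deepseek-OCR", 87.01, 0.073, 83.37, 84.97),
    LeaderboardRow("Marker-1.8.2", 71.3, 0.206, 76.66, 57.88),
]

if __name__ == "__main__":
    for row in ROWS:
        recomputed = overall_score(row.text_edit, row.formula_cdm, row.table_teds)
        print(f"{row.model:<14} published={row.overall:6.2f} recomputed={recomputed:6.2f}")
```

Running it prints the published and recomputed scores side by side. Under this assumption, Table TEDS-S and Read Order Edit are reported on the leaderboard but do not enter the Overall score.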

Further Analysis

To better understand how models perform on OmniDocBench, this section presents the results of a series of detailed analysis experiments.

Varying Standards

Varying standards in parsing headers, footers, and similar elements
Varying standards in parsing captions

Data Display

  • Academic Paper
  • Books
  • Colorful Textbook
  • Notes
  • Magazines
  • Financial Report
  • Newspaper
  • Exam Paper
  • Slides

BibTeX

@misc{ouyang2024omnidocbench,
  title         =     "OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations",
  author        =     "Linke Ouyang and Yuan Qu and Hongbin Zhou and Jiawei Zhu and Rui Zhang and Qunshu Lin and Bin Wang and Zhiyuan Zhao and Man Jiang and Xiaomeng Zhao and Jin Shi and Fan Wu and Pei Chu and Minghao Liu and Zhenxiang Li and Chao Xu and Bo Zhang and Botian Shi and Zhongying Tu and Conghui He",
  eprint        =     "2412.07626",
  archivePrefix =     "arXiv",
  year          =     "2024",
  primaryClass  =     "cs.CV",
  url           =     "https://arxiv.org/abs/2412.07626"
}

Designed by 2077AI Team