MMAR: A Benchmark for Deep Audio Reasoning

Introduction

Dataset: MMAR
Modalities: Text, Audio
Formats: json
Languages: English, Chinese, etc. (16 total)
Size: 168 kB
Release Date: 2025-05-19
Domain: Audio Processing, Speech Recognition
License: cc-by-nc-4.0

MMAR (Massive Multi-disciplinary Audio Reasoning) is a new and challenging benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs).

  • It consists of 1,000 meticulously curated audio-question-answer triplets sourced from real-world internet videos. Each task requires multi-step deep reasoning that goes far beyond surface-level perception.
  • A key feature of MMAR is its diverse coverage of modalities, including not only traditional speech, audio, and music, but also complex mixtures of them. Furthermore, the benchmark is designed to be difficult, with a portion of questions requiring graduate-level perceptual and domain-specific knowledge to answer correctly.
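
As a rough illustration of how audio-question-answer triplets stored as JSON might be loaded and scored, here is a minimal Python sketch. The field names (`audio_path`, `question`, `answer`), the exact-match scoring rule, and the two placeholder records are all assumptions made for this example; they are not taken from the MMAR release, whose actual schema and evaluation protocol may differ.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One audio-question-answer item (field names are hypothetical)."""
    audio_path: str
    question: str
    answer: str


def load_triplets(json_text: str) -> list[Triplet]:
    """Parse a JSON array of records into Triplet objects."""
    records = json.loads(json_text)
    return [Triplet(r["audio_path"], r["question"], r["answer"]) for r in records]


def accuracy(triplets: list[Triplet], predictions: list[str]) -> float:
    """Fraction of predictions that match the reference answer.

    Matching is case-insensitive and ignores surrounding whitespace.
    This is an assumed scoring rule, not the benchmark's official one.
    """
    if len(triplets) != len(predictions):
        raise ValueError("need exactly one prediction per triplet")
    if not triplets:
        return 0.0
    correct = sum(
        t.answer.strip().lower() == p.strip().lower()
        for t, p in zip(triplets, predictions)
    )
    return correct / len(triplets)


# Invented placeholder records, NOT real MMAR data.
EXAMPLE_JSON = """
[
  {"audio_path": "clips/0001.wav", "question": "What is making the sound?", "answer": "rain"},
  {"audio_path": "clips/0002.wav", "question": "How many speakers are heard?", "answer": "two"}
]
"""

if __name__ == "__main__":
    items = load_triplets(EXAMPLE_JSON)
    model_outputs = ["Rain", "three"]  # stand-in for an ALM's answers
    print(f"accuracy = {accuracy(items, model_outputs):.2f}")
```

In practice the `model_outputs` list would come from running an Audio-Language Model on each audio clip and question; the sketch only covers the bookkeeping around that step.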

Data Sample
