MMAR: A Benchmark for Deep Audio Reasoning

Introduction

Dataset: MMAR

Modalities: Text, Audio

Formats: json

Languages: English, Chinese, etc. (16 total)

Size: 168kB

Release Date: 2025-05-19

Domain: Audio Processing, Speech Recognition

License: cc-by-nc-4.0

MMAR (Massive Multi-disciplinary Audio Reasoning) is a new and challenging benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs).

  • It consists of 1,000 meticulously curated audio-question-answer triplets sourced from real-world internet videos. Each task requires multi-step deep reasoning that goes far beyond surface-level perception.

  • A key feature of MMAR is its broad coverage of audio types: not only traditional speech, audio, and music on their own, but also complex mixtures of them. The benchmark is also designed to be difficult: a portion of the questions require graduate-level perceptual and domain-specific knowledge to answer correctly.
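To make the triplet structure concrete, the following is a minimal sketch of how audio-question-answer records stored as json might be loaded and scored per audio type. The field names (`id`, `audio_path`, `modality`, `question`, `answer`), the placeholder records, and the helper functions `load_triplets` and `accuracy_by_modality` are all assumptions made for illustration; they are not the actual MMAR schema or an official evaluation script.

```python
import json
from collections import defaultdict

# Hypothetical records: field names and values are placeholders for
# illustration only, NOT the real MMAR schema or real benchmark items.
SAMPLE_JSON = """
[
  {"id": "ex-1", "audio_path": "audio/ex-1.wav", "modality": "speech",
   "question": "Placeholder question 1?", "answer": "A"},
  {"id": "ex-2", "audio_path": "audio/ex-2.wav", "modality": "music",
   "question": "Placeholder question 2?", "answer": "B"},
  {"id": "ex-3", "audio_path": "audio/ex-3.wav", "modality": "speech+music",
   "question": "Placeholder question 3?", "answer": "C"}
]
"""

REQUIRED_FIELDS = {"id", "audio_path", "question", "answer"}


def load_triplets(text):
    """Parse a JSON array of triplets and check the assumed required fields."""
    records = json.loads(text)
    for record in records:
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
    return records


def accuracy_by_modality(records, predictions):
    """Return exact-match accuracy per modality.

    `predictions` maps a record id to the model's answer string.
    A missing prediction counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        modality = record.get("modality", "unknown")
        total[modality] += 1
        predicted = predictions.get(record["id"])
        if predicted is not None:
            if predicted.strip().lower() == record["answer"].strip().lower():
                correct[modality] += 1
    return {m: correct[m] / total[m] for m in total}


if __name__ == "__main__":
    triplets = load_triplets(SAMPLE_JSON)
    model_answers = {"ex-1": "a", "ex-2": "C", "ex-3": "C"}
    for modality, acc in accuracy_by_modality(triplets, model_answers).items():
        print(f"{modality}: {acc:.2f}")
```

Grouping scores by audio type is one plausible way to see whether a model handles mixed audio as well as it handles single-type audio; exact-match comparison is used here only because it is the simplest scoring rule to illustrate.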

Designed by 2077AI Team