MMAR: A Benchmark for Deep Audio Reasoning

Introduction

Dataset	MMAR
Modalities	Text, Audio
Formats	json
Languages	English, Chinese, etc. (16 total)
Size	168kB
Release Date	2025-05-19
Domain	Audio Processing, Speech Recognition
License	cc-by-nc-4.0

MMAR (Massive Multi-disciplinary Audio Reasoning) is a new and challenging benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs).

It consists of 1,000 meticulously curated audio-question-answer triplets sourced from real-world internet videos. Each task requires multi-step deep reasoning that goes far beyond surface-level perception.
A key feature of MMAR is its broad coverage of modalities: it includes not only traditional speech, audio, and music, but also complex mixtures of them. The benchmark is also designed to be difficult, and some questions require graduate-level perceptual and domain-specific knowledge to answer correctly.
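
To make the triplet structure concrete, here is a minimal sketch of how audio-question-answer records stored as JSON might be loaded and scored. The field names (`audio_path`, `question`, `answer`), the `Triplet` class, and the exact-match scoring rule are all assumptions made for illustration; they are not taken from the MMAR release, whose schema and official metric may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    # Hypothetical field names; the real MMAR JSON schema may differ.
    audio_path: str
    question: str
    answer: str


def load_triplets(json_text: str) -> list[Triplet]:
    """Parse a JSON array of objects into Triplet records."""
    return [
        Triplet(item["audio_path"], item["question"], item["answer"])
        for item in json.loads(json_text)
    ]


def accuracy(triplets: list[Triplet], predictions: list[str]) -> float:
    """Fraction of predictions equal to the reference answer.

    Comparison ignores case and surrounding whitespace. This is an
    assumed scoring rule, not the benchmark's official metric.
    """
    if len(triplets) != len(predictions):
        raise ValueError("need exactly one prediction per triplet")
    if not triplets:
        return 0.0
    correct = sum(
        t.answer.strip().lower() == p.strip().lower()
        for t, p in zip(triplets, predictions)
    )
    return correct / len(triplets)
```

A model under evaluation would produce one prediction string per triplet, and `accuracy(triplets, predictions)` would then report the share of questions answered correctly.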

Data Sample