TaskCraft: A Multi-Modal Benchmark for Agentic Task Processing

Introduction

Dataset: TaskCraft
Modalities: Text, Image
Formats: jsonl
Languages: English, Chinese
Size: 10-100k
Release Date: 2025-07-11
Domain: Synthetic
License: MIT

TaskCraft is a multi-modal benchmark dataset whose tasks range from simple (1 step) to expert-level (4 or more steps). It contains over 40,000 curated task instances designed to advance research in:

  • Agent-based task processing
  • Tool invocation systems
  • Multi-step reasoning

The dataset is large-scale and designed for both evaluating and training AI agents to understand complex instructions, decompose tasks, and accurately invoke external tools (such as PDF processors, HTML parsers, and image analyzers) to achieve their goals.
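Since the data is distributed as jsonl (one JSON object per line), a first look at it can be taken with the Python standard library alone. The following is a minimal sketch, not an official loader: the helper names `iter_jsonl` and `field_counts` are made up for illustration, and the field names in the inline sample are placeholders rather than the dataset's actual schema.

```python
import io
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of JSONL text."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


def field_counts(records: Iterable[dict]) -> Counter:
    """Count how often each top-level key appears across records."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


# Hypothetical two-record sample. These field names are placeholders,
# NOT the real TaskCraft schema.
sample = io.StringIO(
    '{"question": "example task", "steps": 1}\n'
    "\n"
    '{"question": "another task", "steps": 4, "tool": "pdf"}\n'
)
records = list(iter_jsonl(sample))
counts = field_counts(records)
```

To inspect a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, encoding="utf-8")` with `path` pointing at a downloaded jsonl file. Counting top-level keys before assuming any schema is a cheap way to see which fields every record carries and which are optional.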

Data Sample