TaskCraft: A Multi-Modal Benchmark for Agentic Task Processing

Introduction

Dataset: TaskCraft
Modalities: Text, Video
Formats: jsonl
Languages: English, Chinese
Size: 10k-100k
Release Date: 2025-07-11
Domain: Synthetic
License: MIT

TaskCraft is a multi-modal benchmark dataset whose tasks range from simple (1 step) to expert-level (4 or more steps). It contains over 40,000 curated task instances designed to advance research in:

  • Agent-based task processing

  • Tool invocation systems

  • Multi-step reasoning

The dataset is designed for training and evaluating AI agents on three skills: understanding complex instructions, decomposing tasks, and accurately invoking external tools (such as PDF processors, HTML parsers, and image analyzers) to reach their goals.
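Since the data ships as jsonl, a first pass over it usually means reading one JSON object per line and bucketing tasks by step count. The sketch below shows one way this might look. The field names (`question`, `tools`, `num_steps`), the sample records, and the "intermediate" label are assumptions made for illustration only; they are not taken from the TaskCraft schema, so check the released files for the real keys.

```python
import io
import json
from collections import Counter


def iter_jsonl(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def difficulty(num_steps):
    """Label a task by step count.

    1 step is "simple" and 4 or more is "expert", following the range
    described above; the "intermediate" label is an assumption.
    """
    if num_steps <= 1:
        return "simple"
    if num_steps >= 4:
        return "expert"
    return "intermediate"


# Hypothetical records standing in for a real file; the actual
# TaskCraft field names and values may differ.
SAMPLE = io.StringIO(
    '{"question": "q1", "tools": ["pdf"], "num_steps": 1}\n'
    '{"question": "q2", "tools": ["html", "image"], "num_steps": 4}\n'
    "\n"
    '{"question": "q3", "tools": ["html"], "num_steps": 2}\n'
)

records = list(iter_jsonl(SAMPLE))
counts = Counter(difficulty(r["num_steps"]) for r in records)
print(dict(counts))
```

To run the same pass over a downloaded file, replace `SAMPLE` with an open file handle, for example `open(path, encoding="utf-8")`, where `path` points at a local copy of the data.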

Data Sample

Designed by 2077AI Team