TaskCraft: A Multi-Modal Benchmark for Agentic Task Processing
Introduction
Dataset | TaskCraft |
---|---|
Modalities | Text, Image |
Formats | jsonl |
Languages | English, Chinese |
Size | 10K-100K |
Release Date | 2025-07-11 |
Domain | Synthetic |
License | MIT |
TaskCraft is a multi-modal benchmark dataset with tasks ranging from simple (single-step) to expert-level (four or more steps). It contains over 40,000 meticulously curated task instances designed to advance research in:
- Agent-based task processing
- Tool invocation systems
- Multi-step reasoning
The dataset is intended both for training and for evaluating AI agents on three abilities: understanding complex instructions, decomposing tasks, and accurately invoking external tools (such as PDF processors, HTML parsers, and image analyzers) to achieve their goals.
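Because the data ships as JSONL and tasks are graded by step count, a first pass over a local copy might parse each line and bucket tasks by difficulty. The sketch below is a minimal illustration under stated assumptions: the field name `num_steps` is hypothetical (check the actual schema), and the "intermediate" label for 2-3 step tasks is an assumption, since only the simple and expert-level tiers are named above.

```python
import json
from collections import Counter
from typing import Iterable


def load_tasks(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def difficulty_tier(num_steps: int) -> str:
    """Map a step count to a tier.

    "simple" (1 step) and "expert" (4+ steps) follow the description above;
    "intermediate" for 2-3 steps is a hypothetical label.
    """
    if num_steps <= 1:
        return "simple"
    if num_steps >= 4:
        return "expert"
    return "intermediate"


def tier_histogram(tasks: list[dict], step_field: str = "num_steps") -> Counter:
    """Count tasks per tier; `step_field` is a hypothetical field name."""
    return Counter(difficulty_tier(int(task[step_field])) for task in tasks)
```

For example, `with open("taskcraft.jsonl", encoding="utf-8") as f: print(tier_histogram(load_tasks(f)))`, where the file name is a placeholder. Passing an iterable of lines rather than a path keeps the parser easy to test and lets it stream directly from an open file handle.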