TaskCraft: A Multi-Modal Benchmark for Agentic Task Processing

Introduction

Dataset	TaskCraft
Modalities	Text, Video
Formats	jsonl
Languages	English, Chinese
Size	10-100k
Release Date	2025-07-11
Domain	Synthetic
License	MIT

TaskCraft is a multi-modal benchmark dataset whose tasks range from simple (1 step) to expert-level (4 or more steps). It contains over 40,000 curated task instances designed to advance research in:

Agent-based task processing
Tool invocation systems
Multi-step reasoning

The dataset is designed for evaluating and training AI agents to understand complex instructions, decompose tasks, and accurately invoke external tools (such as PDF processors, HTML parsers, and image analyzers) to achieve their goals.
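
Since the dataset is distributed as jsonl, a first look at it might involve reading records line by line and grouping them by difficulty. The following is only a minimal sketch: the field names (`query`, `tool`, `num_steps`) and the sample records are hypothetical placeholders, not the actual TaskCraft schema, and should be replaced with the real keys from the released files.

```python
import io
import json
from collections import Counter
from typing import IO, Iterable, Iterator


def iter_jsonl(stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def difficulty_histogram(records: Iterable[dict], steps_key: str = "num_steps") -> dict:
    """Count records per step count, bucketing 4 or more steps as '4+'.

    `steps_key` is an assumed field name, not taken from the dataset.
    """
    counts = Counter()
    for record in records:
        steps = record[steps_key]
        counts["4+" if steps >= 4 else str(steps)] += 1
    return dict(counts)


# Hypothetical in-memory sample; field names and values are illustrative only.
SAMPLE = io.StringIO(
    '{"query": "Summarize the attached PDF", "tool": "pdf", "num_steps": 1}\n'
    '{"query": "Compare two web pages", "tool": "html", "num_steps": 2}\n'
    "\n"
    '{"query": "Cross-check an image against a report", "tool": "image", "num_steps": 5}\n'
)

histogram = difficulty_histogram(iter_jsonl(SAMPLE))
print(histogram)
```

To read from disk instead, the same functions could be used with `open(path, encoding="utf-8")` in place of the in-memory sample, where `path` points at a downloaded jsonl file.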

Data Sample