AFM-Datasets: An End-to-End Chain-of-Agents Dataset

Introduction

Dataset: AFM-Datasets
Modalities: Text
Formats: JSON
Languages: English
Size: 41.6 kB
Release Date: 2025-08-06
Domain: Agent
License: Apache-2.0

AFM-Datasets is the official training dataset released with the research paper "Chain-of-Agents". It is designed for building Agent Foundation Models (AFMs).

The core objective of this dataset is to train a single large language model to simulate a "multi-agent team," so that it can solve complex tasks, such as web navigation and code generation, autonomously and end-to-end.

It primarily consists of two types of data:

  • Supervised Fine-Tuning (SFT) Data: Generated through "multi-agent distillation," this data captures the complete problem-solving trajectories of state-of-the-art multi-agent systems.
  • Reinforcement Learning (RL) Data: Used for agentic reinforcement learning to further enhance the model's decision-making and execution abilities on verifiable tasks.

Sample

AFM-WebAgent-SFT-Dataset

AFM-MHQA-RL-Dataset

AFM-MHQA-Agent-SFT-Dataset

AFM-WebAgent-RL-Dataset

AFM-CodeAgent-RL-Dataset

AFM-CodeAgent-SFT-Dataset
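
The six subset names above appear to encode both the task domain and the training stage (SFT or RL). As a minimal sketch, assuming that naming pattern holds, the snippet below groups the subsets by stage. The helper names `parse_subset_name` and `group_by_stage` are hypothetical and not part of any released tooling, and treating "MHQA-Agent" and "MHQA" as the same domain is an assumption made here for illustration.

```python
from collections import defaultdict

# Subset names exactly as listed in this dataset card.
SUBSETS = [
    "AFM-WebAgent-SFT-Dataset",
    "AFM-MHQA-RL-Dataset",
    "AFM-MHQA-Agent-SFT-Dataset",
    "AFM-WebAgent-RL-Dataset",
    "AFM-CodeAgent-RL-Dataset",
    "AFM-CodeAgent-SFT-Dataset",
]

PREFIX = "AFM-"
SUFFIX = "-Dataset"
STAGES = ("SFT", "RL")


def parse_subset_name(name):
    """Split a subset name into (domain, stage), e.g. ("WebAgent", "SFT")."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        raise ValueError(f"unexpected subset name: {name!r}")
    core = name[len(PREFIX):-len(SUFFIX)]
    # The last hyphen-separated token is the training stage.
    domain, stage = core.rsplit("-", 1)
    if stage not in STAGES:
        raise ValueError(f"unknown training stage in {name!r}")
    # Assumption: "MHQA-Agent" and "MHQA" name the same domain.
    if domain.endswith("-Agent"):
        domain = domain[:-len("-Agent")]
    return domain, stage


def group_by_stage(names):
    """Map each training stage to the sorted list of domains it covers."""
    groups = defaultdict(list)
    for name in names:
        domain, stage = parse_subset_name(name)
        groups[stage].append(domain)
    return {stage: sorted(domains) for stage, domains in groups.items()}


if __name__ == "__main__":
    for stage, domains in group_by_stage(SUBSETS).items():
        print(stage, domains)
```

Under these assumptions, each of the three domains (web agent, multi-hop QA, code agent) has one SFT subset and one RL subset.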