2077AI

Revolutionizing AI Data

The 2077AI Foundation is at the forefront of AI data standardization and advancement. Our goal is to unlock greater value from data, accelerate AI development, and foster an efficient, prosperous AI data ecosystem. Backed by a $10.24 million investment, we are committed to promoting open-source initiatives and invite you to join us in shaping the future of AI.

Core Contributor

Datasets

Blog/Featured

NL2Repo-Bench: Why GPT-5 & Gemini Struggle with Long-Horizon Coding

Beyond Crowdsourcing: How SuperGPQA Uses PhD Experts to Solve LLM Data Leakage

Have LLMs Hit a Ceiling? Why SuperGPQA Proves the AGI Journey is Just Beginning

2077AI 2025 Annual Report: Pioneering Open Source AI Innovation

GPT-5 Series vs. Gemini 3 Pro: The Verdict from SuperGPQA

Scaling Test-Time Compute: How CriticLean Anticipated DeepSeekMath

Google Gemini 3 Sets New SOTA on OmniDocBench: The New Standard for Document AI

Meet VideoScore2: The AI Film Critic That Thinks Before It Scores

IWR-Bench: Can AI Rebuild an Interactive Website Just by Watching a Video?

Introducing EDITREWARD: The AI Judge That’s Closing the Gap in Open-Source Image Editing

Unlocking Deeper Multimodal Understanding: Introducing PIN-200M, A Massive Dataset for Next-Gen LMMs

Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Models

Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis

VeriGUI: The Open-Source Benchmark Testing AI Agents' Real-World Capabilities

Creative Writing Dataset with Thought Processes: Unleashing Human-like Creativity in AI

Proudly Announcing Abaka AI Joins the 2077AI Community as a Core Founding Contributor

FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI

Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models

A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLMs

OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving

Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus

PIN Dataset: A Unified Paradigm for Multimodal Learning

Blog

Insights and narratives from the 2077AI Project, covering dataset development and model evaluation.
View more