Blog
Datasets
Blogs
About
Projects
Project-EVA
Resources
Paper
Featured Content
IWR-Bench: Can AI Rebuild an Interactive Website Just by Watching a Video?
Introducing EDITREWARD: The AI Judge That’s Closing the Gap in Open-Source Image Editing
Unlocking Deeper Multimodal Understanding: Introducing PIN-200M, A Massive Dataset for Next-Gen LMMs
Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Model
Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis
VeriGUI: The Open-Source Benchmark Testing AI Agents’ Real-World Capabilities
Creative Writing Dataset with Thought Processes: Unleashing Human-like Creativity in AI
Proudly Announcing: Abaka AI Joins the 2077AI Community as a Core Founding Contributor
FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI
Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models
A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench
OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLM
Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus
PIN Dataset: A Unified Paradigm for Multimodal Learning
Latest Content
IWR-Bench: Can AI Rebuild an Interactive Website Just by Watching a Video?
Introducing EDITREWARD: The AI Judge That’s Closing the Gap in Open-Source Image Editing
Unlocking Deeper Multimodal Understanding: Introducing PIN-200M, A Massive Dataset for Next-Gen LMMs
Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Model
Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis
VeriGUI: The Open-Source Benchmark Testing AI Agents’ Real-World Capabilities
Creative Writing Dataset with Thought Processes: Unleashing Human-like Creativity in AI
Proudly Announcing: Abaka AI Joins the 2077AI Community as a Core Founding Contributor
FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI
Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models
A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench
OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLM
Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus
PIN Dataset: A Unified Paradigm for Multimodal Learning