Blog
Datasets
Blogs
About
Mission
Opportunities
Partnerships
Projects
Project-EVA
Resources
Paper
Featured Content
NL2Repo-Bench: Why GPT-5 & Gemini Struggle with Long-Horizon Coding
Have LLMs Hit a Ceiling? Why SuperGPQA Proves the AGI Journey is Just Beginning
Beyond Crowdsourcing: How SuperGPQA Uses PhD Experts to Solve LLM Data Leakage
2077AI 2025 Annual Report: Pioneering Open Source AI Innovation
GPT-5 Series vs. Gemini 3 Pro: The Verdict from SuperGPQA
Scaling Test-Time Compute: How CriticLean Anticipated DeepSeekMath
Google Gemini 3 Sets New SOTA on OmniDocBench: The New Standard for Document AI
Meet VideoScore2: The AI Film Critic That Thinks Before It Scores
IWR-Bench: Can AI Rebuild an Interactive Website Just by Watching a Video?
Introducing EDITREWARD: The AI Judge That’s Closing the Gap in Open-Source Image Editing
Unlocking Deeper Multimodal Understanding: Introducing PIN-200M, A Massive Dataset for Next-Gen LMMs
Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Models
Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis
VeriGUI: The Open-Source Benchmark Testing AI Agents' Real-World Capabilities
Creative Writing Dataset with Thought Processes: Unleashing Human-like Creativity in AI
Proudly Announcing: Abaka AI Joins the 2077AI Community as a Core Founding Contributor
FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI
Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models
A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLMs
OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving
Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus
PIN Dataset: A Unified Paradigm for Multimodal Learning