NL2Repo-Bench: Why GPT-5 & Gemini Struggle with Long-Horizon Coding

Have LLMs Hit a Ceiling? Why SuperGPQA Proves the AGI Journey is Just Beginning

Beyond Crowdsourcing: How SuperGPQA Uses PhD Experts to Solve LLM Data Leakage

2077AI 2025 Annual Report: Pioneering Open Source AI Innovation

GPT-5 Series vs. Gemini 3 Pro: The Verdict from SuperGPQA

Scaling Test-Time Compute: How CriticLean Anticipated DeepSeekMath

Google Gemini 3 Sets New SOTA on OmniDocBench: The New Standard for Document AI

Meet VideoScore2: The AI Film Critic That Thinks Before It Scores

IWR-Bench: Can AI Rebuild an Interactive Website Just by Watching a Video?

Introducing EDITREWARD: The AI Judge That’s Closing the Gap in Open-Source Image Editing

Unlocking Deeper Multimodal Understanding: Introducing PIN-200M, A Massive Dataset for Next-Gen LMMs

Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Models

Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis

VeriGUI: The Open-Source Benchmark Testing AI Agents' Real-World Capabilities

Creative Writing Dataset with Thought Processes: Unleashing Human-like Creativity in AI

Proudly Announcing: Abaka AI Joins the 2077AI Community as a Core Founding Contributor

FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI

Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models

A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLMs

OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving

Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus

PIN Dataset: A Unified Paradigm for Multimodal Learning