Blog

  • Datasets

  • Blogs

  • About

  • Projects

    • Project-EVA

  • Resources

    • Paper

Contact Us

Featured Content

Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Model

Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis

VeriGUI: The Open-Source Benchmark Testing AI Agents' Real-World Capabilities

Proudly Announcing: Abaka AI Joins the 2077AI Community as a Core Founding Contributor

FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI

Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models

A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench

OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLM

Latest Content

Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Model

Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis

VeriGUI: The Open-Source Benchmark Testing AI Agents' Real-World Capabilities

Proudly Announcing: Abaka AI Joins the 2077AI Community as a Core Founding Contributor

FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI

Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models

A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench

OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLM

2077AI

Join Us In Shaping The Future Of AI
Contact Us
2077AI ©2025