Featured Content
Latest Content

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines - Exploring the Real Proficiency Boundaries of LLMs

Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models

A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench

Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus

OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving
