Knowledge Index of Noah's Ark

A High-Density Benchmark Systematically Mapping 261 Disciplines

2077AI Research Foundation

Overview

KINA is a high-density knowledge benchmark spanning 261 fine-grained disciplines. It is the first benchmark to treat disciplinary representativeness as a core metric, and it is built with a reusable, game-theoretic data collection pipeline that mitigates annotation vulnerabilities.

261 Disciplines
899 Questions
10 Options

Benchmark Comparison

[Figure: Bubble chart comparing KINA (ours) with other benchmarks. Bubble size indicates question count; a lower score means the benchmark is more challenging for SOTA models; scores above 80% fall in the saturation zone.]

Leaderboard

We evaluate 42 models from 13 major AI labs on KINA. Scores are reported as avg@4 accuracy, i.e., accuracy averaged over four independent runs per question.
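A minimal sketch of how avg@4 can be computed, assuming the usual convention that each model answers every question in four independent runs and accuracy is averaged over runs and then over questions. The function name `avg_at_k` and the toy results are hypothetical and are not taken from the KINA evaluation code.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """avg@k accuracy: correct[i][j] is True if run j answered question i correctly.

    Per-question accuracy is averaged over the k runs, then averaged over questions.
    """
    if not correct:
        raise ValueError("need at least one question")
    k = len(correct[0])
    if k == 0 or any(len(runs) != k for runs in correct):
        raise ValueError("every question needs the same non-zero number of runs")
    per_question = [sum(runs) / k for runs in correct]  # True counts as 1
    return mean(per_question)


# Hypothetical results: 3 questions, 4 runs each (k = 4)
results = [
    [True, True, False, True],     # 0.75
    [False, False, False, False],  # 0.00
    [True, True, True, True],      # 1.00
]
print(f"avg@4 = {avg_at_k(results):.4f}")  # avg@4 = 0.5833
```

Because every question has the same number of runs, this equals the total number of correct runs divided by (questions × k), here 7/12.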

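[Table: Leaderboard. Columns: Rank, Model, Type (closed-source or open-source), ALL (overall score), and per-discipline scores for Agr., Econ., Edu., Eng., Hist., Law, Arts, Mgt., Med., Phil., Sci., and Soc. Ranks 1, 2, and 3 are marked gold, silver, and bronze; bold marks the best score in each column.]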

Data Sample

Data Collection Pipeline

Step 1: Source

Annotators: qualified experts at PhD level or above, with verified domain expertise and top-tier institutional affiliations.

Source materials: peer-reviewed journals (CSSCI-, SSCI-, or SCI-indexed), preprints (arXiv, bioRxiv), monographs and textbooks, and technical reports.

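Step 2: Annotation

Process: initial question → topic → disciplinary representativeness check → confirmation. A candidate question is kept only if it is sufficiently unique to its discipline. For example:
Land Resource Management: “How to ensure representativeness in qualitative research methods?” (insufficient uniqueness: a generic research-methods question rather than one specific to the field)
Animal Rearing & Breeding: “How to address poor reproductive performance in cows?” (sufficient uniqueness)

A confirmed question must then beat at least 3 of 5 SOTA AI models, i.e., at least three of them must answer it incorrectly. It then proceeds to formal annotation, which records the question, options, answer, explanation, and source.
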
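The adversarial gate and the formal-annotation record can be sketched as follows. This is an illustration under stated assumptions, not the KINA tooling: the class and function names are hypothetical, options are assumed to be labeled A to J (matching the 10 options per question), and "beating" a model is taken to mean that the model picks a wrong option.

```python
from dataclasses import dataclass

NUM_OPTIONS = 10                      # each KINA question has 10 options
OPTION_LABELS = tuple("ABCDEFGHIJ")   # assumed labeling scheme (hypothetical)
PANEL_SIZE = 5                        # "Beat 3/5 SOTA AI": five SOTA models are tested
MIN_MODELS_BEATEN = 3                 # at least three must answer incorrectly


@dataclass
class AnnotatedQuestion:
    """Hypothetical record holding the formal-annotation fields."""
    question: str
    options: list[str]
    answer: str          # label of the correct option, e.g. "C"
    explanation: str
    source: str

    def __post_init__(self) -> None:
        if len(self.options) != NUM_OPTIONS:
            raise ValueError(f"expected {NUM_OPTIONS} options, got {len(self.options)}")
        if self.answer not in OPTION_LABELS:
            raise ValueError(f"answer must be one of {''.join(OPTION_LABELS)}")


def beats_sota_panel(model_answers: dict[str, str], gold: str) -> bool:
    """True if at least MIN_MODELS_BEATEN panel models pick a wrong option."""
    if len(model_answers) != PANEL_SIZE:
        raise ValueError(f"expected answers from {PANEL_SIZE} models")
    wrong = sum(1 for label in model_answers.values() if label != gold)
    return wrong >= MIN_MODELS_BEATEN


# Usage with made-up model names and answers:
panel = {"model_a": "C", "model_b": "F", "model_c": "C", "model_d": "A", "model_e": "J"}
print(beats_sota_panel(panel, gold="C"))  # three models are wrong -> True
```

Only questions that pass `beats_sota_panel` would be turned into an `AnnotatedQuestion`, so the expensive annotation effort is spent on items that still challenge current models.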
Step 3: Quality Control

Rule-based review: questions must have a cosine similarity below 0.8 to one another, options are checked for uniqueness, and LaTeX must compile when rendered in Markdown.

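A rough sketch of the near-duplicate and option-uniqueness checks. The source does not say how question vectors are built, so this example assumes plain bag-of-words term frequencies as a stand-in for an embedding model; only the 0.8 threshold comes from the pipeline description. All function names are hypothetical.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.8  # rule-based review: cosine similarity must stay below 0.8


def tf_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (an assumed stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def passes_dedup(candidate: str, accepted: list[str]) -> bool:
    """Reject the candidate if it is too similar to any already-accepted question."""
    vec = tf_vector(candidate)
    return all(cosine(vec, tf_vector(q)) < SIMILARITY_THRESHOLD for q in accepted)


def options_unique(options: list[str]) -> bool:
    """True if no two options are identical after normalizing case and whitespace."""
    normalized = [" ".join(o.lower().split()) for o in options]
    return len(set(normalized)) == len(normalized)


accepted = ["How to address poor reproductive performance in cows?"]
print(passes_dedup("How to address poor reproductive performance in dairy cows?", accepted))  # False
print(options_unique(["Yes", "yes ", "No"]))  # False
```

Swapping `tf_vector` for an embedding model leaves the thresholding logic unchanged.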
LLM-based filtering: feature extraction, failure-pattern analysis, and consensus voting.

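The source names consensus voting but does not say what the LLM judges vote on or what threshold applies. The sketch below assumes each judge returns a "keep" or "drop" verdict for a question and that a strict majority decides, with no-majority cases escalated; the verdict labels, the `quorum` default, and the function name are hypothetical.

```python
from collections import Counter


def consensus_vote(votes: list[str], quorum: float = 0.5) -> str:
    """Return the verdict backed by more than `quorum` of the judges, else "escalate"."""
    if not votes:
        raise ValueError("need at least one vote")
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict if count / len(votes) > quorum else "escalate"


# Hypothetical LLM judges each return "keep" or "drop" for a question.
print(consensus_vote(["keep", "drop", "keep"]))  # keep (2 of 3 agree)
print(consensus_vote(["keep", "drop"]))          # escalate (no strict majority)
```

Raising `quorum` (for example to 2/3) makes the filter more conservative at the cost of escalating more questions.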
Expert-based review: two reviewers assess each question independently and blind, judging disciplinary representativeness, depth, factuality and source reliability, and logical rigor.

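The four review criteria come from the pipeline description, but how the two blind reviews are combined is not stated. The sketch below assumes a hypothetical rule: a reviewer approves only if every criterion passes, agreement between the two reviewers decides, and disagreement is sent to adjudication. The class, function, and reviewer names are illustrative.

```python
from dataclasses import dataclass

# The four criteria named in the expert-based review.
CRITERIA = (
    "disciplinary_representativeness",
    "depth",
    "factuality_and_source_reliability",
    "logical_rigor",
)


@dataclass
class Review:
    reviewer: str
    passed: dict[str, bool]  # criterion -> did the question pass it?

    def approves(self) -> bool:
        """A reviewer approves only if every criterion is rated and passes."""
        missing = set(CRITERIA) - set(self.passed)
        if missing:
            raise ValueError(f"{self.reviewer} did not rate {sorted(missing)}")
        return all(self.passed[c] for c in CRITERIA)


def combine_blind_reviews(first: Review, second: Review) -> str:
    """Hypothetical merge rule: agreement decides, disagreement goes to adjudication."""
    a, b = first.approves(), second.approves()
    if a and b:
        return "accept"
    if not a and not b:
        return "reject"
    return "adjudicate"


r1 = Review("reviewer_1", {c: True for c in CRITERIA})
r2 = Review("reviewer_2", {**{c: True for c in CRITERIA}, "depth": False})
print(combine_blind_reviews(r1, r2))  # adjudicate
```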
Agentic workflow verification: an automated re-check pipeline, cross-validation with SOTA models, and a final consistency check.

Score Distribution

[Figure: Violin plots of each model's score distribution, shown at several levels of granularity.]

Discipline Coverage

We curate a hierarchical taxonomy of disciplines grounded in the U.S. Classification of Instructional Programs (CIP).
The finalized dataset comprises 899 questions distributed across 12 disciplines, 70 fields, and 261 fine-grained subfields.
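The three-level hierarchy (discipline → field → subfield) can be represented by keying question counts on subfield paths and rolling them up. This is a sketch under assumptions: the tuple-keyed representation and function names are hypothetical, and the toy entries are placeholders rather than real KINA categories. On the full taxonomy, `level_sizes` should return (12, 70, 261) and the counts should sum to 899.

```python
from collections import defaultdict

# Leaf key: (discipline, field, subfield) -> number of questions in that subfield.
Taxonomy = dict[tuple[str, str, str], int]


def level_sizes(tax: Taxonomy) -> tuple[int, int, int]:
    """Count distinct disciplines, fields, and subfields."""
    disciplines = {d for d, _, _ in tax}
    fields = {(d, f) for d, f, _ in tax}
    return len(disciplines), len(fields), len(tax)


def rollup(tax: Taxonomy) -> tuple[dict[str, int], dict[tuple[str, str], int]]:
    """Sum question counts from subfields up to fields and disciplines."""
    by_discipline = defaultdict(int)
    by_field = defaultdict(int)
    for (discipline, field, _subfield), n in tax.items():
        by_discipline[discipline] += n
        by_field[(discipline, field)] += n
    return dict(by_discipline), dict(by_field)


# Placeholder taxonomy (not real KINA data)
toy = {
    ("Discipline A", "Field A1", "Subfield A1a"): 4,
    ("Discipline A", "Field A1", "Subfield A1b"): 3,
    ("Discipline A", "Field A2", "Subfield A2a"): 5,
    ("Discipline B", "Field B1", "Subfield B1a"): 2,
}
print(level_sizes(toy))  # (2, 3, 4)
print(rollup(toy)[0])    # {'Discipline A': 12, 'Discipline B': 2}
```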

[Figure: Treemap of all 12 disciplines and 70 fields (899 questions), with each field subdivided into its Level-3 subfields.]

Model Scores Over Time

[Figure: Model scores plotted against release date.]

Inference Cost Distribution

[Figure: Inference cost distributions for Qwen3 and Qwen3.5.]

Inference Cost vs. Performance

BibTeX

If you find KINA useful in your research, please cite our paper:

@misc{anonymous2026kina,
  title  = {KINA: Knowledge Index of Noah's Ark},
  author = {Anonymous Authors},
  year   = {2026},
  note   = {Under review}
}