A High-Density Benchmark Systematically Mapping 261 Disciplines
KINA is a high-density knowledge benchmark that spans 261 fine-grained disciplines and is the first to incorporate disciplinary representativeness as a core metric. It features a reusable, game-theoretic data collection pipeline that mitigates annotation vulnerabilities.
[Figure: bubble chart in which bubble size indicates question count and a lower score indicates greater difficulty for state-of-the-art (SOTA) models.]
We evaluate 42 models from 13 major AI labs on KINA. Scores are reported as avg@4 accuracy.
| Rank | Model | Type | ALL | Agr. | Econ. | Edu. | Eng. | Hist. | Law | Arts | Mgt. | Med. | Phil. | Sci. | Soc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
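The leaderboard reports avg@4 accuracy. A common reading of avg@k, assumed here and not confirmed by this page, is that each question is answered k = 4 times, each answer is graded correct or incorrect, and the per-question hit rate is then averaged over questions. The minimal Python sketch below follows that assumption. `avg_at_k` and `scores_by_discipline` are hypothetical helpers, and treating the ALL column as a pool of every question (rather than a mean of the per-discipline columns) is also an assumption.

```python
from collections import defaultdict
from statistics import mean


def avg_at_k(runs_per_question: list[list[bool]], k: int = 4) -> float:
    """avg@k in percent, ASSUMING each question has exactly k graded runs (hypothetical helper)."""
    if not runs_per_question:
        raise ValueError("no questions to score")
    per_question = []
    for runs in runs_per_question:
        if len(runs) != k:
            raise ValueError(f"expected {k} runs per question, got {len(runs)}")
        # Fraction of the k runs judged correct for this question.
        per_question.append(sum(runs) / k)
    # Average over questions, reported as a percentage.
    return 100.0 * mean(per_question)


def scores_by_discipline(records: list[tuple[str, list[bool]]], k: int = 4) -> dict[str, float]:
    """records: (discipline, runs) pairs, one per question (hypothetical format)."""
    groups: dict[str, list[list[bool]]] = defaultdict(list)
    for discipline, runs in records:
        groups[discipline].append(runs)
    table = {d: avg_at_k(runs, k) for d, runs in groups.items()}
    # ASSUMPTION: "ALL" pools every question (micro-average) rather than
    # averaging the per-discipline scores.
    table["ALL"] = avg_at_k([runs for _, runs in records], k)
    return table


if __name__ == "__main__":
    # Toy data: 3/4 and 1/4 runs correct, so avg@4 is 50.0.
    print(avg_at_k([[True, True, True, False], [True, False, False, False]]))
```

Under a macro-average reading, ALL would instead be the mean of the per-discipline values, which differs whenever disciplines have unequal question counts.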
We curate a hierarchical taxonomy of disciplines grounded in the U.S. Classification of Instructional Programs (CIP).
The finalized dataset comprises 899 instances, distributed across 12 disciplines, 70 fields, and 261 fine-grained subfields.
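The taxonomy has three levels (discipline, field, subfield), and each question sits under one fine-grained subfield. The sketch below shows one way to hold such labels and recount the size of each level. The `Label` schema and `summarize_taxonomy` helper are hypothetical, and the sample names are illustrative placeholders, not KINA's actual categories.

```python
from collections import Counter
from typing import NamedTuple


class Label(NamedTuple):
    """One question's position in a three-level taxonomy (hypothetical schema)."""
    discipline: str  # level 1
    field: str       # level 2
    subfield: str    # level 3, the fine-grained unit


def summarize_taxonomy(labels: list[Label]) -> dict[str, int]:
    """Count distinct nodes per level and the total number of questions."""
    # Key lower levels by their full path so that identically named fields
    # or subfields under different parents count as separate nodes.
    disciplines = {lab.discipline for lab in labels}
    fields = {(lab.discipline, lab.field) for lab in labels}
    per_subfield = Counter(labels)  # questions per (discipline, field, subfield) path
    return {
        "disciplines": len(disciplines),
        "fields": len(fields),
        "subfields": len(per_subfield),
        "questions": sum(per_subfield.values()),
    }


if __name__ == "__main__":
    # Placeholder labels, not taken from KINA.
    sample = [
        Label("Engineering", "Computer Science", "Machine Learning"),
        Label("Engineering", "Computer Science", "Databases"),
        Label("Engineering", "Computer Science", "Databases"),
        Label("Medicine", "Clinical Medicine", "Cardiology"),
    ]
    print(summarize_taxonomy(sample))
```

Run over labels that match the taxonomy described above, this count should come to 12 disciplines, 70 fields, 261 subfields and 899 questions.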
[Figure: each model's KINA score plotted against its release date.]
If you find KINA useful in your research, please cite our paper:
@misc{anonymous2026kina,
title = {KINA: Knowledge Index of Noah's Ark},
author = {Anonymous Authors},
year = {2026},
note = {Under review}
}