A High-Density Benchmark Systematically Mapping 261 Disciplines
KINA is a high-density knowledge benchmark encompassing 261 fine-grained disciplines, the first to incorporate disciplinary representativeness as a core metric. It features a reusable, game-theoretic data collection pipeline that mitigates annotation vulnerabilities.
Bubble size = question count · Lower score = more challenging for SOTA models
We evaluate 45 models from 13 major AI labs on KINA. Scores are reported as avg@4 accuracy.
| Rank | Model | Type | ALL | Agr. | Econ. | Edu. | Eng. | Hist. | Law | Arts | Mgt. | Med. | Phil. | Sci. | Soc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
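
The page does not spell out how avg@4 is computed. A minimal sketch, assuming the common reading that each question is answered in 4 independent samples, per-question accuracy is the fraction of correct samples, and the reported score is the mean over questions. The function name `avg_at_k` and the input layout are illustrative assumptions, not part of the KINA tooling.

```python
from statistics import mean


def avg_at_k(correct_by_question: dict[str, list[bool]], k: int = 4) -> float:
    """Mean accuracy over k sampled answers per question, averaged over questions.

    Assumption: every question has exactly k graded samples.
    """
    per_question = []
    for question_id, outcomes in correct_by_question.items():
        if len(outcomes) != k:
            raise ValueError(f"{question_id}: expected {k} samples, got {len(outcomes)}")
        # Fraction of the k samples that were graded correct.
        per_question.append(sum(outcomes) / k)
    return mean(per_question)


# Hypothetical grading results for two questions, 4 samples each.
results = {
    "q1": [True, True, False, True],     # 3 of 4 correct
    "q2": [False, False, False, False],  # 0 of 4 correct
}
score = avg_at_k(results)
print(f"avg@4 = {score:.3f}")
```

Averaging over several samples reduces the run-to-run variance that a single sampled answer per question would introduce.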
We curate a hierarchical taxonomy of disciplines grounded in the U.S. Classification of Instructional Programs (CIP).
The finalized dataset comprises 899 instances, distributed across 12 disciplines, 70 fields, and 261 fine-grained subfields.
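
The three-level structure can be pictured as each instance carrying a (discipline, field, subfield) label. The sketch below counts the distinct nodes at each level from such labels. The label format and the example names are hypothetical placeholders, not entries from the actual KINA taxonomy.

```python
def count_levels(labels: list[tuple[str, str, str]]) -> tuple[int, int, int]:
    """Count distinct disciplines, fields, and subfields in a list of labels.

    Fields and subfields are keyed by their full path, so two subfields that
    share a name under different parents are counted separately.
    """
    disciplines = {d for d, _, _ in labels}
    fields = {(d, f) for d, f, _ in labels}
    subfields = {(d, f, s) for d, f, s in labels}
    return len(disciplines), len(fields), len(subfields)


# Hypothetical labels, one per instance (placeholder names only).
labels = [
    ("Science", "Physics", "Optics"),
    ("Science", "Physics", "Optics"),
    ("Science", "Physics", "Acoustics"),
    ("Science", "Chemistry", "Organic Chemistry"),
    ("Law", "Public Law", "Constitutional Law"),
]
n_disciplines, n_fields, n_subfields = count_levels(labels)
print(n_disciplines, n_fields, n_subfields)
```

Run over the full dataset, a count of this kind would be expected to reproduce the 12, 70, and 261 figures stated above.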
If you find KINA useful in your research, please cite our paper:
@misc{jin2026knowledgeindexnoahsark,
title={Knowledge Index of Noah's Ark},
author={Sheng Jin and Minghao Liu and Yunze Xiao and Zeqi Zhou and Heli Qi and Yifan
Yao and Meishu Song and Kaijing Ma and Xuan Zhang and Sicong Jiang and Yizhe
Li and Ningshan Ma and Jie Wei and Ziniu Li and Minglai Yang and Bangya Liu
and Yiming Liang and Xiao Fang and Qingcheng Zeng and Jiarui Liu and Rui Yang
and Shen Yan and Wenhao Huang and Jiaheng Liu and Zihan Wang and Weihao Xuan
and Ge Zhang},
year={2026},
eprint={2606.05104},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2606.05104},
}