CL · Sep 29, 2025

CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task

arXiv:2509.24422v1 · 1 citation · h-index: 10 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

CDT gives LLM developers and researchers a more comprehensive evaluation tool, though it is an incremental refinement of existing benchmarking approaches.

The authors address the lack of holistic evaluation frameworks for Large Language Models by proposing the CDT framework, which measures capabilities across cognition, domain, and task dimensions. In their data selection experiments, the approach scored 44.3 on general benchmarks and 45.4 on specific benchmarks, gains of 1.6 and 2.2 points over the baselines, respectively.

Recent advances in Large Language Models (LLMs) have significantly enhanced their capabilities, highlighting the need for comprehensive evaluation frameworks that extend beyond task-specific benchmarks. However, existing benchmarks often focus on isolated abilities, lacking a holistic framework for assessing LLM capabilities. To address this gap, we propose the Cognition-Domain-Task (CDT) framework, which comprehensively measures a model's capabilities across three dimensions. We expand the scope of model capability definitions at the cognitive level by incorporating the Cattell-Horn-Carroll cognitive theory, refining the categorization of model capabilities. We apply CDT in two directions: dataset capability evaluation and data selection. Experiments show that our capability metrics correlate well with downstream performance and can support effective dataset analysis and construction. The experiments on data selection also show significant improvements in both general and specific benchmarks, achieving scores of 44.3 and 45.4, with an increase of 1.6 and 2.2 points over the baselines, respectively. These results validate the effectiveness and practicality of CDT. Source code and models are available at https://github.com/Alessa-mo/CDT.

