AI · Dec 29, 2025

TCEval: Using Thermal Comfort to Assess Cognitive and Perceptual Abilities of AI

arXiv:2512.23217v1
Originality: Incremental advance
AI Analysis

TCEval addresses a critical gap in AI evaluation for researchers and developers by shifting the focus from abstract tasks to embodied, context-aware perception. The advance is incremental, however, because it builds on existing thermal comfort data and LLM agent methods.

The authors address the lack of real-world cognitive benchmarks for AI by introducing TCEval, a framework that uses thermal comfort scenarios to evaluate cross-modal reasoning, causal association, and adaptive decision-making in LLMs. The results show that LLMs have foundational cross-modal reasoning ability but perform near-randomly in thermal comfort classification, and that their predicted mean vote (PMV) distributions diverge markedly from human data.

A critical gap exists in LLM task-specific benchmarks. Thermal comfort, a sophisticated interplay of environmental factors and personal perceptions involving sensory integration and adaptive decision-making, serves as an ideal paradigm for evaluating real-world cognitive capabilities of AI systems. To address this, we propose TCEval, the first evaluation framework that assesses three core cognitive capacities of AI (cross-modal reasoning, causal association, and adaptive decision-making) by leveraging thermal comfort scenarios and large language model (LLM) agents. The methodology involves initializing LLM agents with virtual personality attributes, guiding them to generate clothing insulation selections and thermal comfort feedback, and validating outputs against the ASHRAE Global Database and Chinese Thermal Comfort Database. Experiments on four LLMs show that while agent feedback has limited exact alignment with humans, directional consistency improves significantly when a tolerance of 1 PMV unit is allowed. Statistical tests reveal that LLM-generated PMV distributions diverge markedly from human data, and agents perform near-randomly in discrete thermal comfort classification. These results confirm the feasibility of TCEval as an ecologically valid Cognitive Turing Test for AI, demonstrating that current LLMs possess foundational cross-modal reasoning ability but lack precise causal understanding of the nonlinear relationships between variables in thermal comfort. TCEval complements traditional benchmarks, shifting AI evaluation focus from abstract task proficiency to embodied, context-aware perception and decision-making, offering valuable insights for advancing AI in human-centric applications like smart buildings.
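
The gap between exact alignment and alignment within a 1 PMV tolerance can be illustrated with a small sketch. The abstract does not give the paper's metric definitions, so the function below is an assumption: it treats each agent and human response as an integer vote on the seven-point thermal sensation scale (-3 cold to +3 hot) and counts a pair as consistent when the two votes differ by at most the tolerance. The name `agreement_rates` and the sample votes are hypothetical and are not taken from the paper or from either database.

```python
from typing import Sequence, Tuple


def agreement_rates(
    agent_votes: Sequence[int],
    human_votes: Sequence[int],
    tolerance: int = 1,
) -> Tuple[float, float]:
    """Return (exact_rate, tolerant_rate) for paired thermal sensation votes.

    Assumption: votes are integers on the seven-point scale from -3 to +3,
    and "tolerant" means the absolute difference is at most `tolerance`.
    """
    if len(agent_votes) != len(human_votes):
        raise ValueError("agent_votes and human_votes must have the same length")
    if len(agent_votes) == 0:
        raise ValueError("at least one paired vote is required")

    n = len(agent_votes)
    pairs = list(zip(agent_votes, human_votes))
    # Exact alignment: the agent picks the same scale point as the human.
    exact = sum(1 for a, h in pairs if a == h)
    # Tolerant alignment: the agent is within `tolerance` scale points.
    within = sum(1 for a, h in pairs if abs(a - h) <= tolerance)
    return exact / n, within / n


# Hypothetical paired votes, invented purely for illustration.
agent = [0, 1, -1, 2, 0, -2]
human = [0, 0, -1, 0, 1, -3]

exact_rate, tolerant_rate = agreement_rates(agent, human)
print(f"exact: {exact_rate:.2f}, within 1 point: {tolerant_rate:.2f}")
# prints "exact: 0.33, within 1 point: 0.83"
```

On these invented votes only two of six pairs match exactly, while five of six fall within one scale point. This is the same qualitative pattern the abstract reports, though the numbers here carry no empirical weight.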

