CV Nov 9, 2025
AesTest: Measuring Aesthetic Intelligence from Perception to Production
Guolong Wang, Heng Huang, Zhiqiang Zhang et al.
Perceiving and producing aesthetic judgments is a fundamental yet underexplored capability for multimodal large language models (MLLMs). However, existing benchmarks for image aesthetic assessment (IAA) are narrow in perception scope or lack the diversity needed to evaluate systematic aesthetic production. To address this gap, we introduce AesTest, a comprehensive benchmark for multimodal aesthetic perception and production, distinguished by the following features: 1) It consists of curated multiple-choice questions spanning ten tasks, covering perception, appreciation, creation, and photography. These tasks are grounded in psychological theories of generative learning. 2) It integrates data from diverse sources, including professional editing workflows, photographic composition tutorials, and crowdsourced preferences. It ensures coverage of both expert-level principles and real-world variation. 3) It supports various aesthetic query types, such as attribute-based analysis, emotional resonance, compositional choice, and stylistic reasoning. We evaluate both instruction-tuned IAA MLLMs and general MLLMs on AesTest, revealing significant challenges in building aesthetic intelligence. We will publicly release AesTest to support future research in this area.
CV Jan 27, 2024
Applications of Tao General Difference in Discrete Domain
Linmi Tao, Ruiyang Liu, Donglai Tao et al.
Numerical difference computation is a core and indispensable operation in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first- and second-order TGD. This property is used to denoise one-dimensional signals, where the noise consists of the non-smooth points in the sequence. Meanwhile, the center of the gradient in a finite interval can be accurately located via TGD calculation. This solves a traditional challenge in computer vision: the precise, noise-robust localization of image edges. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in the discrete domain and its significant promise for computation across signal processing, image analysis, and video analytics.
DM May 14, 2023
A Theory of General Difference in Continuous and Discrete Domain
Linmi Tao, Ruiyang Liu, Donglai Tao et al.
Though a core element of the digital age, numerical difference algorithms struggle with noise susceptibility. This stems from a key disconnect between the infinitesimal quantities in continuous differentiation and the finite intervals in its discrete counterpart. This disconnect violates the fundamental definition of differentiation (Leibniz and Cauchy). To bridge this gap, we build a novel general difference (Tao General Difference, TGD). Departing from derivative-by-integration, TGD generalizes differentiation to finite intervals in continuous domains through three key constraints. This allows us to calculate the general difference of a sequence in the discrete domain via the continuous step function constructed from the sequence. Two construction methods, the rotational construction and the orthogonal construction, are proposed to construct the operators of TGD. The constructed TGD operators use the same convolution mode in calculation for continuous functions, discrete sequences, and arrays of any dimension. Our analysis with example operations showcases TGD's capability in both continuous and discrete domains, paving the way for accurate and noise-resistant differentiation in the digital era.
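
The abstract does not give the TGD operators or its three constraints, so the sketch below does not implement TGD. It only illustrates the general idea the abstract relies on: a difference computed by convolution over a finite interval is less sensitive to noise than the classic three-point central difference. The function names and the linearly decaying, antisymmetric kernel are assumptions chosen for illustration.

```python
import numpy as np


def central_difference(x):
    """Classic 3-point central difference: (x[n+1] - x[n-1]) / 2."""
    # np.convolve flips the kernel, so [0.5, 0, -0.5] yields +0.5*x[n+1] - 0.5*x[n-1].
    return np.convolve(x, [0.5, 0.0, -0.5], mode="same")


def wide_difference(x, half_width=5):
    """Difference over a finite interval of 2*half_width + 1 samples.

    Hypothetical antisymmetric kernel with linearly decaying weights;
    an illustration only, NOT the TGD operator from the paper.
    """
    k = np.arange(1, half_width + 1, dtype=float)  # offsets 1..half_width
    w = half_width + 1 - k                         # weights decay with distance
    w /= 2.0 * np.sum(w * k)                       # a unit-slope ramp maps to 1
    # After the flip done by np.convolve, the output is
    # sum_k w[k] * (x[n+k] - x[n-k]).
    kernel = np.concatenate([w[::-1], [0.0], -w])
    return np.convolve(x, kernel, mode="same")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(2000, dtype=float)
    noisy_ramp = n + rng.normal(scale=1.0, size=n.size)  # true slope is 1

    # Compare the spread of both estimates away from the array boundaries.
    interior = slice(5, -5)
    print("central std:", central_difference(noisy_ramp)[interior].std())
    print("wide std:   ", wide_difference(noisy_ramp)[interior].std())
```

Both functions return 1 in the interior of a noise-free unit-slope ramp; on the noisy ramp the wider kernel averages over more samples, so its estimate of the slope should vary less. Values near the array boundaries are affected by the zero padding of `mode="same"` and are excluded from the comparison.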