CV, CL, CY | Jun 2

Beyond Semantics: Modeling Factual and Affective Perceptual Experiences from Vision-Language Data

arXiv:2606.03345 | 48.0 | h-index: 4
AI Analysis

For researchers in affective computing and cross-cultural perception, this work provides a novel method to model subjective perceptual experiences from vision-language data, significantly outperforming existing baselines.

The paper introduces P-Topics modeling to understand affective and cross-cultural image perception, and proposes PercepT, a two-stage architecture that achieves a silhouette score of 0.97 (vs. 0.37 baseline) and AUC of 0.94 (vs. 0.77 baseline) on ArtELingo.
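For readers unfamiliar with the silhouette score quoted above, the following is a minimal NumPy sketch of the standard definition: for each point, `a` is the mean distance to the other members of its own cluster, `b` is the mean distance to the nearest other cluster, and the score is `(b - a) / max(a, b)`, averaged over all points. It is not the paper's evaluation code; the function name `silhouette` and the toy data are illustrative assumptions, and Euclidean distance is assumed.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient (Euclidean distance, at least 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score is 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = D[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

# Toy data (hypothetical): two tight, well-separated groups of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette(X, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette(X, [0, 1, 0, 1])   # labels cut across the geometry
```

Scores lie in [-1, 1]. Values near 1 indicate compact, well-separated clusters, which is the sense in which 0.97 versus 0.37 is read as better clustering.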

We present P-Topics (Perception Topics) modeling, a novel problem for understanding how images are perceived affectively and across cultures. The goal is to (1) discover and model the different perception experiences in a dataset of images and captions, where each experience is defined by an objective factual aspect and a subjective affective aspect, and (2) associate images with their relevant perception experiences. We introduce **PercepT** (**Percep**tion topic **T**ransformer), a two-stage architecture that tackles P-Topics modeling. In the formation stage, PercepT discovers *P-Topics* as visual-textual clusters using an unsupervised training objective, and dynamically selects the number of clusters to match the perceptual richness of the dataset. In the mapping stage, it learns *P-Topic mapping functions* via attention pooling to associate images with their respective clusters. On ArtELingo, PercepT achieves a silhouette score of **0.97** compared to **0.37** for the closest baseline, reflecting better perceptual clusters. PercepT also achieves an AUC of **0.94** compared to **0.77**, showing better mapping to perceptual clusters. Human evaluation confirms that PercepT captures semantically meaningful perception experiences and significantly outperforms existing methods. Our implementation will be made public.
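To make the mapping stage more concrete, here is a rough NumPy sketch of attention pooling in general: each topic owns a query vector, attends over an image's patch embeddings, and the pooled vector is scored against that query. This is only an illustration under stated assumptions, not PercepT's architecture. The names `attention_pool` and `topic_scores`, the scaled dot-product form, the sigmoid scoring, and all shapes are hypothetical, and the abstract does not specify any of them.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(patches, queries):
    """Pool N patch embeddings into one vector per topic query.

    patches: (N, d) patch embeddings of one image (assumed given).
    queries: (K, d) one query vector per topic (assumed learnable).
    Returns pooled vectors (K, d) and attention weights (K, N).
    """
    d = patches.shape[1]
    scores = queries @ patches.T / np.sqrt(d)  # (K, N) scaled dot products
    weights = softmax(scores, axis=1)          # each row sums to 1 over patches
    pooled = weights @ patches                 # (K, d) weighted patch averages
    return pooled, weights

def topic_scores(patches, queries):
    """Hypothetical per-topic relevance in (0, 1) via a sigmoid."""
    pooled, _ = attention_pool(patches, queries)
    logits = (pooled * queries).sum(axis=1)    # (K,) pooled-query similarity
    return 1.0 / (1.0 + np.exp(-logits))

# Random stand-ins for real embeddings: 16 patches, 8 dims, 3 topics.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
queries = rng.normal(size=(3, 8))
pooled, weights = attention_pool(patches, queries)
probs = topic_scores(patches, queries)
```

In a trained model the queries would be learned so that high scores mark the clusters an image belongs to. Here they are random, so the outputs only demonstrate shapes and normalization.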

