31.2CVMay 21
GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interactionIba Baig, Kevin Li, Yanbin Xu et al.
Video recordings of child-caregiver interactions enable investigation of attentional dynamics during naturalistic behavior. Such multimodal recording also allows researchers to examine how attention interacts with action and language use in real time. However, manual annotation of such data is time-consuming. Here, we introduce GazeBehavior Annotation Toolkit, a deep-learning-based toolkit designed to facilitate three key processes in data preprocessing and feature extraction: post-hoc synchronization across multiple videos, semi-automatic annotation of gaze target categories, and categorization of participants' poses and hand actions. This toolkit improves the efficiency and scalability of feature extraction from human egocentric eye-tracking and video data. Such improvement is critical in supporting large-scale and longitudinal investigations of attentional dynamics and naturalistic behavior in human early development.
LGMar 17, 2025
MAME: Multidimensional Adaptive Metamer Exploration with Human Perceptual FeedbackMina Kamao, Hayato Ono, Ayumu Yamashita et al.
Alignment between human brain networks and artificial models has become an active research area in vision science and machine learning. A widely adopted approach is identifying "metamers," stimuli physically different yet perceptually equivalent within a system. However, conventional methods lack a direct approach to searching for the human metameric space. Instead, researchers first develop biologically inspired models and then infer about human metamers indirectly by testing whether model metamers also appear as metamers to humans. Here, we propose the Multidimensional Adaptive Metamer Exploration (MAME) framework, enabling direct, high-dimensional exploration of human metameric spaces through online image generation guided by human perceptual feedback. MAME modulates reference images across multiple dimensions based on hierarchical neural network responses, adaptively updating generation parameters according to participants' perceptual discriminability. Using MAME, we successfully measured multidimensional human metameric spaces within a single psychophysical experiment. Experimental results using a biologically plausible CNN model showed that human discrimination sensitivity was lower for metameric images based on low-level features compared to high-level features, which image contrast metrics could not explain. The finding suggests a relatively worse alignment between the metameric spaces of humans and the CNN model for low-level processing compared to high-level processing. Counterintuitively, given recent discussions on alignment at higher representational levels, our results highlight the importance of early visual computations in shaping biologically plausible models. Our MAME framework can serve as a future scientific tool for directly investigating the functional organization of human vision.