NCSep 3, 2024
What Makes a Face Look like a Hat: Decoupling Low-level and High-level Visual Properties with Image TripletsMaytus Piriyajitakonkij, Sirawaj Itthipuripat, Ian Ballard et al.
In visual decision making, high-level features, such as object categories, have a strong influence on choice. However, the impact of low-level features on behavior is less understood partly due to the high correlation between high- and low-level features in the stimuli presented (e.g., objects of the same category are more likely to share low-level features). To disentangle these effects, we propose a method that de-correlates low- and high-level visual properties in a novel set of stimuli. Our method uses two Convolutional Neural Networks (CNNs) as candidate models of the ventral visual stream: the CORnet-S that has high neural predictivity in high-level, IT-like responses and the VGG-16 that has high neural predictivity in low-level responses. Triplets (root, image1, image2) of stimuli are parametrized by the level of low- and high-level similarity of images extracted from the different layers. These stimuli are then used in a decision-making task where participants are tasked to choose the most similar-to-the-root image. We found that different networks show differing abilities to predict the effects of low-versus-high-level similarity: while CORnet-S outperforms VGG-16 in explaining human choices based on high-level similarity, VGG-16 outperforms CORnet-S in explaining human choices based on low-level similarity. Using Brain-Score, we observed that the behavioral prediction abilities of different layers of these networks qualitatively corresponded to their ability to explain neural activity at different levels of the visual hierarchy. In summary, our algorithm for stimulus set generation enables the study of how different representations in the visual stream affect high-level cognitive behaviors.
ROMar 4, 2024
TTA-Nav: Test-time Adaptive Reconstruction for Point-Goal Navigation under Visual CorruptionsMaytus Piriyajitakonkij, Mingfei Sun, Mengmi Zhang et al.
Robot navigation under visual corruption presents a formidable challenge. To address this, we propose a Test-time Adaptation (TTA) method, named as TTA-Nav, for point-goal navigation under visual corruptions. Our "plug-and-play" method incorporates a top-down decoder to a pre-trained navigation model. Firstly, the pre-trained navigation model gets a corrupted image and extracts features. Secondly, the top-down decoder produces the reconstruction given the high-level features extracted by the pre-trained model. Then, it feeds the reconstruction of a corrupted image back to the pre-trained model. Finally, the pre-trained model does forward pass again to output action. Despite being trained solely on clean images, the top-down decoder can reconstruct cleaner images from corrupted ones without the need for gradient-based adaptation. The pre-trained navigation model with our top-down decoder significantly enhances navigation performance across almost all visual corruptions in our benchmarks. Our method improves the success rate of point-goal navigation from the state-of-the-art result of 46% to 94% on the most severe corruption. This suggests its potential for broader application in robotic visual navigation. Project page: https://sites.google.com/view/tta-nav
AIMay 19, 2025
From Grunts to Lexicons: Emergent Language from Cooperative ForagingMaytus Piriyajitakonkij, Rujikorn Charakorn, Weicheng Tao et al.
Language is a powerful communicative and cognitive tool. It enables humans to express thoughts, share intentions, and reason about complex phenomena. Despite our fluency in using and understanding language, the question of how it arises and evolves over time remains unsolved. A leading hypothesis in linguistics and anthropology posits that language evolved to meet the ecological and social demands of early human cooperation. Language did not arise in isolation, but through shared survival goals. Inspired by this view, we investigate the emergence of language in multi-agent Foraging Games. These environments are designed to reflect the cognitive and ecological constraints believed to have influenced the evolution of communication. Agents operate in a shared grid world with only partial knowledge about other agents and the environment, and must coordinate to complete games like picking up high-value targets or executing temporally ordered actions. Using end-to-end deep reinforcement learning, agents learn both actions and communication strategies from scratch. We find that agents develop communication protocols with hallmark features of natural language: arbitrariness, interchangeability, displacement, cultural transmission, and compositionality. We quantify each property and analyze how different factors, such as population size, social dynamics, and temporal dependencies, shape specific aspects of the emergent language. Our framework serves as a platform for studying how language can evolve from partial observability, temporal reasoning, and cooperative goals in embodied multi-agent settings. We will release all data, code, and models publicly.
LGJun 18, 2021
Deep Reinforcement Learning Models Predict Visual Responses in the Brain: A Preliminary ResultMaytus Piriyajitakonkij, Sirawaj Itthipuripat, Theerawit Wilaiprasitporn et al.
Supervised deep convolutional neural networks (DCNNs) are currently one of the best computational models that can explain how the primate ventral visual stream solves object recognition. However, embodied cognition has not been considered in the existing visual processing models. From the ecological standpoint, humans learn to recognize objects by interacting with them, allowing better classification, specialization, and generalization. Here, we ask if computational models under the embodied learning framework can explain mechanisms underlying object recognition in the primate visual system better than the existing supervised models? To address this question, we use reinforcement learning to train neural network models to play a 3D computer game and we find that these reinforcement learning models achieve neural response prediction accuracy scores in the early visual areas (e.g., V1 and V2) in the levels that are comparable to those accomplished by the supervised neural network model. In contrast, the supervised neural network models yield better neural response predictions in the higher visual areas, compared to the reinforcement learning models. Our preliminary results suggest the future direction of visual neuroscience in which deep reinforcement learning should be included to fill the missing embodiment concept.
LGMar 5, 2021
A Pilot Study on Visually Stimulated Cognitive Tasks for EEG-Based Dementia RecognitionSupavit Kongwudhikunakorn, Suktipol Kiatthaveephong, Kamonwan Thanontip et al.
In the status quo, dementia is yet to be cured. Precise diagnosis prior to the onset of the symptoms can prevent the rapid progression of the emerging cognitive impairment. Recent progress has shown that Electroencephalography (EEG) is the promising and cost-effective test to facilitate the detection of neurocognitive disorders. However, most of the existing works have been using only resting-state EEG. The efficiencies of EEG signals from various cognitive tasks, for dementia classification, have yet to be thoroughly investigated. In this study, we designed four cognitive tasks that engage different cognitive performances: attention, working memory, and executive function. We investigated these tasks by using statistical analysis on both time and frequency domains of EEG signals from three classes of human subjects: Dementia (DEM), Mild Cognitive Impairment (MCI), and Normal Control (NC). We also further evaluated the classification performances of two features extraction methods: Principal Component Analysis (PCA) and Filter Bank Common Spatial Pattern (FBCSP). We found that the working memory related tasks yielded good performances for dementia recognition in both cases using PCA and FBCSP. Moreover, FBCSP with features combination from four tasks revealed the best sensitivity of 0.87 and the specificity of 0.80. To our best knowledge, this is the first work that concurrently investigated several cognitive tasks for dementia recognition using both statistical analysis and classification scores. Our results yielded essential information to design and aid in conducting further experimental tasks to early diagnose dementia patients.