Inci Ayhan

AI
h-index: 30
Papers: 4
Citations: 54
Novelty: 38%
AI Score: 32

4 Papers

CV · Aug 27, 2025
Spherical Vision Transformers for Audio-Visual Saliency Prediction in 360-Degree Videos

Mert Cokelek, Halit Ozsoy, Nevrez Imamoglu et al.

Omnidirectional videos (ODVs) are redefining viewer experiences in virtual reality (VR) by offering an unprecedented full field-of-view (FOV). This study extends the domain of saliency prediction to 360-degree environments, addressing the complexities of spherical distortion and the integration of spatial audio. Contextually, ODVs have transformed user experience by adding a spatial audio dimension that aligns sound direction with the viewer's perspective in spherical scenes. Motivated by the lack of comprehensive datasets for 360-degree audio-visual saliency prediction, our study curates YT360-EyeTracking, a new dataset of 81 ODVs, each observed under varying audio-visual conditions. Our goal is to explore how to utilize audio-visual cues to effectively predict visual saliency in 360-degree videos. Towards this aim, we propose two novel saliency prediction models: SalViT360, a vision-transformer-based framework for ODVs equipped with spherical geometry-aware spatio-temporal attention layers, and SalViT360-AV, which further incorporates transformer adapters conditioned on audio input. Our results on a number of benchmark datasets, including our YT360-EyeTracking, demonstrate that SalViT360 and SalViT360-AV significantly outperform existing methods in predicting viewer attention in 360-degree scenes. Interpreting these results, we suggest that integrating spatial audio cues in the model architecture is crucial for accurate saliency prediction in omnidirectional videos. Code and dataset will be available at https://cyberiada.github.io/SalViT360.

NC · Oct 4, 2022
Predictive Event Segmentation and Representation with Neural Networks: A Self-Supervised Model Assessed by Psychological Experiments

Hamit Basgol, Inci Ayhan, Emre Ugur

People segment complex, ever-changing, and continuous experience into basic, stable, and discrete spatio-temporal units called events. The event segmentation literature investigates the mechanisms that allow people to extract events. Event segmentation theory holds that people predict ongoing activities and monitor prediction error signals to find the event boundaries that separate events. In this study, we investigated the mechanism giving rise to this ability with a computational model and accompanying psychological experiments. Inspired by event segmentation theory and predictive processing, we introduced a self-supervised model of event segmentation. This model consists of neural networks that predict the sensory signal in the next time-step to represent different events, and a cognitive model that regulates these networks on the basis of their prediction errors. To verify the ability of our model to segment events, learn them during passive observation, and represent them in its internal representational space, we prepared a video that depicts human behaviors represented by point-light displays. We compared the event segmentation behaviors of participants and our model on this video at two hierarchical event segmentation levels. Using the point-biserial correlation technique, we demonstrated that the event segmentation decisions of our model correlated with the responses of participants. Moreover, by approximating the representation space of participants with a similarity-based technique, we showed that our model formed a representation space similar to that of participants. The results suggest that our model, which tracks prediction error signals, can produce human-like event boundaries and event representations. Finally, we discussed our contribution to the literature on event cognition and to our understanding of how event segmentation is implemented in the brain.
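As an illustration of the point-biserial correlation mentioned in this abstract, the following is a minimal sketch, not the authors' analysis code. It assumes one plausible setup that the abstract does not spell out: the model's boundary decision per time bin is binary (0 or 1), and the participant response is the fraction of participants who marked a boundary in that bin. The function name `point_biserial` and all data values are hypothetical.

```python
import math


def point_biserial(binary, continuous):
    """Point-biserial correlation between a 0/1 sequence and a real-valued one.

    Uses r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the
    population standard deviation of the continuous values.
    """
    if len(binary) != len(continuous):
        raise ValueError("sequences must have equal length")
    ones = [x for b, x in zip(binary, continuous) if b == 1]
    zeros = [x for b, x in zip(binary, continuous) if b == 0]
    n1, n0, n = len(ones), len(zeros), len(continuous)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must be non-empty")
    mean_all = sum(continuous) / n
    std_all = math.sqrt(sum((x - mean_all) ** 2 for x in continuous) / n)
    if std_all == 0:
        raise ValueError("continuous values have zero variance")
    m1 = sum(ones) / n1
    m0 = sum(zeros) / n0
    return (m1 - m0) / std_all * math.sqrt(n1 * n0 / n**2)


# Hypothetical data: one entry per time bin of the video.
model_boundary = [0, 0, 1, 0, 0, 1, 0, 0]  # assumed binary model decisions
participant_rate = [0.1, 0.2, 0.8, 0.1, 0.0, 0.7, 0.2, 0.1]  # assumed response fractions

r = point_biserial(model_boundary, participant_rate)
print(f"point-biserial r = {r:.3f}")
```

The statistic is numerically identical to the Pearson correlation computed on the same two sequences, so a high value here would mean that the bins the model flags as boundaries are the ones where participants respond most often.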

HC · Jul 26, 2020
Trick the Body Trick the Mind: Avatar representation affects the perception of available action possibilities in Virtual Reality

Tugce Akkoc, Emre Ugur, Inci Ayhan

In immersive Virtual Reality (VR), your brain can trick you into believing that your virtual hands are your real hands. Manipulating the representation of the body, namely the avatar, is a potentially powerful tool for the design of innovative interactive systems in VR. In this study, we investigated interactive behavior in VR by using the methods of experimental psychology. Objects with handles are known to potentiate the afforded action. Participants tend to respond faster when the handle is on the same side as the responding hand in bi-manual speed response tasks. In the first experiment, we successfully replicated this affordance effect in a VR setting. In the second experiment, we showed that the affordance effect was influenced by the avatar, which was manipulated by two different hand types: 1) hand models with full finger tracking that are able to grasp objects, and 2) capsule-shaped (fingerless) hand models that are not able to grasp objects. We found that less than 5 minutes of adaptation to an avatar significantly altered affordance perception. Counterintuitively, action planning was significantly shorter with the hand model that is not able to grasp. Possibly, fewer action possibilities provided an advantage in processing time. The presence of a handle sped up the initiation of the hand movement but slowed down the completion of the action because of ongoing action planning. The results were examined from a multidisciplinary perspective, and the design implications for VR applications were discussed.

AI · Jul 23, 2020
Time Perception: A Review on Psychological, Computational and Robotic Models

Hamit Basgol, Inci Ayhan, Emre Ugur

Animals exploit time to survive in the world. Temporal information is required for higher-level cognitive abilities such as planning, decision making, communication, and effective cooperation. Since time is an inseparable part of cognition, there is growing interest in the artificial intelligence approach to subjective time, which has the potential to advance the field. The current survey aims to provide researchers with an interdisciplinary perspective on time perception. First, we introduce a brief background from the psychology and neuroscience literature, covering the characteristics and models of time perception and related abilities. Second, we summarize the emerging computational and robotic models of time perception. A general overview of the literature reveals that a substantial number of timing models are based on dedicated time processing, such as the emergence of a clock-like mechanism from neural network dynamics, and that they reveal a relationship between embodiment and time perception. We also note that most models of timing are developed for either sensory timing (i.e., the ability to assess an interval) or motor timing (i.e., the ability to reproduce an interval). The number of timing models capable of retrospective timing, which is the ability to track time without paying attention, is insufficient. In this light, we discuss possible research directions to promote interdisciplinary collaboration in the field of time perception.