Mikhail Startsev

MM · Mar 15, 2019

A Ground-Truth Data Set and a Classification Algorithm for Eye Movements in 360-degree Videos

Ioannis Agtzidis, Mikhail Startsev, Michael Dorr

The segmentation of a gaze trace into its constituent eye movements has been actively researched since the early days of eye tracking. As we move towards more naturalistic viewing conditions, the segmentation becomes even more challenging and convoluted as more complex patterns emerge. The definitions and the well-established methods that were developed for monitor-based eye tracking experiments are often not directly applicable to unrestrained set-ups such as eye tracking in wearable contexts or with head-mounted displays. The main contributions of this work to eye movement research for 360-degree content are threefold: First, we collect, partially annotate, and make publicly available a new eye tracking data set, which consists of recordings of 13 participants viewing 15 video clips recorded in 360 degrees. Second, we propose a new two-stage pipeline for ground-truth annotation of the traditional fixations, saccades, and smooth pursuits, as well as (optokinetic) nystagmus, vestibulo-ocular reflex, and pursuit of moving objects performed exclusively via the movement of the head. A flexible user interface for this pipeline is implemented and made freely accessible for use or modification. Lastly, we develop and test a simple proof-of-concept algorithm for automatic classification of all the eye movement types in our data set based on the operational definitions that were used for manual annotation. The data set and the source code for both the annotation tool and the algorithm are publicly available at https://web.gin.g-node.org/ioannis.agtzidis/360_em_dataset.

CV · Jan 26, 2018

Supersaliency: A Novel Pipeline for Predicting Smooth Pursuit-Based Attention Improves Generalizability of Video Saliency

Mikhail Startsev, Michael Dorr

Predicting attention is a popular topic at the intersection of human and computer vision. However, even though most of the available video saliency data sets and models claim to target human observers' fixations, they fail to differentiate them from smooth pursuit (SP), a major eye movement type that is unique to perception of dynamic scenes. In this work, we highlight the importance of SP and its prediction (which we call supersaliency, due to greater selectivity compared to fixations), and aim to make its distinction from fixations explicit for computational models. To this end, we (i) use algorithmic and manual annotations of SP and fixations for two well-established video saliency data sets, (ii) train Slicing Convolutional Neural Networks for saliency prediction on either fixation- or SP-salient locations, and (iii) evaluate our and 26 publicly available dynamic saliency models on three data sets against traditional saliency and supersaliency ground truth. Overall, our models outperform the state of the art in both the new supersaliency and the traditional saliency problem settings, for which literature models are optimized. Importantly, on two independent data sets, our supersaliency model shows greater generalization ability and outperforms all other models, even for fixation prediction.
