47.6HCApr 7
Simulating Word Suggestion Usage in Mobile Typing to Guide Intelligent Text Entry DesignYang Li, Anna Maria Feit
Intelligent text entry (ITE) methods, such as word suggestions, are widely used in mobile typing, yet improving ITE systems is challenging because the cognitive mechanisms behind suggestion use remain poorly understood, and evaluating new systems often requires long-term user studies to account for behavioral adaptation. We present WSTypist, a reinforcement learning-based model that simulates how typists integrate word suggestions into typing. We extend recent hierarchical control models of typing, by identifying and implementing important cognitive mechanisms that underlie the high-level decision-making for integrating word suggestions into manual typing: considering orthographic processes, assessing efficiency gains, and including personal preference on AI support. Our evaluations show that WSTypist simulates diverse human-like suggestion-use strategies, reproduces individual differences, and generalizes across different systems. Importantly, we demonstrate on four design cases how a computational rationality model can be used to inform what-if analyses during the design process, by simulating how users might adapt to changes in the UI or in the algorithmic support, reducing the need for long-term user studies.
CVSep 6, 2024
Self-Supervised Contrastive Learning for Videos using Differentiable Local AlignmentKeyne Oei, Amr Gomaa, Anna Maria Feit et al.
Robust frame-wise embeddings are essential to perform video analysis and understanding tasks. We present a self-supervised method for representation learning based on aligning temporal video sequences. Our framework uses a transformer-based encoder to extract frame-level features and leverages them to find the optimal alignment path between video sequences. We introduce the novel Local-Alignment Contrastive (LAC) loss, which combines a differentiable local alignment loss to capture local temporal dependencies with a contrastive loss to enhance discriminative learning. Prior works on video alignment have focused on using global temporal ordering across sequence pairs, whereas our loss encourages identifying the best-scoring subsequence alignment. LAC uses the differentiable Smith-Waterman (SW) affine method, which features a flexible parameterization learned through the training phase, enabling the model to adjust the temporal gap penalty length dynamically. Evaluations show that our learned representations outperform existing state-of-the-art approaches on action recognition tasks.
HCFeb 9
Intelligent support for Human Oversight: Integrating Reinforcement Learning with Gaze Simulation to Personalize HighlightingThorsten Klößner, João Belo, Zekun Wu et al.
Interfaces for human oversight must effectively support users' situation awareness under time-critical conditions. We explore reinforcement learning (RL)-based UI adaptation to personalize alerting strategies that balance the benefits of highlighting critical events against the cognitive costs of interruptions. To enable learning without real-world deployment, we integrate models of users' gaze behavior to simulate attentional dynamics during monitoring. Using a delivery-drone oversight scenario, we present initial results suggesting that RL-based highlighting can outperform static, rule-based approaches and discuss challenges of intelligent oversight support.
HCApr 22, 2024
Shifting Focus with HCEye: Exploring the Dynamics of Visual Highlighting and Cognitive Load on User Attention and Saliency PredictionAnwesha Das, Zekun Wu, Iza Škrjanec et al.
Visual highlighting can guide user attention in complex interfaces. However, its effectiveness under limited attentional capacities is underexplored. This paper examines the joint impact of visual highlighting (permanent and dynamic) and dual-task-induced cognitive load on gaze behaviour. Our analysis, using eye-movement data from 27 participants viewing 150 unique webpages reveals that while participants' ability to attend to UI elements decreases with increasing cognitive load, dynamic adaptations (i.e., highlighting) remain attention-grabbing. The presence of these factors significantly alters what people attend to and thus what is salient. Accordingly, we show that state-of-the-art saliency models increase their performance when accounting for different cognitive loads. Our empirical insights, along with our openly available dataset, enhance our understanding of attentional processes in UIs under varying cognitive (and perceptual) loads and open the door for new models that can predict user attention while multitasking.
92.2CYApr 9
Keeping an Eye on AI: A Framework for Effective Human Oversight of AI SystemsSusanne Gaube, Markus Langer, Tim Miller et al.
The use of Artificial Intelligence (AI) in high-risk, decision-making scenarios presents technical, safety, and normative challenges; problems that may only be ameliorated by human oversight. However, notions of human oversight lack a common foundational understanding: oversight architectures are not well defined, the roles involved remain unclear, and implementation steps are opaque. Hence, researchers and practitioners struggle to determine how to design, implement, and evaluate systems that enable effective human oversight. This paper advances a practical framework for effective human oversight of AI systems, based on a cross-disciplinary perspective that draws on insights from computer science, human-computer interaction, psychology, philosophy, and law. The core contributions are: (1) a foundational framework, with a working definition, architecture and processes for effective human oversight of AI systems; (2) an initial template for documenting oversight architectures and processes, applied to diverse domains; and (3) a synthesis of open research challenges that need to be considered in the emerging field of effective human oversight of AI systems.
HCMay 15, 2024
Enhancing Saliency Prediction in Monitoring Tasks: The Role of Visual HighlightsZekun Wu, Anna Maria Feit
This study examines the role of visual highlights in guiding user attention in drone monitoring tasks, employing a simulated interface for observation. The experiment results show that such highlights can significantly expedite the visual attention on the corresponding area. Based on this observation, we leverage both the temporal and spatial information in the highlight to develop a new saliency model: the highlight-informed saliency model (HISM), to infer the visual attention change in the highlight condition. Our findings show the effectiveness of visual highlights in enhancing user attention and demonstrate the potential of incorporating these cues into saliency prediction models.
HCJul 26, 2020
Detecting Relevance during Decision-Making from Eye Movements for UI AdaptationAnna Maria Feit, Lukas Vordemann, Seonwook Park et al.
This paper proposes an approach to detect information relevance during decision-making from eye movements in order to enable user interface adaptation. This is a challenging task because gaze behavior varies greatly across individual users and tasks and groundtruth data is difficult to obtain. Thus, prior work has mostly focused on simpler target-search tasks or on establishing general interest, where gaze behavior is less complex. From the literature, we identify six metrics that capture different aspects of the gaze behavior during decision-making and combine them in a voting scheme. We empirically show, that this accounts for the large variations in gaze behavior and out-performs standalone metrics. Importantly, it offers an intuitive way to control the amount of detected information, which is crucial for different UI adaptation schemes to succeed. We show the applicability of our approach by developing a room-search application that changes the visual saliency of content detected as relevant. In an empirical study, we show that it detects up to 97% of relevant elements with respect to user self-reporting, which allows us to meaningfully adapt the interface, as confirmed by participants. Our approach is fast, does not need any explicit user input and can be applied independent of task and user.