Antti Oulasvirta

h-index: 72

40 papers

1,426 citations

Novelty: 51%

AI Score: 56

Ranked #19,885 of 201,018 authors (top 10%); #21 in HC (top 1%)

40 Papers

HC · Apr 15, 2022

Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques

Liwei Chan, Yi-Chi Liao, George B. Mo et al.

Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during design; however, they assume a model of the interaction domain. Black-box methods for assistance, on the other hand, can work with any design problem. However, virtually all empirical studies of this human-in-the-loop approach have been carried out by either researchers or end-users. The question remains whether such methods can help designers in realistic tasks. In this paper, we study Bayesian optimization as an algorithmic method to guide the design optimization process. It operates by proposing to a designer which design candidate to try next, given previous observations. We report observations from a comparative study with 40 novice designers who were tasked to optimize a complex 3D touch interaction technique. The optimizer helped designers explore larger proportions of the design space and arrive at a better solution; however, they reported lower agency and expressiveness. Designers guided by an optimizer reported lower mental effort but also felt less creative and less in charge of the progress. We conclude that human-in-the-loop optimization can support novice designers in cases where agency is not critical.

HC · Apr 24 · Code

Point & Grasp: Flexible Selection of Out-of-Reach Objects Through Probabilistic Cue Integration

Xuejing Luo, Hee-Seung Moon, Christian Holz et al.

Selecting out-of-reach objects is a fundamental task in mixed reality (MR). Existing methods rely on a single cue or deterministically fuse multiple cues, leading to performance degradation when the dominant cue becomes unreliable. In this work, we introduce a probabilistic cue integration framework that enables flexible combination of multiple user-generated cues for intent inference. Inspired by natural grasping behavior, we instantiate the framework with pointing direction and grasp gestures as a new interaction technique, Point&Grasp. To this end, we collect the Out-of-Reach Grasping (ORG) dataset to train a robust likelihood model of the gestural cue, which captures grasping patterns not present in existing in-reach datasets. User studies demonstrate that our selection method with cue integration not only improves accuracy and speed over single-cue baselines, but also remains practically effective compared to state-of-the-art methods across various sources of ambiguity. The dataset and code are available at https://github.com/drlxj/point-and-grasp.
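
The idea of probabilistic cue integration can be illustrated with a minimal Bayesian fusion sketch. This is not the authors' implementation: the function name `fuse_cues`, the candidate objects, the likelihood values, and the assumption that cues are conditionally independent given the intended target are all hypothetical choices made for illustration.

```python
import math


def fuse_cues(prior, cue_likelihoods):
    """Return a posterior over candidate objects.

    prior: dict mapping object -> prior probability.
    cue_likelihoods: list of dicts, each mapping object -> P(cue | object).
    Assumption (not from the source): cues are conditionally independent
    given the intended object, so their likelihoods multiply.
    """
    log_post = {}
    for obj, p in prior.items():
        log_p = math.log(p)
        for lik in cue_likelihoods:
            # Clamp to avoid log(0) when a cue rules an object out entirely.
            log_p += math.log(max(lik[obj], 1e-12))
        log_post[obj] = log_p
    # Normalize in a numerically stable way.
    m = max(log_post.values())
    unnorm = {o: math.exp(v - m) for o, v in log_post.items()}
    z = sum(unnorm.values())
    return {o: v / z for o, v in unnorm.items()}


# Hypothetical scene: pointing slightly favors the mug,
# but the grasp shape strongly suggests the book.
prior = {"mug": 1 / 3, "book": 1 / 3, "lamp": 1 / 3}
pointing = {"mug": 0.5, "book": 0.4, "lamp": 0.1}
grasp = {"mug": 0.2, "book": 0.7, "lamp": 0.1}

posterior = fuse_cues(prior, [pointing, grasp])
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this toy case the pointing cue alone would select the mug, while the fused posterior selects the book. That is the kind of behavior the abstract describes when the dominant cue becomes unreliable.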

LG · Jan 27, 2023

Modeling human road crossing decisions as reward maximization with visual perception limitations

Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P. P. Jokinen et al.

Understanding the interaction between different road users is critical for road safety and automated vehicles (AVs). Existing mathematical models on this topic have been proposed based mostly on either cognitive or machine learning (ML) approaches. However, current cognitive models are incapable of simulating road user trajectories in general scenarios, and ML models lack a focus on the mechanisms generating the behavior and take a high-level perspective which can cause failures to capture important human-like behaviors. Here, we develop a model of human pedestrian crossing decisions based on computational rationality, an approach using deep reinforcement learning (RL) to learn boundedly optimal behavior policies given human constraints, in our case a model of the limited human visual system. We show that the proposed combined cognitive-RL model captures human-like patterns of gap acceptance and crossing initiation time. Interestingly, our model's decisions are sensitive to not only the time gap, but also the speed of the approaching vehicle, something which has been described as a "bias" in human gap acceptance behavior. However, our results suggest that this is instead a rational adaption to human perceptual limitations. Moreover, we demonstrate an approach to accounting for individual differences in computational rationality models, by conditioning the RL policy on the parameters of the human constraints. Our results demonstrate the feasibility of generating more human-like road user behavior by combining RL with cognitive models.

HC · Feb 2

Cost-Aware Bayesian Optimization for Prototyping Interactive Devices

Thomas Langerak, Renate Zhang, Ziyuan Wang et al.

Deciding which idea is worth prototyping is a central concern in iterative design. A prototype should be produced when the expected improvement is high and the cost is low. However, this is hard to decide, because costs can vary drastically: a simple parameter tweak may take seconds, while fabricating hardware consumes material and energy. Such asymmetries can discourage a designer from exploring the design space. In this paper, we present an extension of cost-aware Bayesian optimization to account for diverse prototyping costs. The method builds on the power of Bayesian optimization and requires only a minimal modification to the acquisition function. The key idea is to use designer-estimated costs to guide sampling toward more cost-effective prototypes. In technical evaluations, the method achieved comparable utility to a cost-agnostic baseline while requiring only approximately 70% of the cost; under strict budgets, it outperformed the baseline threefold. A within-subjects study with 12 participants in a realistic joystick design task demonstrated similar benefits. These results show that accounting for prototyping costs can make Bayesian optimization more compatible with real-world design projects.
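
One common way to make an acquisition function cost-aware is to divide expected improvement by the estimated cost of evaluating a candidate. The sketch below assumes that form. The abstract does not state the exact modification used in the paper, so the function `cost_aware_score`, the candidate names, and all numbers are illustrative assumptions.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement (maximization) under a Gaussian prediction."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def cost_aware_score(mean, std, best_so_far, cost):
    """Assumed variant: expected improvement per unit of estimated cost."""
    return expected_improvement(mean, std, best_so_far) / cost


# Hypothetical surrogate predictions and designer-estimated costs.
candidates = [
    {"name": "software tweak", "mean": 0.62, "std": 0.05, "cost": 1.0},
    {"name": "new 3D-printed part", "mean": 0.70, "std": 0.10, "cost": 20.0},
]
best_so_far = 0.60

chosen_plain = max(
    candidates,
    key=lambda c: expected_improvement(c["mean"], c["std"], best_so_far),
)["name"]
chosen_aware = max(
    candidates,
    key=lambda c: cost_aware_score(c["mean"], c["std"], best_so_far, c["cost"]),
)["name"]
print(chosen_plain, "|", chosen_aware)
```

With these made-up numbers, the cost-agnostic rule prefers the expensive printed part, while the cost-weighted rule prefers the cheap software tweak.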

LG · Jul 6, 2025 · Code

Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback

Jan Kompatscher, Danqing Shi, Giovanna Varni et al.

Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behavior with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which one of two samples they prefer. We present an interactive visualization that better exploits the human visual ability to compare and explore whole groups of samples. The interface is comprised of two linked views: 1) an exploration view showing a contextual overview of all sampled behaviors organized in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviors for user queries. Users can efficiently explore large sets of behaviors by iterating between these two views. Additionally, we devised an active learning approach suggesting groups for comparison. As shown by our evaluation in six simulated robotics tasks, our approach increases the final policy returns by 69.34%. It leads to lower error rates and better policies. We open-source the code that can be easily integrated into the RLHF training loop, supporting research on human-AI alignment.

HC · Mar 6

Hierarchical Resource Rationality Explains Human Reading Behavior

Yunpeng Bai, Xiaofu Jin, Shengdong Zhao et al.

Reading is a pervasive and cognitively demanding activity that underpins modern human culture. It is a prime instance of a class of tasks where eye movements are coordinated for the purpose of comprehension. Existing theories explain either eye movements or comprehension during reading, but the critical link between the two remains unclear. Here, we propose resource-rational optimization as a unifying principle governing adaptive reading behavior. Eye movements are selected to maximize expected comprehension while minimizing cognitive and temporal costs, organized hierarchically across nested time scales: fixation decisions support word recognition; sentence-level integration guides skipping and regression; and text-level comprehension goals shape memory construction and rereading. A computational implementation successfully replicates an unprecedented range of findings in human reading, from lexical effects to comprehension outcomes. Together, these results suggest that resource rationality provides a general mechanism for coordinating perception, memory, and action in knowledge-intensive human behaviors, offering a principled account of how complex cognitive skills adapt to limited resources.

LG · Oct 25, 2024 · Code

AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Francisco Erivaldo Fernandes Junior, Antti Oulasvirta

Developing a reinforcement learning (RL) agent often involves identifying values for numerous parameters, covering the policy, reward function, environment, and agent-internal architecture. Since these parameters are interrelated in complex ways, optimizing them is a black-box problem that proves especially challenging for nonexperts. Although existing optimization-as-a-service platforms (e.g., Vizier and Optuna) can handle such problems, they are impractical for RL systems, since the need for manual user mapping of each parameter to distinct components makes the effort cumbersome. It also requires understanding of the optimization process, limiting the systems' application beyond the machine learning field and restricting access in areas such as cognitive science, which models human decision-making. To tackle these challenges, the paper presents AgentForge, a flexible low-code platform to optimize any parameter set across an RL system. Available at https://github.com/feferna/AgentForge, it allows an optimization problem to be defined in a few lines of code and handed to any of the interfaced optimizers. With AgentForge, the user can optimize the parameters either individually or jointly. The paper presents an evaluation of its performance for a challenging vision-based RL problem.

MM · Sep 16, 2018 · Code

Cloud Gaming With Foveated Graphics

Gazi Illahi, Thomas Van Gemert, Matti Siekkinen et al.

Cloud gaming enables playing high-end games, originally designed for PC or game console setups, on low-end devices, such as netbooks and smartphones, by offloading graphics rendering to GPU-powered cloud servers. However, transmitting the high-end graphics requires a large amount of available network bandwidth, even though it is a compressed video stream. Foveated video encoding (FVE) reduces the bandwidth requirement by taking advantage of the non-uniform acuity of the human visual system and by knowing where the user is looking. We have designed and implemented a system for cloud gaming with foveated graphics using a consumer-grade real-time eye tracker and an open-source cloud gaming platform. In this article, we describe the system and its evaluation through measurements with representative games from different genres to understand the effect of parameterization of the FVE scheme on bandwidth requirements and to understand its feasibility from the latency perspective. We also present results from a user study. The results suggest that it is possible to find a "sweet spot" for the encoding parameters so that the users hardly notice the presence of foveated encoding but at the same time the scheme yields most of the bandwidth savings achievable.

HC · Feb 4

Adaptive Prompt Elicitation for Text-to-Image Generation

Xinyi Wen, Lena Hegemann, Xiaofu Jin et al.

Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
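
An information-theoretic view of query selection can be sketched as choosing the query with the highest expected reduction in uncertainty about the user's intent. The example below is a generic expected-information-gain calculation over a small discrete set of intent hypotheses with yes/no answers. It is not APE itself: the hypothesis set, the query names, and the answer likelihoods are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)


def expected_information_gain(prior, p_yes_given_h):
    """Expected entropy reduction from asking one yes/no query.

    prior[i] = P(h_i); p_yes_given_h[i] = P(answer is yes | h_i).
    """
    p_yes = sum(p * l for p, l in zip(prior, p_yes_given_h))
    p_no_given_h = [1.0 - l for l in p_yes_given_h]
    gain = entropy(prior)
    for p_ans, liks in ((p_yes, p_yes_given_h), (1.0 - p_yes, p_no_given_h)):
        if p_ans > 0.0:
            # Bayes update of the intent distribution for this answer.
            post = [p * l / p_ans for p, l in zip(prior, liks)]
            gain -= p_ans * entropy(post)
    return gain


# Four equally likely intent hypotheses and three candidate queries.
prior = [0.25, 0.25, 0.25, 0.25]
queries = {
    "splits hypotheses evenly": [1.0, 1.0, 0.0, 0.0],
    "isolates one hypothesis": [1.0, 0.0, 0.0, 0.0],
    "uninformative": [0.5, 0.5, 0.5, 0.5],
}
gains = {name: expected_information_gain(prior, lik) for name, lik in queries.items()}
best_query = max(gains, key=gains.get)
print(best_query)
```

The query that splits the hypotheses evenly yields one full bit, which is why an adaptive elicitor under this criterion would tend to ask questions that divide the remaining plausible intents in half.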

HC · Mar 12

Modeling Trial-and-Error Navigation With a Sequential Decision Model of Information Scent

Xiaofu Jin, Yunpeng Bai, Antti Oulasvirta

Users often struggle to locate an item within an information architecture, particularly when links are ambiguous or deeply nested in hierarchies. Information scent has been used to explain why users select incorrect links, but this concept assumes that users see all available links before deciding. In practice, users frequently select a link too quickly, overlook relevant cues, and then rely on backtracking when errors occur. We extend the concept of information scent by framing navigation as a sequential decision-making problem under memory constraints. Specifically, we assume that users do not scan entire pages but instead inspect strategically, looking "just enough" to find the target given their time budget. To choose which item to inspect next, they consider both local (this page) and global (site) scent; however, both are constrained by memory. Trying to avoid wasting time, they occasionally choose the wrong links without inspecting everything on a page. Comparisons with empirical data show that our model replicates key navigation behaviors: premature selections, wrong turns, and recovery from backtracking. We conclude that trial-and-error behavior is well explained by information scent when accounting for the sequential and bounded characteristics of the navigation problem.

HC · Feb 26

Simulation-based Optimization for Augmented Reading

Yunpeng Bai, Shengdong Zhao, Antti Oulasvirta

Augmented reading systems aim to adapt text presentation to improve comprehension and task performance, yet existing approaches rely heavily on heuristics, opaque data-driven models, or repeated human involvement in the design loop. We propose framing augmented reading as a simulation-based optimization problem grounded in resource-rational models of human reading. These models instantiate a simulated reader that allocates limited cognitive resources, such as attention, memory, and time, under task demands, enabling systematic evaluation of text user interfaces. We introduce two complementary optimization pipelines: an offline approach that explores design alternatives using simulated readers, and an online approach that personalizes reading interfaces in real time using ongoing interaction data. Together, this perspective enables adaptive, explainable, and scalable augmented reading design without relying solely on human testing.

HC · Apr 21, 2024

Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Yue Jiang, Changkong Zhou, Vikas Garg et al.

Present-day graphical user interfaces (GUIs) exhibit diverse arrangements of text, graphics, and interactive elements such as buttons and menus, but representations of GUIs have not kept up. They do not encapsulate both semantic and visuo-spatial relationships among elements. To seize machine learning's potential for GUIs more efficiently, Graph4GUI exploits graph neural networks to capture individual elements' properties and their semantic-visuo-spatial constraints in a layout. The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task, which involved predicting the positions of remaining unplaced elements in a partially completed GUI. The new model's suggestions showed alignment and visual appeal superior to the baseline method and received higher subjective ratings for preference. Furthermore, we demonstrate the practical benefits and efficiency advantages designers perceive when utilizing our model as an autocompletion plug-in.

CV · Apr 15, 2024

EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli et al.

From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.

AI · Feb 6, 2024

Pedestrian crossing decisions can be explained by bounded optimal decision-making under noisy visual perception

Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P. P. Jokinen et al.

This paper presents a model of pedestrian crossing decisions, based on the theory of computational rationality. It is assumed that crossing decisions are boundedly optimal, with bounds on optimality arising from human cognitive limitations. While previous models of pedestrian behaviour have been either 'black-box' machine learning models or mechanistic models with explicit assumptions about cognitive factors, we combine both approaches. Specifically, we model mechanistically noisy human visual perception and assumed rewards in crossing, but we use reinforcement learning to learn bounded optimal behaviour policy. The model reproduces a larger number of known empirical phenomena than previous models, in particular: (1) the effect of the time to arrival of an approaching vehicle on whether the pedestrian accepts the gap, the effect of the vehicle's speed on both (2) gap acceptance and (3) pedestrian timing of crossing in front of yielding vehicles, and (4) the effect on this crossing timing of the stopping distance of the yielding vehicle. Notably, our findings suggest that behaviours previously framed as 'biases' in decision-making, such as speed-dependent gap acceptance, might instead be a product of rational adaptation to the constraints of visual perception. Our approach also permits fitting the parameters of cognitive constraints and rewards per individual, to better account for individual differences. To conclude, by leveraging both RL and mechanistic modelling, our model offers novel insights about pedestrian behaviour, and may provide a useful foundation for more accurate and scalable pedestrian models.

HC · Feb 5, 2025

Controllable GUI Exploration

Aryan Garg, Yue Jiang, Antti Oulasvirta

During the early stages of interface design, designers need to produce multiple sketches to explore a design space. Design tools often fail to support this critical stage, because they insist on specifying more details than necessary. Although recent advances in generative AI have raised hopes of solving this issue, in practice they fail because expressing loose ideas in a prompt is impractical. In this paper, we propose a diffusion-based approach to the low-effort generation of interface sketches. It breaks new ground by allowing flexible control of the generation process via three types of inputs: A) prompts, B) wireframes, and C) visual flows. The designer can provide any combination of these as input at any level of detail, and will get a diverse gallery of low-fidelity solutions in response. The unique benefit is that large design spaces can be explored rapidly with very little effort in input-specification. We present qualitative results for various combinations of input specifications. Additionally, we demonstrate that our model aligns more accurately with these specifications than other models.

HC · Jan 28

Log2Motion: Biomechanical Motion Synthesis from Touch Logs

Michał Patryk Miazga, Hannah Bussmann, Antti Oulasvirta et al.

Touch data from mobile devices are collected at scale but reveal little about the interactions that produce them. While biomechanical simulations can illuminate motor control processes, they have not yet been developed for touch interactions. To close this gap, we propose a novel computational problem: synthesizing plausible motion directly from logs. Our key insight is a reinforcement learning-driven musculoskeletal forward simulation that generates biomechanically plausible motion sequences consistent with events recorded in touch logs. We achieve this by integrating a software emulator into a physics simulator, allowing biomechanical models to manipulate real applications in real-time. Log2Motion produces rich syntheses of user movements from touch logs, including estimates of motion, speed, accuracy, and effort. We assess the plausibility of generated movements by comparing against human data from a motion capture study and prior findings, and demonstrate Log2Motion in a large-scale dataset. Biomechanical motion synthesis provides a new way to understand log data, illuminating the ergonomics and motor control underlying touch interactions.

HC · Feb 2

Simulating Human Audiovisual Search Behavior

Hyunsung Cho, Xuejing Luo, Byungjoo Lee et al.

Locating a target based on auditory and visual cues (such as finding a car in a crowded parking lot or identifying a speaker in a virtual meeting) requires balancing effort, time, and accuracy under uncertainty. Existing models of audiovisual search often treat perception and action in isolation, overlooking how people adaptively coordinate movement and sensory strategies. We present Sensonaut, a computational model of embodied audiovisual search. The core assumption is that people deploy their body and sensory systems in ways they believe will most efficiently improve their chances of locating a target, trading off time and effort under perceptual constraints. Our model formulates this as a resource-rational decision-making problem under partial observability. We validate the model against newly collected human data, showing that it reproduces both adaptive scaling of search time and effort under task complexity, occlusion, and distraction, and characteristic human errors. Our simulation of human-like resource-rational search informs the design of audiovisual interfaces that minimize search cost and cognitive load.

AI · Sep 29, 2025

Interactive Program Synthesis for Modeling Collaborative Physical Activities from Narrated Demonstrations

Edward Kim, Daniel He, Jorge Chao et al.

Teaching systems physical tasks is a long-standing goal in HCI, yet most prior work has focused on non-collaborative physical activities. Collaborative tasks introduce added complexity, requiring systems to infer users' assumptions about their teammates' intent, which is an inherently ambiguous and dynamic process. This necessitates representations that are interpretable and correctable, enabling users to inspect and refine system behavior. We address this challenge by framing collaborative task learning as a program synthesis problem. Our system represents behavior as editable programs and uses narrated demonstrations, i.e., paired physical actions and natural language, as a unified modality for teaching, inspecting, and correcting system logic without requiring users to see or write code. The same modality is used for the system to communicate its learning to users. In a within-subjects study, 20 users taught multiplayer soccer tactics to our system. 70 percent (14/20) of participants successfully refined learned programs to match their intent and 90 percent (18/20) found it easy to correct the programs. The study surfaced unique challenges in representing learning as programs and in enabling users to teach collaborative physical activities. We discuss these issues and outline mitigation strategies.

HC · Jul 24, 2025

DxHF: Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition

Danqing Shi, Furui Cheng, Tino Weinkauf et al.

Human preferences are widely used to align large language models (LLMs) through methods such as reinforcement learning from human feedback (RLHF). However, the current user interfaces require annotators to compare text paragraphs, which is cognitively challenging when the texts are long or unfamiliar. This paper contributes by studying the decomposition principle as an approach to improving the quality of human feedback for LLM alignment. This approach breaks down the text into individual claims instead of directly comparing two long-form text responses. Based on the principle, we build a novel user interface, DxHF. It enhances the comparison process by showing decomposed claims, visually encoding the relevance of claims to the conversation, and linking similar claims. This allows users to skim through key information and identify differences for better and quicker judgment. Our technical evaluation shows evidence that decomposition generally improves feedback accuracy regarding the ground truth, particularly for users with uncertainty. A crowdsourcing study with 160 participants indicates that using DxHF improves feedback accuracy by an average of 5%, although it increases the average feedback time by 18 seconds. Notably, accuracy is significantly higher in situations where users have less certainty. The findings of the study highlight the potential of HCI as an effective method for improving human-AI alignment.

HC · Dec 24, 2021

Rediscovering Affordance: A Reinforcement Learning Perspective

Yi-Chi Liao, Kashyap Todi, Aditya Acharya et al.

Affordance refers to the perception of possible actions allowed by an object. Despite its relevance to human-computer interaction, no existing theory explains the mechanisms that underpin affordance-formation; that is, how affordances are discovered and adapted via interaction. We propose an integrative theory of affordance-formation based on the theory of reinforcement learning in cognitive sciences. The key assumption is that users learn to associate promising motor actions to percepts via experience when reinforcement signals (success/failure) are present. They also learn to categorize actions (e.g., "rotating" a dial), giving them the ability to name and reason about affordance. Upon encountering novel widgets, their ability to generalize these actions determines their ability to perceive affordances. We implement this theory in a virtual robot model, which demonstrates human-like adaptation of affordance in interactive widgets tasks. While its predictions align with trends in human data, humans are able to adapt affordances faster, suggesting the existence of additional mechanisms.

HC · Jul 22, 2021

Toward AI Assistants That Let Designers Design

Sebastiaan De Peuter, Antti Oulasvirta, Samuel Kaski

AI for supporting designers needs to be rethought. It should aim to cooperate, not automate, by supporting and leveraging the creativity and problem-solving of designers. The challenge for such AI is how to infer designers' goals and then help them without being needlessly disruptive. We present AI-assisted design: a framework for creating such AI, built around generative user models which enable reasoning about designers' goals, reasoning, and capabilities.

HC · Mar 11, 2021

Adapting User Interfaces with Model-based Reinforcement Learning

Kashyap Todi, Gilles Bailly, Luis A. Leiva et al.

Adapting an interface requires taking into account both the positive and negative effects that changes may have on the user. A carelessly picked adaptation may impose high costs on the user -- for example, due to surprise or relearning effort -- or "trap" the process in a suboptimal design prematurely. However, effects on users are hard to predict as they depend on factors that are latent and evolve over the course of interaction. We propose a novel approach for adaptive user interfaces that yields a conservative adaptation policy: It finds beneficial changes when there are such and avoids changes when there are none. Our model-based reinforcement learning method plans sequences of adaptations and consults predictive HCI models to estimate their effects. We present empirical and simulation results from the case of adaptive menus, showing that the method outperforms both a non-adaptive and a frequency-based policy.

HC · Feb 8, 2021

Improving Artificial Teachers by Considering How People Learn and Forget

Aurélien Nioche, Pierre-Alexandre Murena, Carlos de la Torre-Ortiz et al.

The paper presents a novel model-based method for intelligent tutoring, with particular emphasis on the problem of selecting teaching interventions in interaction with humans. Whereas previous work has focused on either personalization of teaching or optimization of teaching intervention sequences, the proposed individualized model-based planning approach represents convergence of these two lines of research. Model-based planning picks the best interventions via interactive learning of a user memory model's parameters. The approach is novel in its use of a cognitive model that can account for several key individual- and material-specific characteristics related to recall/forgetting, along with a planning technique that considers users' practice schedules. Taking a rule-based approach as a baseline, the authors evaluated the method's benefits in a controlled study of artificial teaching in second-language vocabulary learning (N=53).

HC · Jan 22, 2021

Understanding Visual Saliency in Mobile User Interfaces

Luis A. Leiva, Yunfei Xue, Avya Bansal et al.

For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results speak to a role of expectations in guiding where users look. Strong bias toward the top-left corner of the display, text, and images was evident, while bottom-up features such as color or size affected saliency less. Classic, parameter-free saliency models showed a weak fit with the data, and data-driven models improved significantly when trained specifically on this dataset (e.g., NSS rose from 0.66 to 0.84). We also release the first annotated dataset for investigating visual saliency in mobile UIs.

HC · May 4, 2020

Foraging-based Optimization of Menu Systems

Niraj Ramesh Dayama, Morteza Shiripour, Antti Oulasvirta et al.

Computational design of menu systems has been solved in limited cases such as the linear menu (list) as an assignment task, where commands are assigned to menu positions while optimizing for users' selection performance and distance of associated items. We show that this approach falls short with larger, hierarchically organized menu systems, where one must also take into account how users navigate hierarchical structures. This paper presents a novel integer programming formulation that models hierarchical menus as a combination of the exact set covering problem and the assignment problem. It organizes commands into ordered groups of ordered groups via a novel objective function based on information foraging theory. It minimizes, on the one hand, the time required to select a command whose location is known from previous usage and, on the other, the time wasted in irrelevant parts of the menu while searching for commands whose location is not known. The convergence of these two factors yields usable, well-ordered command hierarchies from a single model. In generated menus, the lead (first) elements of a group or tab are good indicators of the remaining contents, thereby facilitating the search process. In a controlled usability evaluation, the performance of computationally designed menus was 25% faster than existing commercial designs with respect to selection time. The algorithm is efficient for large, representative instances of the problem. We further show applications in personalization and adaptation of menu systems.
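
The linear-menu assignment task mentioned at the start of the abstract can be illustrated with a small brute-force sketch. It covers only the simplest case: it ignores the distance between associated items and the hierarchical, foraging-based objective that the paper contributes. The command names, usage frequencies, and per-position selection times are made-up values, and the exhaustive search stands in for an integer programming solver.

```python
from itertools import permutations


def best_linear_menu(frequencies, position_times):
    """Assign commands to menu positions to minimize expected selection time.

    frequencies: dict mapping command -> usage probability.
    position_times: list of selection times (seconds), one per menu slot.
    Exhaustive search; only practical for a handful of commands.
    """
    commands = list(frequencies)
    best_order, best_cost = None, float("inf")
    for order in permutations(commands):
        cost = sum(frequencies[c] * position_times[i] for i, c in enumerate(order))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost


# Hypothetical usage data: slots further down the list take longer to reach.
frequencies = {"Open": 0.5, "Save": 0.3, "Export": 0.15, "Print": 0.05}
position_times = [0.4, 0.6, 0.8, 1.0]

order, cost = best_linear_menu(frequencies, position_times)
print(order, round(cost, 2))
```

With a frequency-only objective the result is simply the commands sorted by frequency. The abstract's point is that this stops being adequate once commands must also be grouped and navigated hierarchically.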

HCMay 4, 2020

Human Strategic Steering Improves Performance of Interactive Optimization

Fabio Colella, Pedram Daee, Jussi Jokinen et al.

A central concern in an interactive intelligent system is optimization of its actions, to be maximally helpful to its human user. In recommender systems, for instance, the action is to choose what to recommend, and the optimization task is to recommend items the user prefers. The optimization is based on the user's earlier feedback (e.g., "likes" and "dislikes"), and the algorithms assume the feedback to be faithful. That is, when the user clicks "like," they actually prefer the item. We argue that this fundamental assumption can be extensively violated by human users, who are not passive feedback sources. Instead, they are in control, actively steering the system towards their goal. To verify this hypothesis, that humans steer and are able to improve performance by steering, we designed a function optimization task where a human and an optimization algorithm collaborate to find the maximum of a 1-dimensional function. At each iteration, the optimization algorithm queries the user for the value of a hidden function $f$ at a point $x$, and the user, who sees the hidden function, provides an answer about $f(x)$. Our study with 21 participants shows that users who understand how the optimization works strategically provide biased answers (answers not equal to $f(x)$), which results in the algorithm finding the optimum significantly faster. Our work highlights that next-generation intelligent systems will need user models capable of helping users who steer systems to pursue their goals.

HCFeb 26, 2020

Press'Em: Simulating Varying Button Tactility via FDVV Models

Yi-Chi Liao, Sunjun Kim, Byungjoo Lee et al.

Push-buttons provide rich haptic feedback during a press via mechanical structures. While different buttons have varying haptic qualities, few works have attempted to dynamically render such tactility, which limits designers from freely exploring buttons' haptic design. We extend the typical force-displacement (FD) model with vibration (V) and velocity-dependence characteristics (V) to form a novel FDVV model. We then introduce Press'Em, a 3D-printed prototype capable of simulating button tactility based on FDVV models. To drive Press'Em, an end-to-end simulation pipeline is presented that covers (1) capturing any physical buttons, (2) controlling the actuation signals, and (3) simulating the tactility. Our system can go beyond replicating existing buttons to enable designers to emulate and test non-existent ones with desired haptic properties. Press'Em aims to be a tool for future research to better understand and iterate over button designs.

HCJan 13, 2020

Button Simulation and Design via FDVV Models

Yi-Chi Liao, Sunjun Kim, Byungjoo Lee et al.

Designing a push-button with desired sensation and performance is challenging because the mechanical construction must have the right response characteristics. Physical simulation of a button's force-displacement (FD) response has been studied to facilitate prototyping; however, the simulations' scope and realism have been limited. In this paper, we extend FD modeling to include vibration (V) and velocity-dependence characteristics (V). The resulting FDVV models better capture tactility characteristics of buttons, including snap. They increase the range of simulated buttons and the perceived realism relative to FD models. The paper also demonstrates methods for obtaining these models, editing them, and simulating accordingly. This end-to-end approach enables the analysis, prototyping, and optimization of buttons, and supports exploring designs that would be hard to implement mechanically.

HCJan 10, 2020

Optimal Sensor Position for a Computer Mouse

Sunjun Kim, Byungjoo Lee, Thomas van Gemert et al.

Computer mice have their displacement sensors in various locations (center, front, and rear). However, there has been little research into the effects of sensor position or into engineering approaches to exploit it. This paper first discusses the mechanisms by which sensor position affects mouse movement and reports the results from a study of a pointing task in which the sensor position was systematically varied. Placing the sensor in the center turned out to be the best compromise: improvements over front and rear were in the 11–14% range for throughput and 20–23% for path deviation. However, users varied in their personal optima. Accordingly, variable-sensor-position mice are then presented, with a demonstration that high accuracy can be achieved with two static optical sensors. A virtual sensor model is described that allows software-side repositioning of the sensor. Individual-specific calibration should yield an added 4% improvement in throughput over the default center position.

HCJan 9, 2020

GRIDS: Interactive Layout Design with Integer Programming

Niraj Dayama, Kashyap Todi, Taru Saarelainen et al.

Grid layouts are used by designers to spatially organise user interfaces when sketching and wireframing. However, their design is largely time-consuming manual work. This is challenging due to combinatorial explosion and complex objectives, such as alignment, balance, and expectations regarding positions. This paper proposes a novel optimisation approach for the generation of diverse grid-based layouts. Our mixed integer linear programming (MILP) model offers a rigorous yet efficient method for grid generation that ensures packing, alignment, grouping, and preferential positioning of elements. Further, we present techniques for interactive diversification, enhancement, and completion of grid layouts (Figure 1). These capabilities are demonstrated using GRIDS, a wireframing tool that provides designers with real-time layout suggestions. We report findings from a ratings study (N = 13) and a design study (N = 16), lending evidence for the benefit of computational grid generation during early stages of design.

AIJan 4, 2020

Hierarchical Reinforcement Learning as a Model of Human Task Interleaving

Christoph Gebhardt, Antti Oulasvirta, Otmar Hilliges

How do people decide how long to continue in a task, when to switch, and to which other task? Understanding the mechanisms that underpin task interleaving is a long-standing goal in the cognitive sciences. Prior work suggests greedy heuristics and a policy maximizing the marginal rate of return. However, it is unclear how such a strategy would allow for adaptation to everyday environments that offer multiple tasks with complex switch costs and delayed rewards. Here we develop a hierarchical model of supervisory control driven by reinforcement learning (RL). The supervisory level learns to switch using task-specific approximate utility estimates, which are computed on the lower level. A hierarchically optimal value function decomposition can be learned from experience, even in conditions with multiple tasks and arbitrary and uncertain reward and cost structures. The model reproduces known empirical effects of task interleaving. It yields better predictions of individual-level data than a myopic baseline in a six-task problem (N=211). The results support hierarchical RL as a plausible model of task interleaving.

HCJan 24, 2019

SAM: A Modular Framework for Self-Adapting Web Menus

Camille Gobert, Kashyap Todi, Gilles Bailly et al.

This paper presents SAM, a modular and extensible JavaScript framework for self-adapting menus on webpages. SAM allows control of two elementary aspects for adapting web menus: (1) the target policy, which assigns scores to menu items for adaptation, and (2) the adaptation style, which specifies how they are adapted on display. By decoupling them, SAM enables the exploration of different combinations independently. Several policies from the literature are readily implemented, and paired with adaptation styles such as reordering and highlighting. The process, including user data logging, is local, offering privacy benefits and eliminating the need for server-side modifications. Researchers can use SAM to experiment with adaptation policies and styles, and benchmark techniques in an ecological setting with real webpages. Practitioners can make websites self-adapting, and end-users can dynamically personalise typically static web menus.

HCMar 3, 2018

AdaM: Adapting Multi-User Interfaces for Collaborative Environments in Real-Time

Seonwook Park, Christoph Gebhardt, Roman Rädle et al.

Developing cross-device multi-user interfaces (UIs) is a challenging problem. There are numerous ways in which content and interactivity can be distributed. However, good solutions must consider multiple users, their roles, their preferences and access rights, as well as device capabilities. Manual and rule-based solutions are tedious to create and do not scale to larger problems nor do they adapt to dynamic changes, such as users leaving or joining an activity. In this paper, we cast the problem of UI distribution as an assignment problem and propose to solve it using combinatorial optimization. We present a mixed integer programming formulation which allows real-time applications in dynamically changing collaborative settings. It optimizes the allocation of UI elements based on device capabilities, user roles, preferences, and access rights. We present a proof-of-concept designer-in-the-loop tool, allowing for quick solution exploration. Finally, we compare our approach to traditional paper prototyping in a lab study.
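To make the assignment view of UI distribution concrete, here is a minimal sketch that allocates UI elements to devices by exhaustive search. It is not the paper's mixed integer programming formulation: the element and device names, utility values, capacities, and the single access-rights restriction are all hypothetical, and brute force only works for toy instances like this one.

```python
from itertools import product

# Hypothetical toy instance (all names and numbers are illustrative).
elements = ["video", "controls", "chat"]
devices = ["tv", "phone", "tablet"]
utility = {
    ("video", "tv"): 9, ("video", "phone"): 2, ("video", "tablet"): 5,
    ("controls", "tv"): 1, ("controls", "phone"): 8, ("controls", "tablet"): 6,
    ("chat", "tv"): 2, ("chat", "phone"): 6, ("chat", "tablet"): 7,
}
capacity = {"tv": 1, "phone": 1, "tablet": 1}   # max elements per device
forbidden = {("chat", "tv")}                    # stand-in for access rights


def best_allocation(elements, devices, utility, capacity, forbidden):
    """Try every element-to-device mapping and keep the best feasible one."""
    best_score, best_plan = float("-inf"), None
    for choice in product(devices, repeat=len(elements)):
        plan = dict(zip(elements, choice))
        if any((e, d) in forbidden for e, d in plan.items()):
            continue  # violates an access restriction
        if any(choice.count(d) > capacity[d] for d in devices):
            continue  # overloads a device
        score = sum(utility[e, d] for e, d in plan.items())
        if score > best_score:
            best_score, best_plan = score, plan
    return best_plan, best_score


plan, score = best_allocation(elements, devices, utility, capacity, forbidden)
print(plan, score)
```

The search space grows as the number of devices raised to the number of elements, which is why a solver-based formulation is needed for the real-time, dynamically changing settings the abstract describes.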

HCDec 2, 2016

Inferring Cognitive Models from Data using Approximate Bayesian Computation

Antti Kangasrääsiö, Kumaripaba Athukorala, Andrew Howes et al.

An important problem for HCI researchers is to estimate the parameter values of a cognitive model from behavioral data. This is a difficult problem, because of the substantial complexity and variety in human behavioral strategies. We report an investigation into a new approach using approximate Bayesian computation (ABC) to condition model parameters to data and prior knowledge. As a case study, we examine menu interaction, where only click-time data are available to infer a cognitive model that implements a search behaviour with parameters such as fixation duration and recall probability. Our results demonstrate that ABC (i) improves estimates of model parameter values, (ii) enables meaningful comparisons between model variants, and (iii) supports fitting models to individual users. ABC provides ample opportunities for theoretical HCI research by allowing principled inference of model parameter values and their uncertainty.
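The idea behind ABC can be illustrated with its simplest variant, rejection sampling: draw a parameter from the prior, simulate data with it, and keep the draw only if the simulated data resemble the observations. The sketch below is an assumption-laden toy, not the paper's method or model: the menu simulator, the uniform prior, the mean click time as summary statistic, and the tolerance `epsilon` are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def simulate_click_times(fixation_duration, rng, n_items=8, n_trials=200):
    """Hypothetical toy menu model: the user fixates 1..n_items items
    before clicking, with a little Gaussian motor noise added."""
    return [
        rng.randint(1, n_items) * fixation_duration + rng.gauss(0.0, 0.05)
        for _ in range(n_trials)
    ]


def abc_rejection(observed, rng, n_draws=2000, epsilon=0.1):
    """Keep prior draws whose simulated mean click time is within epsilon
    of the observed mean click time."""
    target = fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.1, 0.6)  # assumed prior on fixation duration (s)
        simulated = simulate_click_times(theta, rng)
        if abs(fmean(simulated) - target) <= epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate_click_times(0.3, rng)  # synthetic data, true value 0.3 s
posterior = abc_rejection(observed, rng)
posterior_mean = fmean(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted draws approximate the posterior over fixation duration, so their spread gives a rough uncertainty estimate alongside the point estimate. Rejection sampling wastes most simulations, and more sample-efficient ABC variants exist for expensive simulators.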

HCNov 24, 2016

AutoGain: Gain Function Adaptation with Submovement Efficiency Optimization

Byungjoo Lee, Mathieu Nancel, Sunjun Kim et al.

A well-designed control-to-display gain function can improve pointing performance with indirect pointing devices like trackpads. However, the design of gain functions is challenging and mostly based on trial and error. AutoGain is a novel method to individualize a gain function for indirect pointing devices in contexts where cursor trajectories can be tracked. It gradually improves pointing efficiency by using a novel submovement-level tracking+optimization technique that minimizes aiming error (undershooting/overshooting) for each submovement. We first show that AutoGain can produce, from scratch, gain functions with performance comparable to commercial designs, in less than a half-hour of active use. Second, we demonstrate AutoGain's applicability to emerging input devices (here, a Leap Motion controller) with no reference gain functions. Third, a one-month longitudinal study of normal computer use with AutoGain showed performance improvements from participants' default functions.

CVOct 16, 2016

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Srinath Sridhar, Franziska Mueller, Michael Zollhöfer et al.

Real-time simultaneous tracking of hands manipulating and interacting with external objects has many potential applications in augmented reality, tangible computing, and wearable computing. However, due to difficult occlusions, fast motions, and uniform hand appearance, jointly tracking hand and object pose is more challenging than tracking either of the two separately. Many previous approaches resort to complex multi-camera setups to remedy the occlusion problem and often employ expensive segmentation and optimization steps, which makes real-time tracking impossible. In this paper, we propose a real-time solution that uses a single commodity RGB-D camera. The core of our approach is a 3D articulated Gaussian mixture alignment strategy tailored to hand-object tracking that allows fast pose optimization. The alignment energy uses novel regularizers to address occlusions and hand-object contacts. For added robustness, we guide the optimization with discriminative part classification of the hand and segmentation of the object. We conducted extensive experiments on several existing datasets and introduce a new annotated hand-object dataset. Quantitative and qualitative results show the key advantages of our method: speed, accuracy, and robustness.

CVFeb 12, 2016

Fast and Robust Hand Tracking Using Detection-Guided Optimization

Srinath Sridhar, Franziska Mueller, Antti Oulasvirta et al.

Markerless tracking of hands and fingers is a promising enabler for human-computer interaction. However, adoption has been limited because of tracking inaccuracies, incomplete coverage of motions, low framerate, complex camera setups, and high computational requirements. In this paper, we present a fast method for accurately tracking rapid and complex articulations of the hand using a single depth camera. Our algorithm uses a novel detection-guided optimization strategy that increases the robustness and speed of pose estimation. In the detection step, a randomized decision forest classifies pixels into parts of the hand. In the optimization step, a novel objective function combines the detected part labels and a Gaussian mixture representation of the depth to estimate a pose that best fits the depth. Our approach needs comparatively fewer computational resources, which makes it extremely fast (50 fps without GPU support). The approach also supports varying static, or moving, camera-to-scene arrangements. We show the benefits of our method by evaluating on public datasets and comparing against previous work.

CVFeb 11, 2016

Real-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model

Srinath Sridhar, Helge Rhodin, Hans-Peter Seidel et al.

Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time. The main contributions include a new generative tracking method which employs an implicit hand shape representation based on Sum of Anisotropic Gaussians (SAG), and a pose fitting energy that is smooth and analytically differentiable, making fast gradient-based pose optimization possible. This shape representation, together with a full perspective projection model, enables more accurate hand modeling than a related baseline method from the literature. Our method achieves better accuracy than previous methods and runs at 25 fps. We show these improvements both qualitatively and quantitatively on publicly available datasets.

CRMar 8, 2014

Text Entry Method Affects Password Security

Yulong Yang, Janne Lindqvist, Antti Oulasvirta

Text-based passwords continue to be the prime form of authentication to computer systems. Today, they are increasingly created and used with mobile text entry methods, such as touchscreens and mobile keyboards, in addition to traditional physical keyboards. This raises a foundational question for usable security: whether text entry methods affect password generation and password security. This paper presents results from a between-group study with 63 participants, in which each group generated passwords for multiple virtual accounts using a different text entry method. Participants were also asked to recall their passwords afterwards. We analyzed password structures and probabilities using standard and recent security metrics, and also performed cracking attacks on the collected data. The results show a significant effect of text entry methods on passwords. In particular, one of the experimental groups created passwords with significantly more lowercase letters per password than the control group ($t(60) = 2.99, p = 0.004$). The choices for character types in each group were also significantly different ($p=0.048, FET$). Our cracking attacks consequently expose significantly different resistance across groups ($p=0.031, FET$) and text entry method vulnerabilities. Our findings contribute to the understanding of password security in the context of usable interfaces.

CRJan 2, 2014

User-Generated Free-Form Gestures for Authentication: Security and Memorability

Michael Sherman, Gradeigh Clark, Yulong Yang et al.

This paper studies the security and memorability of free-form multitouch gestures for mobile authentication. Towards this end, we collected a dataset with a generate-test-retest paradigm where participants (N=63) generated free-form gestures, repeated them, and were later retested for memory. Half of the participants decided to generate one-finger gestures, and the other half generated multi-finger gestures. Although there has been recent work on template-based gestures, there are as yet no metrics to analyze the security of either template or free-form gestures. For example, entropy-based metrics used for text-based passwords are not suitable for capturing the security and memorability of free-form gestures. Hence, we modify a recently proposed metric for analyzing information capacity of continuous full-body movements for this purpose. Our metric computes the estimated mutual information in repeated sets of gestures. Surprisingly, one-finger gestures had higher average mutual information. Gestures with many hard angles and turns had the highest mutual information. The best-remembered gestures included signatures and simple angular shapes. We also implemented a multitouch recognizer to evaluate the practicality of free-form gestures in a real authentication system and how they perform against shoulder surfing attacks. We conclude the paper with strategies for generating secure and memorable free-form gestures, which present a robust method for mobile authentication.