Antti Oulasvirta

h-index65

24papers

1,015citations

Novelty49%

AI Score45

Ranked #41,648 of 194,257 authors (top 21%)#213 in HC (top 8%)

24 Papers

12.4HCApr 15, 2022

Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques

Liwei Chan, Yi-Chi Liao, George B. Mo et al.

Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during design, however they assume a model of the interaction domain. Black box methods for assistance, on the other hand, can work with any design problem. However, virtually all empirical studies of this human-in-the-loop approach have been carried out by either researchers or end-users. The question stands out if such methods can help designers in realistic tasks. In this paper, we study Bayesian optimization as an algorithmic method to guide the design optimization process. It operates by proposing to a designer which design candidate to try next, given previous observations. We report observations from a comparative study with 40 novice designers who were tasked to optimize a complex 3D touch interaction technique. The optimizer helped designers explore larger proportions of the design space and arrive at a better solution, however they reported lower agency and expressiveness. Designers guided by an optimizer reported lower mental effort but also felt less creative and less in charge of the progress. We conclude that human-in-the-loop optimization can support novice designers in cases where agency is not critical.

5.3LGJan 27, 2023

Modeling human road crossing decisions as reward maximization with visual perception limitations

Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P. P. Jokinen et al.

Understanding the interaction between different road users is critical for road safety and automated vehicles (AVs). Existing mathematical models on this topic have been proposed based mostly on either cognitive or machine learning (ML) approaches. However, current cognitive models are incapable of simulating road user trajectories in general scenarios, and ML models lack a focus on the mechanisms generating the behavior and take a high-level perspective which can cause failures to capture important human-like behaviors. Here, we develop a model of human pedestrian crossing decisions based on computational rationality, an approach using deep reinforcement learning (RL) to learn boundedly optimal behavior policies given human constraints, in our case a model of the limited human visual system. We show that the proposed combined cognitive-RL model captures human-like patterns of gap acceptance and crossing initiation time. Interestingly, our model's decisions are sensitive to not only the time gap, but also the speed of the approaching vehicle, something which has been described as a "bias" in human gap acceptance behavior. However, our results suggest that this is instead a rational adaption to human perceptual limitations. Moreover, we demonstrate an approach to accounting for individual differences in computational rationality models, by conditioning the RL policy on the parameters of the human constraints. Our results demonstrate the feasibility of generating more human-like road user behavior by combining RL with cognitive models.

11.4LGJul 6, 2025Code

Interactive Groupwise Comparison for Reinforcement Learning from Human Feedback

Jan Kompatscher, Danqing Shi, Giovanna Varni et al.

Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behavior with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which one of two samples they prefer. We present an interactive visualization that better exploits the human visual ability to compare and explore whole groups of samples. The interface is comprised of two linked views: 1) an exploration view showing a contextual overview of all sampled behaviors organized in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviors for user queries. Users can efficiently explore large sets of behaviors by iterating between these two views. Additionally, we devised an active learning approach suggesting groups for comparison. As shown by our evaluation in six simulated robotics tasks, our approach increases the final policy returns by 69.34%. It leads to lower error rates and better policies. We open-source the code that can be easily integrated into the RLHF training loop, supporting research on human-AI alignment.

4.6LGOct 25, 2024Code

AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design

Francisco Erivaldo Fernandes Junior, Antti Oulasvirta

Developing a reinforcement learning (RL) agent often involves identifying values for numerous parameters, covering the policy, reward function, environment, and agent-internal architecture. Since these parameters are interrelated in complex ways, optimizing them is a black-box problem that proves especially challenging for nonexperts. Although existing optimization-as-a-service platforms (e.g., Vizier and Optuna) can handle such problems, they are impractical for RL systems, since the need for manual user mapping of each parameter to distinct components makes the effort cumbersome. It also requires understanding of the optimization process, limiting the systems' application beyond the machine learning field and restricting access in areas such as cognitive science, which models human decision-making. To tackle these challenges, the paper presents AgentForge, a flexible low-code platform to optimize any parameter set across an RL system. Available at https://github.com/feferna/AgentForge, it allows an optimization problem to be defined in a few lines of code and handed to any of the interfaced optimizers. With AgentForge, the user can optimize the parameters either individually or jointly. The paper presents an evaluation of its performance for a challenging vision-based RL problem.

1.2MMSep 16, 2018Code

Cloud Gaming With Foveated Graphics

Gazi Illahi, Thomas Van Gemert, Matti Siekkinen et al.

Cloud gaming enables playing high end games, originally designed for PC or game console setups, on low end devices, such as net-books and smartphones, by offloading graphics rendering to GPU powered cloud servers. However, transmitting the high end graphics requires a large amount of available network bandwidth, even though it is a compressed video stream. Foveated video encoding (FVE) reduces the bandwidth requirement by taking advantage of the non-uniform acuity of human visual system and by knowing where the user is looking. We have designed and implemented a system for cloud gaming with foveated graphics using a consumer grade real-time eye tracker and an open source cloud gaming platform. In this article, we describe the system and its evaluation through measurements with representative games from different genres to understand the effect of parameterization of the FVE scheme on bandwidth requirements and to understand its feasibility from the latency perspective. We also present results from a user study. The results suggest that it is possible to find a "sweet spot" for the encoding parameters so that the users hardly notice the presence of foveated encoding but at the same time the scheme yields most of the bandwidth savings achievable.

13.9HCApr 21, 2024

Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Yue Jiang, Changkong Zhou, Vikas Garg et al.

Present-day graphical user interfaces (GUIs) exhibit diverse arrangements of text, graphics, and interactive elements such as buttons and menus, but representations of GUIs have not kept up. They do not encapsulate both semantic and visuo-spatial relationships among elements. To seize machine learning's potential for GUIs more efficiently, Graph4GUI exploits graph neural networks to capture individual elements' properties and their semantic-visuo-spatial constraints in a layout. The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task, which involved predicting the positions of remaining unplaced elements in a partially completed GUI. The new model's suggestions showed alignment and visual appeal superior to the baseline method and received higher subjective ratings for preference. Furthermore, we demonstrate the practical benefits and efficiency advantages designers perceive when utilizing our model as an autocompletion plug-in.

11.3CVApr 15, 2024

EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli et al.

From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention ``on average'', so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.

7.3AIFeb 6, 2024

Pedestrian crossing decisions can be explained by bounded optimal decision-making under noisy visual perception

Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P. P. Jokinen et al.

This paper presents a model of pedestrian crossing decisions, based on the theory of computational rationality. It is assumed that crossing decisions are boundedly optimal, with bounds on optimality arising from human cognitive limitations. While previous models of pedestrian behaviour have been either 'black-box' machine learning models or mechanistic models with explicit assumptions about cognitive factors, we combine both approaches. Specifically, we model mechanistically noisy human visual perception and assumed rewards in crossing, but we use reinforcement learning to learn bounded optimal behaviour policy. The model reproduces a larger number of known empirical phenomena than previous models, in particular: (1) the effect of the time to arrival of an approaching vehicle on whether the pedestrian accepts the gap, the effect of the vehicle's speed on both (2) gap acceptance and (3) pedestrian timing of crossing in front of yielding vehicles, and (4) the effect on this crossing timing of the stopping distance of the yielding vehicle. Notably, our findings suggest that behaviours previously framed as 'biases' in decision-making, such as speed-dependent gap acceptance, might instead be a product of rational adaptation to the constraints of visual perception. Our approach also permits fitting the parameters of cognitive constraints and rewards per individual, to better account for individual differences. To conclude, by leveraging both RL and mechanistic modelling, our model offers novel insights about pedestrian behaviour, and may provide a useful foundation for more accurate and scalable pedestrian models.

11.5HCFeb 5, 2025

Controllable GUI Exploration

Aryan Garg, Yue Jiang, Antti Oulasvirta

During the early stages of interface design, designers need to produce multiple sketches to explore a design space. Design tools often fail to support this critical stage, because they insist on specifying more details than necessary. Although recent advances in generative AI have raised hopes of solving this issue, in practice they fail because expressing loose ideas in a prompt is impractical. In this paper, we propose a diffusion-based approach to the low-effort generation of interface sketches. It breaks new ground by allowing flexible control of the generation process via three types of inputs: A) prompts, B) wireframes, and C) visual flows. The designer can provide any combination of these as input at any level of detail, and will get a diverse gallery of low-fidelity solutions in response. The unique benefit is that large design spaces can be explored rapidly with very little effort in input-specification. We present qualitative results for various combinations of input specifications. Additionally, we demonstrate that our model aligns more accurately with these specifications than other models.

5.8AISep 29, 2025

Interactive Program Synthesis for Modeling Collaborative Physical Activities from Narrated Demonstrations

Edward Kim, Daniel He, Jorge Chao et al.

Teaching systems physical tasks is a long standing goal in HCI, yet most prior work has focused on non collaborative physical activities. Collaborative tasks introduce added complexity, requiring systems to infer users assumptions about their teammates intent, which is an inherently ambiguous and dynamic process. This necessitates representations that are interpretable and correctable, enabling users to inspect and refine system behavior. We address this challenge by framing collaborative task learning as a program synthesis problem. Our system represents behavior as editable programs and uses narrated demonstrations, i.e. paired physical actions and natural language, as a unified modality for teaching, inspecting, and correcting system logic without requiring users to see or write code. The same modality is used for the system to communicate its learning to users. In a within subjects study, 20 users taught multiplayer soccer tactics to our system. 70 percent (14/20) of participants successfully refined learned programs to match their intent and 90 percent (18/20) found it easy to correct the programs. The study surfaced unique challenges in representing learning as programs and in enabling users to teach collaborative physical activities. We discuss these issues and outline mitigation strategies.

9.5HCJul 24, 2025

DxHF: Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition

Danqing Shi, Furui Cheng, Tino Weinkauf et al.

Human preferences are widely used to align large language models (LLMs) through methods such as reinforcement learning from human feedback (RLHF). However, the current user interfaces require annotators to compare text paragraphs, which is cognitively challenging when the texts are long or unfamiliar. This paper contributes by studying the decomposition principle as an approach to improving the quality of human feedback for LLM alignment. This approach breaks down the text into individual claims instead of directly comparing two long-form text responses. Based on the principle, we build a novel user interface DxHF. It enhances the comparison process by showing decomposed claims, visually encoding the relevance of claims to the conversation and linking similar claims. This allows users to skim through key information and identify differences for better and quicker judgment. Our technical evaluation shows evidence that decomposition generally improves feedback accuracy regarding the ground truth, particularly for users with uncertainty. A crowdsourcing study with 160 participants indicates that using DxHF improves feedback accuracy by an average of 5%, although it increases the average feedback time by 18 seconds. Notably, accuracy is significantly higher in situations where users have less certainty. The finding of the study highlights the potential of HCI as an effective method for improving human-AI alignment.

12.0HCDec 24, 2021

Rediscovering Affordance: A Reinforcement Learning Perspective

Yi-Chi Liao, Kashyap Todi, Aditya Acharya et al.

Affordance refers to the perception of possible actions allowed by an object. Despite its relevance to human-computer interaction, no existing theory explains the mechanisms that underpin affordance-formation; that is, how affordances are discovered and adapted via interaction. We propose an integrative theory of affordance-formation based on the theory of reinforcement learning in cognitive sciences. The key assumption is that users learn to associate promising motor actions to percepts via experience when reinforcement signals (success/failure) are present. They also learn to categorize actions (e.g., "rotating" a dial), giving them the ability to name and reason about affordance. Upon encountering novel widgets, their ability to generalize these actions determines their ability to perceive affordances. We implement this theory in a virtual robot model, which demonstrates human-like adaptation of affordance in interactive widgets tasks. While its predictions align with trends in human data, humans are able to adapt affordances faster, suggesting the existence of additional mechanisms.

12.0HCJul 22, 2021

Toward AI Assistants That Let Designers Design

Sebastiaan De Peuter, Antti Oulasvirta, Samuel Kaski

AI for supporting designers needs to be rethought. It should aim to cooperate, not automate, by supporting and leveraging the creativity and problem-solving of designers. The challenge for such AI is how to infer designers' goals and then help them without being needlessly disruptive. We present AI-assisted design: a framework for creating such AI, built around generative user models which enable reasoning about designers' goals, reasoning, and capabilities.

18.5HCJan 22, 2021

Understanding Visual Saliency in Mobile User Interfaces

Luis A. Leiva, Yunfei Xue, Avya Bansal et al.

For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results speak to a role of expectations in guiding where users look at. Strong bias toward the top-left corner of the display, text, and images was evident, while bottom-up features such as color or size affected saliency less. Classic, parameter-free saliency models showed a weak fit with the data, and data-driven models improved significantly when trained specifically on this dataset (e.g., NSS rose from 0.66 to 0.84). We also release the first annotated dataset for investigating visual saliency in mobile UIs.

5.8HCMay 4, 2020

Foraging-based Optimization of Menu Systems

Niraj Ramesh Dayama, Morteza Shiripour, Antti Oulasvirta et al.

Computational design of menu systems has been solved in limited cases such as the linear menu (list) as an assignment task, where commands are assigned to menu positions while optimizing for for users selection performance and distance of associated items. We show that this approach falls short with larger, hierarchically organized menu systems, where one must also take into account how users navigate hierarchical structures. This paper presents a novel integer programming formulation that models hierarchical menus as a combination of the exact set covering problem and the assignment problem. It organizes commands into ordered groups of ordered groups via a novel objective function based on information foraging theory. It minimizes, on the one hand, the time required to select a command whose location is known from previous usage and, on the other, the time wasted in irrelevant parts of the menu while searching for commands whose location is not known. The convergence of these two factors yields usable, well-ordered command hierarchies from a single model. In generated menus, the lead (first) elements of a group or tab are good indicators of the remaining contents, thereby facilitating the search process. In a controlled usability evaluation, the performance of computationally designed menus was 25 faster than existing commercial designs with respect to selection time. The algorithm is efficient for large, representative instances of the problem. We further show applications in personalization and adaptation of menu systems.

9.6HCJan 13, 2020

Button Simulation and Design via FDVV Models

Yi-Chi Liao, Sunjun Kim, Byungjoo Lee et al.

Designing a push-button with desired sensation and performance is challenging because the mechanical construction must have the right response characteristics. Physical simulation of a button's force-displacement (FD) response has been studied to facilitate prototyping; however, the simulations' scope and realism have been limited. In this paper, we extend FD modeling to include vibration (V) and velocity-dependence characteristics (V). The resulting FDVV models better capture tactility characteristics of buttons, including snap. They increase the range of simulated buttons and the perceived realism relative to FD models. The paper also demonstrates methods for obtaining these models, editing them, and simulating accordingly. This end-to-end approach enables the analysis, prototyping, and optimization of buttons, and supports exploring designs that would be hard to implement mechanically.

8.4AIJan 4, 2020

Hierarchical Reinforcement Learning as a Model of Human Task Interleaving

Christoph Gebhardt, Antti Oulasvirta, Otmar Hilliges

How do people decide how long to continue in a task, when to switch, and to which other task? Understanding the mechanisms that underpin task interleaving is a long-standing goal in the cognitive sciences. Prior work suggests greedy heuristics and a policy maximizing the marginal rate of return. However, it is unclear how such a strategy would allow for adaptation to everyday environments that offer multiple tasks with complex switch costs and delayed rewards. Here we develop a hierarchical model of supervisory control driven by reinforcement learning (RL). The supervisory level learns to switch using task-specific approximate utility estimates, which are computed on the lower level. A hierarchically optimal value function decomposition can be learned from experience, even in conditions with multiple tasks and arbitrary and uncertain reward and cost structures. The model reproduces known empirical effects of task interleaving. It yields better predictions of individual-level data than a myopic baseline in a six-task problem (N=211). The results support hierarchical RL as a plausible model of task interleaving.

9.3HCJan 24, 2019Code

SAM: A Modular Framework for Self-Adapting Web Menus

Camille Gobert, Kashyap Todi, Gilles Bailly et al.

This paper presents SAM, a modular and extensible JavaScript framework for self-adapting menus on webpages. SAM allows control of two elementary aspects for adapting web menus: (1) the target policy, which assigns scores to menu items for adaptation, and (2) the adaptation style, which specifies how they are adapted on display. By decoupling them, SAM enables the exploration of different combinations independently. Several policies from literature are readily implemented, and paired with adaptation styles such as reordering and highlighting. The process - including user data logging - is local, offering privacy benefits and eliminating the need for server-side modifications. Researchers can use SAM to experiment adaptation policies and styles, and benchmark techniques in an ecological setting with real webpages. Practitioners can make websites self-adapting, and end-users can dynamically personalise typically static web menus.

12.9HCDec 2, 2016

Inferring Cognitive Models from Data using Approximate Bayesian Computation

Antti Kangasrääsiö, Kumaripaba Athukorala, Andrew Howes et al.

An important problem for HCI researchers is to estimate the parameter values of a cognitive model from behavioral data. This is a difficult problem, because of the substantial complexity and variety in human behavioral strategies. We report an investigation into a new approach using approximate Bayesian computation (ABC) to condition model parameters to data and prior knowledge. As the case study we examine menu interaction, where we have click time data only to infer a cognitive model that implements a search behaviour with parameters such as fixation duration and recall probability. Our results demonstrate that ABC (i) improves estimates of model parameter values, (ii) enables meaningful comparisons between model variants, and (iii) supports fitting models to individual users. ABC provides ample opportunities for theoretical HCI research by allowing principled inference of model parameter values and their uncertainty.

14.1HCNov 24, 2016

AutoGain: Gain Function Adaptation with Submovement Efficiency Optimization

Byungjoo Lee, Mathieu Nancel, Sunjun Kim et al.

A well-designed control-to-display gain function can improve pointing performance with indirect pointing devices like trackpads. However, the design of gain functions is challenging and mostly based on trial and error. AutoGain is a novel method to individualize a gain function for indirect pointing devices in contexts where cursor trajectories can be tracked. It gradually improves pointing efficiency by using a novel submovement-level tracking+optimization technique that minimizes aiming error (undershooting/overshooting) for each submovement. We first show that AutoGain can produce, from scratch, gain functions with performance comparable to commercial designs, in less than a half-hour of active use. Second, we demonstrate AutoGain's applicability to emerging input devices (here, a Leap Motion controller) with no reference gain functions. Third, a one-month longitudinal study of normal computer use with AutoGain showed performance improvements from participants' default functions.

22.4CVOct 16, 2016

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Srinath Sridhar, Franziska Mueller, Michael Zollhöfer et al.

Real-time simultaneous tracking of hands manipulating and interacting with external objects has many potential applications in augmented reality, tangible computing, and wearable computing. However, due to difficult occlusions, fast motions, and uniform hand appearance, jointly tracking hand and object pose is more challenging than tracking either of the two separately. Many previous approaches resort to complex multi-camera setups to remedy the occlusion problem and often employ expensive segmentation and optimization steps which makes real-time tracking impossible. In this paper, we propose a real-time solution that uses a single commodity RGB-D camera. The core of our approach is a 3D articulated Gaussian mixture alignment strategy tailored to hand-object tracking that allows fast pose optimization. The alignment energy uses novel regularizers to address occlusions and hand-object contacts. For added robustness, we guide the optimization with discriminative part classification of the hand and segmentation of the object. We conducted extensive experiments on several existing datasets and introduce a new annotated hand-object dataset. Quantitative and qualitative results show the key advantages of our method: speed, accuracy, and robustness.

23.5CVFeb 12, 2016

Fast and Robust Hand Tracking Using Detection-Guided Optimization

Srinath Sridhar, Franziska Mueller, Antti Oulasvirta et al.

Markerless tracking of hands and fingers is a promising enabler for human-computer interaction. However, adoption has been limited because of tracking inaccuracies, incomplete coverage of motions, low framerate, complex camera setups, and high computational requirements. In this paper, we present a fast method for accurately tracking rapid and complex articulations of the hand using a single depth camera. Our algorithm uses a novel detection-guided optimization strategy that increases the robustness and speed of pose estimation. In the detection step, a randomized decision forest classifies pixels into parts of the hand. In the optimization step, a novel objective function combines the detected part labels and a Gaussian mixture representation of the depth to estimate a pose that best fits the depth. Our approach needs comparably less computational resources which makes it extremely fast (50 fps without GPU support). The approach also supports varying static, or moving, camera-to-scene arrangements. We show the benefits of our method by evaluating on public datasets and comparing against previous work.

12.1CRMar 8, 2014

Text Entry Method Affects Password Security

Yulong Yang, Janne Lindqvist, Antti Oulasvirta

Text-based passwords continue to be the prime form of authentication to computer systems. Today, they are increasingly created and used with mobile text entry methods, such as touchscreens and mobile keyboards, in addition to traditional physical keyboards. This raises a foundational question for usable security: whether text entry methods affect password generation and password security. This paper presents results from a between-group study with 63 participants, in which each group generated passwords for multiple virtual accounts using a different text entry method. Participants were also asked to recall their passwords afterwards. We applied analysis of structures and probabilities, with standard and recent security metrics and also performed cracking attacks on the collected data. The results show a significant effect of text entry methods on passwords. In particular, one of the experimental groups created passwords with significantly more lowercase letters per password than the control group ($t(60) = 2.99, p = 0.004$). The choices for character types in each group were also significantly different ($p=0.048, FET$). Our cracking attacks consequently expose significantly different resistance across groups ($p=0.031, FET$) and text entry method vulnerabilities. Our findings contribute to the understanding of password security in the context of usable interfaces.

23.3CRJan 2, 2014

User-Generated Free-Form Gestures for Authentication: Security and Memorability

Michael Sherman, Gradeigh Clark, Yulong Yang et al.

This paper studies the security and memorability of free-form multitouch gestures for mobile authentication. Towards this end, we collected a dataset with a generate-test-retest paradigm where participants (N=63) generated free-form gestures, repeated them, and were later retested for memory. Half of the participants decided to generate one-finger gestures, and the other half generated multi-finger gestures. Although there has been recent work on template-based gestures, there are yet no metrics to analyze security of either template or free-form gestures. For example, entropy-based metrics used for text-based passwords are not suitable for capturing the security and memorability of free-form gestures. Hence, we modify a recently proposed metric for analyzing information capacity of continuous full-body movements for this purpose. Our metric computed estimated mutual information in repeated sets of gestures. Surprisingly, one-finger gestures had higher average mutual information. Gestures with many hard angles and turns had the highest mutual information. The best-remembered gestures included signatures and simple angular shapes. We also implemented a multitouch recognizer to evaluate the practicality of free-form gestures in a real authentication system and how they perform against shoulder surfing attacks. We conclude the paper with strategies for generating secure and memorable free-form gestures, which present a robust method for mobile authentication.