CV May 31
Towards Interactive Video World Modeling: Frontiers, Challenges, Benchmarks, and Future Trends
Jiuming Liu, Chaojun Ni, Mengmeng Liu et al.
With the rapid development of large language models and diffusion-based content generation, world modeling has attracted increasing research attention, benefiting various downstream domains such as game engines, embodied AI, and autonomous driving. By explicitly incorporating user actions into world state transition, recent literature empowers world modeling with interactivity in an action-conditioned video or 3D generation paradigm, further enhancing controllability over world evolution and enabling users to freely traverse, manipulate, navigate, and personalize the state evolution. In this paper, we aim to systematically review recent research trends, technical developments, and evaluation benchmarks, and to propose potential future directions in interactive world modeling. Specifically, we first summarize recent efforts and trends in terms of application scenarios, world state evolution, and scene modality. Afterwards, we delve into three crucial technical challenges: action-conditioned controllability, long-horizon interactions and memory, and action-following responsiveness for real-time interactivity. Furthermore, we thoroughly compare existing benchmarks and metrics in four specific application fields: open-world exploration, game engines, autonomous driving, and robotics. Finally, we discuss several promising future directions for achieving next-generation interactive world modeling. The corresponding repository is publicly available at: https://github.com/liujiuming123/Awesome-Interactive-World-Model.
HC Apr 15, 2022
Investigating Positive and Negative Qualities of Human-in-the-Loop Optimization for Designing Interaction Techniques
Liwei Chan, Yi-Chi Liao, George B. Mo et al.
Designers reportedly struggle with design optimization tasks where they are asked to find a combination of design parameters that maximizes a given set of objectives. In HCI, design optimization problems are often exceedingly complex, involving multiple objectives and expensive empirical evaluations. Model-based computational design algorithms assist designers by generating design examples during design; however, they assume a model of the interaction domain. Black box methods for assistance, on the other hand, can work with any design problem. However, virtually all empirical studies of this human-in-the-loop approach have been carried out by either researchers or end-users. The open question is whether such methods can help designers in realistic tasks. In this paper, we study Bayesian optimization as an algorithmic method to guide the design optimization process. It operates by proposing to a designer which design candidate to try next, given previous observations. We report observations from a comparative study with 40 novice designers who were tasked to optimize a complex 3D touch interaction technique. The optimizer helped designers explore larger proportions of the design space and arrive at a better solution; however, they reported lower agency and expressiveness. Designers guided by an optimizer reported lower mental effort but also felt less creative and less in charge of the progress. We conclude that human-in-the-loop optimization can support novice designers in cases where agency is not critical.
CV Aug 10, 2023
Encode-Store-Retrieve: Augmenting Human Memory through Language-Encoded Egocentric Perception
Junxiao Shen, John Dudley, Per Ola Kristensson
We depend on our own memory to encode, store, and retrieve our experiences. However, memory lapses can occur. One promising avenue for achieving memory augmentation is through the use of augmented reality head-mounted displays to capture and preserve egocentric videos, a practice commonly referred to as lifelogging. However, a significant challenge arises from the sheer volume of video data generated through lifelogging, as the current technology lacks the capability to encode and store such large amounts of data efficiently. Further, retrieving specific information from extensive video archives requires substantial computational power, further complicating the task of quickly accessing desired content. To address these challenges, we propose a memory augmentation agent that involves leveraging natural language encoding for video data and storing them in a vector database. This approach harnesses the power of large vision language models to perform the language encoding process. Additionally, we propose using large language models to facilitate natural language querying. Our agent underwent extensive evaluation using the QA-Ego4D dataset and achieved state-of-the-art results with a BLEU score of 8.3, outperforming conventional machine learning models that scored between 3.4 and 5.8. Additionally, we conducted a user study in which participants interacted with the human memory augmentation agent through episodic memory and open-ended questions. The results of this study show that the agent results in significantly better recall performance on episodic memory tasks compared to human participants. The results also highlight the agent's practical applicability and user acceptance.
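The encode, store, and retrieve steps described in this abstract can be illustrated with a toy sketch. It is not the authors' system: the hypothetical `embed` function uses a bag-of-words count as a stand-in for captions produced by a vision language model, and the hypothetical `MemoryStore` class is an in-memory list standing in for a vector database. The captions and timestamps are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'language encoding': a bag-of-words count vector (assumption,
    standing in for a learned text embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class MemoryStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._entries = []  # list of (timestamp, caption, vector)

    def add(self, timestamp, caption):
        """Encode a caption and store it with its timestamp."""
        self._entries.append((timestamp, caption, embed(caption)))

    def query(self, question, k=1):
        """Return the k stored captions most similar to the question."""
        q = embed(question)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(timestamp, caption) for timestamp, caption, _ in ranked[:k]]


store = MemoryStore()
store.add("09:02", "I put my keys on the kitchen counter")
store.add("12:30", "I ate a sandwich at my desk")
store.add("18:45", "I parked the car near the library")

top = store.query("where did I leave my keys", k=1)
print(top)
```

In a fuller pipeline the retrieved captions would be passed to a large language model to phrase an answer; that step is omitted here.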
CL Oct 12, 2023
Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
Junxiao Shen, John J. Dudley, Jingyao Zheng et al.
Text entry is an essential task in our day-to-day digital interactions. Numerous intelligent features have been developed to streamline this process, making text entry more effective, efficient, and fluid. These improvements include sentence prediction and user personalization. However, as deep learning-based language models become the norm for these advanced features, the necessity for data collection and model fine-tuning increases. These challenges can be mitigated by harnessing the in-context learning capability of large language models such as GPT-3.5. This unique feature allows the language model to acquire new skills through prompts, eliminating the need for data collection and fine-tuning. Consequently, large language models can learn various text prediction techniques. We initially showed that, for a sentence prediction task, merely prompting GPT-3.5 surpassed a GPT-2 backed system and is comparable with a fine-tuned GPT-3.5 model, with the latter two methods requiring costly data collection, fine-tuning and post-processing. However, the task of prompting large language models to specialize in specific text prediction tasks can be challenging, particularly for designers without expertise in prompt engineering. To address this, we introduce Promptor, a conversational prompt generation agent designed to engage proactively with designers. Promptor can automatically generate complex prompts tailored to meet specific needs, thus offering a solution to this challenge. We conducted a user study involving 24 participants creating prompts for three intelligent text entry tasks; half of the participants used Promptor while the other half designed prompts themselves. The results show that Promptor-designed prompts result in a 35% increase in similarity and 22% in coherence over those by designers.
HC Feb 2
Cost-Aware Bayesian Optimization for Prototyping Interactive Devices
Thomas Langerak, Renate Zhang, Ziyuan Wang et al.
Deciding which idea is worth prototyping is a central concern in iterative design. A prototype should be produced when the expected improvement is high and the cost is low. However, this is hard to decide, because costs can vary drastically: a simple parameter tweak may take seconds, while fabricating hardware consumes material and energy. Such asymmetries can discourage a designer from exploring the design space. In this paper, we present an extension of cost-aware Bayesian optimization to account for diverse prototyping costs. The method builds on the power of Bayesian optimization and requires only a minimal modification to the acquisition function. The key idea is to use designer-estimated costs to guide sampling toward more cost-effective prototypes. In technical evaluations, the method achieved comparable utility to a cost-agnostic baseline while requiring only approximately 70% of the cost; under strict budgets, it outperformed the baseline threefold. A within-subjects study with 12 participants in a realistic joystick design task demonstrated similar benefits. These results show that accounting for prototyping costs can make Bayesian optimization more compatible with real-world design projects.
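One common way to make an acquisition function cost-aware is to divide expected improvement by an estimated cost. The sketch below shows that generic formulation only; the abstract does not specify the paper's exact modification, so this should not be read as the authors' method. The function names, candidate names, posterior means and standard deviations, and cost values are all hypothetical.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement (maximization) under a Gaussian posterior
    with mean mu and standard deviation sigma."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def cost_aware_score(mu, sigma, best, cost):
    """Expected improvement per unit of designer-estimated cost
    (a generic cost-aware variant, assumed for illustration)."""
    if cost <= 0.0:
        raise ValueError("cost must be positive")
    return expected_improvement(mu, sigma, best) / cost


def pick_next(candidates, best):
    """Return the name of the candidate with the highest score per cost."""
    winner = max(
        candidates,
        key=lambda c: cost_aware_score(c["mu"], c["sigma"], best, c["cost"]),
    )
    return winner["name"]


# Hypothetical candidates: a quick software tweak and a costly hardware build.
candidates = [
    {"name": "parameter_tweak", "mu": 0.55, "sigma": 0.10, "cost": 1.0},
    {"name": "hardware_build", "mu": 0.70, "sigma": 0.20, "cost": 50.0},
]
best_so_far = 0.50

print(pick_next(candidates, best_so_far))
```

With these made-up numbers the hardware build has the higher raw expected improvement, but the parameter tweak wins once cost is taken into account, which mirrors the asymmetry the abstract describes.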
HC Mar 30
Unbounded: Object-Boundary Interaction in Mixed Reality
Zhuoyue Lyu, Per Ola Kristensson
Boundaries such as walls, windows, and doors are ubiquitous in the physical world, yet their potential in mixed reality (MR) remains underexplored. We present Unbounded, a Research through Design inquiry into object-boundary interaction (OBI). Building on prior work, we articulate a design space aimed at providing a shared language for OBI. To demonstrate its potential, we design and implement eight examples across productivity and art exploration scenarios, showcasing how OBIs can enrich and reframe everyday interactions. We further engage with six MR experts in one-on-one feedback sessions, using the design space and examples as design probes. Their reflections broaden the conceptual scope of OBI, reveal new possibilities for how the framework may be applied, and highlight implications for future MR interaction design. https://www.zhuoyuelyu.com/unbounded
HC Mar 30
Objestures: Everyday Objects Meet Mid-Air Gestures for Expressive Interaction
Zhuoyue Lyu, Per Ola Kristensson
Everyday object-based interactions (EOIs) and mid-air gesture interactions (MAIs) have been widely explored, yet prior work on their integration often targets narrow use cases or specific technologies, leaving designers and developers with limited guidance that generalizes across diverse EOIs and MAIs. We introduce Objestures ("Obj" + "Gestures") -- five interaction types spanning EOIs and MAIs, forming a design space for expressive uni- and bimanual interaction. To evaluate the usefulness of Objestures, we conducted an exploratory user study (N=12) on basic 3D tasks (rotation and scaling), which showed performance comparable to the headset's native freehand manipulation. To understand the user experience, we conducted case studies with the same participants across three applications (Sound, Draw, and Shadow), where participants found the interactions intuitive, engaging, and expressive, and indicated interest in everyday use. We further demonstrate the potential of Objestures across diverse contexts through 30 examples, and discuss limitations and implications. https://www.zhuoyuelyu.com/objestures
HC Dec 10, 2025
ImageTalk: Designing a Multimodal AAC Text Generation System Driven by Image Recognition and Natural Language Generation
Boyin Yang, Puming Jiang, Per Ola Kristensson
People living with Motor Neuron Disease (plwMND) frequently encounter speech and motor impairments that necessitate a reliance on augmentative and alternative communication (AAC) systems. This paper tackles the main challenge that traditional symbol-based AAC systems offer a limited vocabulary, while text entry solutions tend to exhibit low communication rates. To help plwMND articulate their needs about the system efficiently and effectively, we iteratively design and develop a novel multimodal text generation system called ImageTalk through a tailored proxy-user-based and an end-user-based design phase. The system demonstrates pronounced keystroke savings of 95.6%, coupled with consistent performance and high user satisfaction. We distill three design guidelines for AI-assisted text generation systems design and outline four user requirement levels tailored for AAC purposes, guiding future research in this field.
HC Feb 16
MyoInteract: A Framework for Fast Prototyping of Biomechanical HCI Tasks using Reinforcement Learning
Ankit Bhattarai, Hannah Selder, Florian Fischer et al.
Reinforcement learning (RL)-based biomechanical simulations have the potential to revolutionise HCI research and interaction design, but currently lack usability and interpretability. Using the Human Action Cycle as a design lens, we identify key limitations of biomechanical RL frameworks and develop MyoInteract, a novel framework for fast prototyping of biomechanical HCI tasks. MyoInteract allows designers to set up tasks, user models, and training parameters from an easy-to-use GUI within minutes. It trains and evaluates muscle-actuated simulated users within minutes, reducing training times by up to 98%. A workshop study with 12 interaction designers revealed that MyoInteract allowed novices in biomechanical RL to successfully set up, train, and assess goal-directed user movements within a single session. By transforming biomechanical RL from a days-long expert task into an accessible hour-long workflow, this work significantly lowers barriers to entry and accelerates iteration cycles in HCI biomechanics research.
AI Nov 1, 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He, Weizhe Lin, Hao Zheng et al.
With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize information over the long term - referred to as long-term memory - have become increasingly significant. These capabilities are crucial for enhancing the performance of AI systems across a wide range of tasks. However, there is currently no comprehensive survey that systematically investigates AI's long-term memory capabilities, formulates a theoretical framework, and inspires the development of next-generation AI long-term memory systems. This paper begins by introducing the mechanisms of human long-term memory, then explores AI long-term memory mechanisms, establishing a mapping between the two. Based on the mapping relationships identified, we extend the current cognitive architectures and propose the Cognitive Architecture of Self-Adaptive Long-term Memory (SALM). SALM provides a theoretical framework for the practice of AI long-term memory and holds potential for guiding the creation of next-generation long-term memory driven AI systems. Finally, we delve into the future directions and application prospects of AI long-term memory.
HC Oct 29, 2024
Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes
Junlong Chen, Jens Grubert, Per Ola Kristensson
As more applications of large language models (LLMs) for 3D content for immersive environments emerge, it is crucial to study user behaviour to identify interaction patterns and potential barriers to guide the future design of immersive content creation and editing systems which involve LLMs. In an empirical user study with 12 participants, we combine quantitative usage data with post-experience questionnaire feedback to reveal common interaction patterns and key barriers in LLM-assisted 3D scene editing systems. We identify opportunities for improving natural language interfaces in 3D design tools and propose design recommendations for future LLM-integrated 3D content creation systems. Through an empirical study, we demonstrate that LLM-assisted interactive systems can be used productively in immersive environments.
CV Jan 12, 2025
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding
Wenqi Zhou, Kai Cao, Hao Zheng et al.
Long-form egocentric video understanding provides rich contextual information and unique insights into long-term human behaviors, holding significant potential for applications in embodied intelligence, long-term activity analysis, and personalized assistive technologies. However, existing benchmark datasets primarily focus on single, short (e.g., minutes to tens of minutes) to moderately long videos, leaving a substantial gap in evaluating extensive, ultra-long egocentric video recordings. To address this, we introduce X-LeBench, a novel benchmark dataset meticulously designed to fill this gap by focusing on tasks requiring a comprehensive understanding of extremely long egocentric video recordings. Our X-LeBench develops a life-logging simulation pipeline that produces realistic, coherent daily plans aligned with real-world video data. This approach enables the flexible integration of synthetic daily plans with real-world footage from Ego4D, a massive-scale egocentric video dataset covering a wide range of daily life scenarios, resulting in 432 simulated video life logs spanning from 23 minutes to 16.4 hours. The evaluations of several baseline systems and multimodal large language models (MLLMs) reveal their poor performance across the board, highlighting the inherent challenges of long-form egocentric video understanding, such as temporal localization and reasoning, context aggregation, and memory retention, and underscoring the need for more advanced models.
HC Oct 31, 2024
Generative AI for Accessible and Inclusive Extended Reality
Jens Grubert, Junlong Chen, Per Ola Kristensson
Artificial Intelligence-Generated Content (AIGC) has the potential to transform how people build and interact with virtual environments. Within this paper, we discuss potential benefits but also challenges that AIGC has for the creation of inclusive and accessible virtual environments. Specifically, we touch upon the decreased need for 3D modeling expertise, benefits of symbolic-only as well as multimodal input, 3D content editing, and 3D model accessibility as well as foundation model-specific challenges.
HC Oct 28, 2024
Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality
Junlong Chen, Jens Grubert, Per Ola Kristensson
Selection of occluded objects is a challenging problem in virtual reality, even more so if multiple objects are involved. With the advent of new artificial intelligence technologies, we explore the possibility of leveraging large language models to assist multi-object selection tasks in virtual reality via a multimodal speech and raycast interaction technique. We validate the findings in a comparative user study (n=24), where participants selected target objects in a virtual reality scene with different levels of scene perplexity. The performance metrics and user experience metrics are compared against a mini-map based occluded object selection technique that serves as the baseline. Results indicate that the introduced technique, AssistVR, outperforms the baseline technique when there are multiple target objects. Contrary to the common belief for speech interfaces, AssistVR was able to outperform the baseline even when the target objects were difficult to reference verbally. This work demonstrates the viability and interaction potential of an intelligent multimodal interactive system powered by large language models. Based on the results, we discuss the implications for design of future intelligent multimodal interactive systems in immersive environments.
LG Oct 3, 2025
The Argument is the Explanation: Structured Argumentation for Trust in Agents
Ege Cakar, Per Ola Kristensson
Humans are black boxes -- we cannot observe their neural processes, yet society functions by evaluating verifiable arguments. AI explainability should follow this principle: stakeholders need verifiable reasoning chains, not mechanistic transparency. We propose using structured argumentation to provide a level of explanation and verification neither interpretability nor LLM-generated explanation is able to offer. Our pipeline achieves state-of-the-art 94.44 macro F1 on the AAEC published train/test split (5.7 points above prior work) and 0.81 macro F1, approximately 0.07 above previous published results with comparable data setups, for Argumentative MicroTexts relation classification, converting LLM text into argument graphs and enabling verification at each inferential step. We demonstrate this idea on multi-agent risk assessment using the Structured What-If Technique, where specialized agents collaborate transparently to carry out risk assessment otherwise achieved by humans alone. Using Bipolar Assumption-Based Argumentation, we capture support/attack relationships, thereby enabling automatic hallucination detection via fact nodes attacking arguments. We also provide a verification mechanism that enables iterative refinement through test-time feedback without retraining. For easy deployment, we provide a Docker container for the fine-tuned AMT model, and the rest of the code with the Bipolar ABA Python package on GitHub.
AI Sep 20, 2025
Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media
Zihan Ding, Xinyi Wang, Junlong Chen et al.
Creators struggle to edit long-form, narrative-rich videos not because of UI complexity, but due to the cognitive demands of searching, storyboarding, and sequencing hours of footage. Existing transcript- or embedding-based methods fall short for creative workflows, as models struggle to track characters, infer motivations, and connect dispersed events. We present a prompt-driven, modular editing system that helps creators restructure multi-hour content through free-form prompts rather than timelines. At its core is a semantic indexing pipeline that builds a global narrative via temporal segmentation, guided memory compression, and cross-granularity fusion, producing interpretable traces of plot, dialogue, emotion, and context. Users receive cinematic edits while optionally refining transparent intermediate outputs. Evaluated on 400+ videos with expert ratings, QA, and preference studies, our system scales prompt-driven editing, preserves narrative coherence, and balances automation with creator control.
CV Jan 20, 2024
Towards Open-World Gesture Recognition
Junxiao Shen, Matthias De Lange, Xuhai "Orson" Xu et al.
Providing users with accurate gestural interfaces, such as gesture recognition based on wrist-worn devices, is a key challenge in mixed reality. However, static machine learning processes in gesture recognition assume that training and test data come from the same underlying distribution. Unfortunately, in real-world applications involving gesture recognition, such as gesture recognition based on wrist-worn devices, the data distribution may change over time. We formulate this problem of adapting recognition models to new tasks, where new data patterns emerge, as open-world gesture recognition (OWGR). We propose the use of continual learning to enable machine learning models to be adaptive to new tasks without degrading performance on previously learned tasks. However, exploring parameters for questions around when, and how, to train and deploy recognition models requires resource-intensive user studies, which may be impractical. To address this challenge, we propose a design engineering approach that enables offline analysis on a collected large-scale dataset by systematically examining various parameters and comparing different continual learning methods. Finally, we provide design guidelines to enhance the development of an open-world wrist-worn gesture recognition process.
HC Jan 17, 2022
PoVRPoint: Authoring Presentations in Mobile Virtual Reality
Verena Biener, Travis Gesslein, Daniel Schneider et al.
Virtual Reality (VR) has the potential to support mobile knowledge workers by complementing traditional input devices with a large three-dimensional output space and spatial input. Previous research on supporting VR knowledge work explored domains such as text entry using physical keyboards and spreadsheet interaction using combined pen and touch input. Inspired by such work, this paper probes the VR design space for authoring presentations in mobile settings. We propose PoVRPoint -- a set of tools coupling pen- and touch-based editing of presentations on mobile devices, such as tablets, with the interaction capabilities afforded by VR. We study the utility of extended display space to, for example, assist users in identifying target slides, supporting spatial manipulation of objects on a slide, creating animations, and facilitating arrangements of multiple, possibly occluded, shapes. Among other things, our results indicate that 1) the wide field of view afforded by VR results in significantly faster target slide identification times compared to a tablet-only interface for visually salient targets; and 2) the three-dimensional view in VR enables significantly faster object reordering in the presence of occlusion compared to two baseline interfaces. A user study further confirmed that the interaction techniques were found to be usable and enjoyable.
HC Nov 6, 2021
Extended Reality for Knowledge Work in Everyday Environments
Verena Biener, Eyal Ofek, Michel Pahud et al.
Virtual and Augmented Reality have the potential to change information work. The ability to modify the worker's senses can transform everyday environments into a productive office, using portable head-mounted displays combined with conventional interaction devices, such as keyboards and tablets. While a stream of better, cheaper and lighter HMDs has been introduced for consumers in recent years, there are still many challenges to be addressed to allow this vision to become reality. This chapter summarizes the state of the art in the field of extended reality for knowledge work in everyday environments and proposes steps to address the open challenges.
HC Sep 22, 2021
Accuracy Evaluation of Touch Tasks in Commodity Virtual and Augmented Reality Head-Mounted Displays
Daniel Schneider, Verena Biener, Alexander Otte et al.
An increasing number of consumer-oriented head-mounted displays (HMD) for augmented and virtual reality (AR/VR) are capable of finger and hand tracking. We report on the accuracy of off-the-shelf VR and AR HMDs when used for touch-based tasks such as pointing or drawing. Specifically, we report on the finger tracking accuracy of the VR head-mounted displays Oculus Quest, Vive Pro and the Leap Motion controller, when attached to a VR HMD, as well as the finger tracking accuracy of the AR head-mounted displays Microsoft HoloLens 2 and Magic Leap. We present the results of two experiments in which we compare the accuracy for absolute and relative pointing tasks using both human participants and a robot. The results suggest that HTC Vive has a lower spatial accuracy than the Oculus Quest and Leap Motion and that the Microsoft HoloLens 2 provides higher spatial accuracy than Magic Leap One. These findings can serve as decision support for researchers and practitioners in choosing which systems to use in the future.
HC Jul 28, 2021
Jarvis for Aeroengine Analytics: A Speech Enhanced Virtual Reality Demonstrator Based on Mining Knowledge Databases
Sławomir Konrad Tadeja, Krzysztof Kutt, Yupu Lu et al.
In this paper, we present a Virtual Reality (VR) based environment where the engineer interacts with incoming data from a fleet of aeroengines. This data takes the form of 3D computer-aided design (CAD) engine models coupled with characteristic plots for the subsystems of each engine. Both the plots and models can be interacted with and manipulated using speech or gestural input. The characteristic data is ported to a knowledge-based system underpinned by a knowledge-graph storing complex domain knowledge. This permits the system to respond to queries about the current state and health of each aeroengine asset. Responses to these questions require some degree of analysis, which is handled by a semantic knowledge representation layer managing information on aeroengine subsystems. This paper represents a significant step forward for aeroengine analysis in a bespoke VR environment and brings us a step closer to a Jarvis-like system for aeroengine analytics.
CV May 27, 2021
The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition
Junxiao Shen, John Dudley, Per Ola Kristensson
Deep learning approaches deliver state-of-the-art performance in recognition of spatiotemporal human motion data. However, one of the main challenges in these recognition tasks is limited available training data. Insufficient training data results in over-fitting and data augmentation is one approach to address this challenge. Existing data augmentation strategies based on scaling, shifting and interpolating offer limited generalizability and typically require detailed inspection of the dataset as well as hundreds of GPU hours for hyperparameter optimization. In this paper, we present a novel automatic data augmentation model, the Imaginative Generative Adversarial Network (GAN), that approximates the distribution of the input data and samples new data from this distribution. It is automatic in that it requires no data inspection and little hyperparameter tuning and therefore it is a low-cost and low-effort approach to generate synthetic data. We demonstrate our approach on small-scale skeleton-based datasets with a comprehensive experimental analysis. Our results show that the augmentation strategy is fast to train and can improve classification accuracy for both conventional neural networks and state-of-the-art methods.
HC Sep 7, 2020
Towards a Practical Virtual Office for Mobile Knowledge Workers
Eyal Ofek, Jens Grubert, Michel Pahud et al.
As more people work from home or during travel, new opportunities and challenges arise around mobile office work. On the one hand, people may work at flexible hours, independent of traffic limitations, but on the other hand, they may need to work at makeshift spaces, with less than optimal working conditions and decoupled from co-workers. Virtual Reality (VR) has the potential to change the way information workers work: it enables personal bespoke working environments even on the go and allows new collaboration approaches that can help mitigate the effects of physical distance. In this paper, we investigate opportunities and challenges for realizing mobile VR office environments and discuss implications from recent findings of mixing standard off-the-shelf equipment, such as tablets, laptops or desktops, with VR to enable effective, efficient, ergonomic, and rewarding mobile knowledge work. Further, we investigate the role of conceptual and physical spaces in a mobile VR office.
HC Sep 7, 2020
Back to the Future: Revisiting Mouse and Keyboard Interaction for HMD-based Immersive Analytics
Jens Grubert, Eyal Ofek, Michel Pahud et al.
With the rise of natural user interfaces, immersive analytics applications often focus on novel forms of interaction modalities such as mid-air gestures, gaze or tangible interaction utilizing input devices such as depth-sensors, touch screens and eye-trackers. At the same time, traditional input devices such as the physical keyboard and mouse are used to a lesser extent. We argue that for certain work scenarios, such as conducting analytic tasks at stationary desktop settings, it can be valuable to combine the benefits of novel and established input devices as well as input modalities to create productive immersive analytics environments.
HC Aug 11, 2020
Breaking the Screen: Interaction Across Touchscreen Boundaries in Virtual Reality for Mobile Knowledge Workers
Verena Biener, Daniel Schneider, Travis Gesslein et al.
Virtual Reality (VR) has the potential to transform knowledge work. One advantage of VR knowledge work is that it allows extending 2D displays into the third dimension, enabling new operations, such as selecting overlapping objects or displaying additional layers of information. On the other hand, mobile knowledge workers often work on established mobile devices, such as tablets, limiting interaction with those devices to a small input space. This challenge of a constrained input space is intensified in situations when VR knowledge work is situated in cramped environments, such as airplanes and touchdown spaces. In this paper, we investigate the feasibility of interacting jointly between an immersive VR head-mounted display and a tablet within the context of knowledge work. Specifically, we 1) design, implement and study how to interact with information that reaches beyond a single physical touchscreen in VR; 2) design and evaluate a set of interaction concepts; and 3) build example applications and gather user feedback on those applications.
HCAug 11, 2020
Pen-based Interaction with Spreadsheets in Mobile Virtual Reality
Travis Gesslein, Verena Biener, Philipp Gagel et al.
Virtual Reality (VR) can enhance the display of and interaction with mobile knowledge work and, in particular, spreadsheet applications. While spreadsheets are widely used, they are challenging to interact with, especially on mobile devices, and using them in VR has not been explored in depth. A distinctive feature of the domain is the contrast between the large, immersive display space afforded by VR and the very limited interaction space that may be available to the information worker on the go, such as an airplane seat or a small workspace. To close this gap, we present a tool-set for enhancing spreadsheet interaction on tablets using immersive VR headsets and pen-based input. This combination opens up many possibilities for enhancing the productivity of spreadsheet interaction. We propose to use the space around and in front of the tablet for enhanced visualization of spreadsheet data and meta-data. Examples include extending the sheet display beyond the bounds of the physical screen and easing debugging by uncovering hidden dependencies between a sheet's cells. Combining the precise on-screen input of a pen with spatial sensing around the tablet, we propose tools for the efficient creation and editing of spreadsheet functions, such as off-the-screen layered menus, visualization of sheet dependencies, and gaze-and-touch-based switching between spreadsheet tabs. We study the feasibility of the proposed tool-set using a video-based online survey and an expert-based assessment of indicative human performance potential.
HCNov 22, 2019
PhotoTwinVR: An Immersive System for Manipulation, Inspection and Dimension Measurements of the 3D Photogrammetric Models of Real-Life Structures in Virtual Reality
Slawomir Konrad Tadeja, Wojciech Rydlewicz, Yupu Lu et al.
Photogrammetry is a science dealing with obtaining reliable information about physical objects using their imagery description. Recent advancements in the development of Virtual Reality (VR) can help to unlock the full potential offered by the digital 3D-reality models generated using state-of-the-art photogrammetric technologies. These models are becoming a viable alternative for providing high-quality content for such immersive environments. At the same time, their analysis in VR could bring added value to professionals working in various engineering and non-engineering settings and help in extracting useful information about physical objects. However, there is little research published to date on feasible interaction methods in VR-based systems augmented with 3D photogrammetric models, especially concerning gestural input interfaces. Consequently, this paper presents PhotoTwinVR, an immersive, gesture-controlled system for manipulation and inspection of 3D photogrammetric models of physical objects in VR. Our system allows the user to perform basic engineering operations on the model subjected to the off-line inspection process. An observational study with a group of three domain-expert participants was completed to verify its feasibility. The system was populated with a 3D photogrammetric model of an existing pipe-rack generated using a commercial software package. The participants were asked to carry out a survey measurement of the object using the measurement toolbox offered by PhotoTwinVR. The study revealed the potential of such an immersive tool to be applied in practical real-world cases of off-line inspections of pipelines.
HCOct 22, 2019
AeroVR: Immersive Visualization System for Aerospace Design
Slawomir Konrad Tadeja, Pranay Seshadri, Per Ola Kristensson
One of today's most propitious immersive technologies is virtual reality (VR). This term is colloquially associated with headsets that transport users to a bespoke, built-for-purpose immersive 3D virtual environment. It has given rise to immersive analytics, a new field of research that aims to use immersive technologies to enhance and empower data analytics. However, in developing such a new set of tools, one has to ask whether the move from a standard hardware setup to a fully immersive 3D environment is justified, both in terms of efficiency and development costs. To this end, in this paper, we present AeroVR, an immersive aerospace design environment that aims to aid the component aerodynamic design process by interactively visualizing performance and geometry. We decompose the design of such an environment into function structures, identify the primary and secondary tasks, present an implementation of the system, and verify the interface in terms of usability and expressiveness. We deploy AeroVR on a prototypical design study of a compressor blade for an engine.
HCSep 6, 2019
Effects of Depth Layer Switching between an Optical See-Through Head-Mounted Display and a Body-Proximate Display
Anna Eiberger, Per Ola Kristensson, Susanne Mayr et al.
Optical see-through head-mounted displays (OST HMDs) typically display virtual content at a fixed focal distance while users need to integrate this information with real-world information at different depth layers. This problem is pronounced in body-proximate multi-display systems, such as when an OST HMD is combined with a smartphone or smartwatch. While such joint systems open up a new design space, they also reduce users' ability to integrate visual information. We quantify this cost by presenting the results of an experiment (n=24) that evaluates human performance in a visual search task across an OST HMD and a body-proximate display at 30 cm. The results reveal that task completion time increases significantly by approximately 50% and the error rate increases significantly by approximately 100% compared to visual search on a single depth layer. These results highlight a design trade-off when designing joint OST HMD-body proximate display systems.
HCJul 18, 2019
ReconViguRation: Reconfiguring Physical Keyboards in Virtual Reality
Daniel Schneider, Alexander Otte, Travis Gesslein et al.
Physical keyboards are common peripherals for personal computers and are efficient standard text entry devices. Recent research has investigated how physical keyboards can be used in immersive head-mounted display-based Virtual Reality (VR). So far, the physical layout of keyboards has typically been transplanted into VR for replicating typing experiences in a standard desktop environment. In this paper, we explore how to fully leverage the immersiveness of VR to change the input and output characteristics of physical keyboard interaction within a VR environment. This allows individual physical keys to be reconfigured to the same or different actions and visual output to be distributed in various ways across the VR representation of the keyboard. We explore a set of input and output mappings for reconfiguring the virtual presentation of physical keyboards and probe the resulting design space by specifically designing, implementing and evaluating nine VR-relevant applications: emojis, languages and special characters, application shortcuts, virtual text processing macros, a window manager, a photo browser, a whack-a-mole game, secure password entry and a virtual touch bar. We investigate the feasibility of the applications in a user study with 20 participants and find that, among other things, they are usable in VR. We discuss the limitations and possibilities of remapping the input and output characteristics of physical keyboards in VR based on empirical findings and analysis and suggest future research directions in this area.
HCDec 5, 2018
The Office of the Future: Virtual, Portable and Global
Jens Grubert, Eyal Ofek, Michel Pahud et al.
Virtual Reality has the potential to change the way we work. We envision the future office worker to be able to work productively everywhere solely using portable standard input devices and immersive head-mounted displays. Virtual Reality has the potential to enable this, by allowing users to create working environments of their choice and by relieving them from physical world limitations such as constrained space or noisy environments. In this article, we investigate opportunities and challenges for realizing this vision and discuss implications from recent findings of text entry in virtual reality as a core office task.
AIApr 20, 2018
The Statistical Model for Ticker, an Adaptive Single-Switch Text-Entry Method for Visually Impaired Users
Emli-Mari Nel, Per Ola Kristensson, David J. C. MacKay
This paper presents the statistical model for Ticker [1], a novel probabilistic stereophonic single-switch text entry method for visually-impaired users with motor disabilities who rely on single-switch scanning systems to communicate. All terminology and notation are defined in [1].
HCFeb 2, 2018
Text Entry in Immersive Head-Mounted Display-based Virtual Reality using Standard Keyboards
Jens Grubert, Lukas Witzani, Eyal Ofek et al.
We study the performance and user experience of two popular mainstream text entry devices, desktop keyboards and touchscreen keyboards, for use in Virtual Reality (VR) applications. We discuss the limitations arising from limited visual feedback, and examine the efficiency of different strategies of use. We analyze a total of 24 hours of typing data in VR from 24 participants and find that novice users are able to retain about 60% of their typing speed on a desktop keyboard and about 40-45% of their typing speed on a touchscreen keyboard. We also find no significant learning effects, indicating that users can quickly transfer their typing skills into VR. Besides investigating baseline performance, we study the position in which keyboards and hands are rendered in space. We find that this does not adversely affect performance for desktop keyboard typing and results in a performance trade-off for touchscreen keyboard typing.
HCFeb 2, 2018
Effects of Hand Representations for Typing in Virtual Reality
Jens Grubert, Lukas Witzani, Eyal Ofek et al.
Alphanumeric text entry is a challenge for Virtual Reality (VR) applications. VR enables new capabilities, impossible in the real world, such as an unobstructed view of the keyboard, without occlusion by the user's physical hands. Several hand representations have been proposed for typing in VR on standard physical keyboards. However, to date, these hand representations have not been compared regarding their performance and effects on presence for VR text entry. Our work addresses this gap by comparing existing hand representations with a minimalistic fingertip visualization. We study the effects of four hand representations (no hand representation, inverse kinematic model, fingertip visualization using spheres, and video inlay) on typing in VR using a standard physical keyboard with 24 participants. We found that the fingertip visualization and video inlay both resulted in statistically significantly lower text entry error rates compared to the no-hand and inverse kinematic model representations. We found no statistically significant differences in text entry speed.
HCDec 28, 2017
Modelling Noise-Resilient Single-Switch Scanning Systems
Emli-Mari Nel, Per Ola Kristensson, David J. C. MacKay
Single-switch scanning systems allow nonspeaking individuals with motor disabilities to communicate by triggering a single switch (e.g., raising an eyebrow). A problem with current single-switch scanning systems is that while they result in reasonable performance in noiseless conditions, for instance via simulation or tests with able-bodied users, they fail to accurately model the noise sources that are introduced when a nonspeaking individual with motor disabilities is triggering the switch in a realistic use context. To assist the development of more noise-resilient single-switch scanning systems, we have developed a mathematical model of scanning systems that incorporates extensive noise modelling. Our model includes an improvement to the standard scanning method, called fast-scan, which we show via simulation can be more suitable for certain users of scanning systems.
CLSep 19, 2017
Neural Networks for Text Correction and Completion in Keyboard Decoding
Shaona Ghosh, Per Ola Kristensson
Despite the ubiquity of mobile and wearable text messaging applications, the problem of keyboard text decoding is not tackled sufficiently in the light of the enormous success of deep learning Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) for natural language understanding. In particular, the requirement that keyboard decoders operate on devices with memory and processor resource constraints makes it challenging to deploy industrial-scale deep neural network (DNN) models. This paper proposes a sequence-to-sequence neural attention network system for automatic text correction and completion. Given an erroneous sequence, our model encodes character-level hidden representations and then decodes the revised sequence, thus enabling auto-correction and completion. We achieve this by a combination of a character-level CNN and gated recurrent unit (GRU) encoder along with a word-level GRU attention decoder. Unlike traditional language models that learn from billions of words, our corpus size is only 12 million words, an order of magnitude smaller. The memory footprint of our learnt model for inference and prediction is also an order of magnitude smaller than that of conventional language-model-based text decoders. We report baseline performance for neural keyboard decoders in such a limited domain. Our models achieve a word-level accuracy of 90% and a character error rate (CER) of 2.4% on the Twitter typo dataset. We present a novel dataset of noisy-to-corrected mappings by inducing the noise distribution from the Twitter data over the OpenSubtitles 2009 dataset, on which our model predicts with a word-level accuracy of 98% and a sequence accuracy of 68.9%. In our user study, our model achieved an average CER of 2.6%, compared with a CER of 1.6% for the state-of-the-art non-neural touch-screen keyboard decoder.