CV · May 15 · Code
Visual Agentic Memory: Enabling Online Long Video Understanding via Online Indexing, Hierarchical Memory, and Agentic Retrieval
Aiden Yiliu Li, Nels Numan, Anthony Steed
Long video understanding requires more than large context windows. It also needs a memory mechanism that decides what visual evidence to retain, keeps it searchable over long horizons, and grounds later reasoning in recoverable observations rather than compressed latent state alone. We propose Visual Agentic Memory (VAM), a training-free framework with three components. Online Indexing supports selective evidence retention under streaming constraints. Hierarchical Memory organises retained evidence in a Parallel Representation that aligns temporal context with spatial observations. Agentic Retrieval searches, inspects, and verifies candidate evidence before producing a grounded answer. On OVO-Bench, VAM achieves the highest RT+BT average (68.41) across all reported baselines, improving over end-to-end use of the same underlying MLLM (Gemini 3 Flash, 67.46). On the month-scale split of MM-Lifelong train@month (105.6 hours over 51 days), VAM reaches 17.11%, second only to ReMA with GPT-5 (17.62%). These results suggest that long-horizon video understanding benefits from treating visual memory as an explicit, inspectable, and queryable substrate. Code is available at https://github.com/yiliu-li/Visual-Agentic-Memory.
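The three components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyMemory`, the `novelty` threshold, the fixed `segment_len`, and the mean-absolute-difference `distance` are all hypothetical stand-ins chosen only to show the idea of retaining selected evidence online, grouping it under temporal segments, and searching then inspecting it at query time.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Temporal node: a time span plus the frames retained inside it."""
    start: float
    end: float
    frames: list = field(default_factory=list)  # (timestamp, feature) pairs


def distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class ToyMemory:
    """Hypothetical two-level memory: segments (temporal) over frames (spatial)."""

    def __init__(self, novelty=0.5, segment_len=10.0):
        self.novelty = novelty          # assumed retention threshold
        self.segment_len = segment_len  # assumed fixed segment duration (seconds)
        self.segments = []
        self.last_kept = None

    def observe(self, t, feature):
        """Online indexing: keep a frame only if it differs enough from the last kept one."""
        if self.last_kept is not None and distance(feature, self.last_kept) < self.novelty:
            return False
        if not self.segments or t >= self.segments[-1].end:
            start = (t // self.segment_len) * self.segment_len
            self.segments.append(Segment(start, start + self.segment_len))
        self.segments[-1].frames.append((t, feature))
        self.last_kept = feature
        return True

    def retrieve(self, query, t_min, t_max):
        """Search segments overlapping the window, inspect their frames, return the best timestamp."""
        best = None
        for seg in self.segments:
            if seg.end <= t_min or seg.start > t_max:
                continue  # segment lies outside the requested time window
            for t, f in seg.frames:
                if t_min <= t <= t_max:
                    d = distance(query, f)
                    if best is None or d < best[0]:
                        best = (d, t)
        return None if best is None else best[1]
```

Because `retrieve` returns the timestamp of a stored observation, the answer stays tied to recoverable evidence, which is the property the abstract emphasises over compressed latent state.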
HC · Dec 16, 2021 · Code
Ubiq: A System to Build Flexible Social Virtual Reality Experiences
Sebastian Friston, Ben Congdon, David Swapp et al.
While they have long been a subject of academic study, social virtual reality (SVR) systems are now attracting increasingly large audiences on current consumer virtual reality systems. The design space of SVR systems is very large, and relatively little is known about how these systems should be constructed in order to be usable and efficient. In this paper we present Ubiq, a toolkit that focuses on facilitating the construction of SVR systems. We argue for the design strategy of Ubiq and its scope. Ubiq is built on the Unity platform. It provides core functionality of many SVR systems such as connection management, voice, avatars, etc. However, its design remains easy to extend. We demonstrate examples built on Ubiq and how it has been successfully used in classroom teaching. Ubiq is open source (Apache License) and thus enables several use cases that commercial systems cannot.
CV · Mar 3, 2025
Blind Augmentation: Calibration-free Camera Distortion Model Estimation for Real-time Mixed-reality Consistency
Siddhant Prakash, David R. Walton, Rafael K. dos Anjos et al.
Real camera footage is subject to noise, motion blur (MB) and depth of field (DoF). In some applications these might be considered distortions to be removed, but in others it is important to model them because it would be ineffective, or interfere with an aesthetic choice, to simply remove them. In augmented reality applications where virtual content is composited into a live video feed, we can model noise, MB and DoF to make the virtual content visually consistent with the video. Existing methods for this typically suffer from two main limitations. First, they require a camera calibration step to relate a known calibration target to the specific camera's response. Second, existing work requires methods that can be (differentiably) tuned to the calibration, such as slow and specialized neural networks. We propose a method which estimates parameters for noise, MB and DoF instantly, which allows using off-the-shelf real-time simulation methods from, e.g., a game engine when compositing augmented content. Our main idea is to unlock both features by showing how to use modern computer vision methods that can remove noise, MB and DoF from the video stream, essentially providing self-calibration. This allows auto-tuning of any black-box real-time noise+MB+DoF method to deliver fast and high-fidelity augmentation consistency.
HC · Jul 7, 2021
Telelife: The Future of Remote Living
Jason Orlosky, Misha Sra, Kenan Bektaş et al.
In recent years, everyday activities such as work and socialization have steadily shifted to more remote and virtual settings. With the COVID-19 pandemic, the switch from physical to virtual has been accelerated, which has substantially affected various aspects of our lives, including business, education, commerce, healthcare, and personal life. This rapid and large-scale switch from in-person to remote interactions has revealed that our current technologies lack functionality and are limited in their ability to recreate interpersonal interactions. To help address these limitations in the future, we introduce "Telelife," a vision for the near future that depicts the potential means to improve remote living so that it is better aligned with how we interact, live and work in the physical world. Telelife encompasses novel synergies of technologies and concepts such as digital twins, virtual prototyping, and attention and context-aware user interfaces with innovative hardware that can support ultrarealistic graphics, user state detection, and more. These ideas will guide the transformation of our daily lives and routines soon, targeting the year 2035. In addition, we identify opportunities across high-impact applications in domains related to this vision of Telelife. Along with a recent survey of relevant fields such as human-computer interaction, pervasive computing, and virtual reality, the directions outlined in this paper will guide future research on remote living.
HC · Jun 23, 2021
Directions for 3D User Interface Research from Consumer VR Games
Anthony Steed, Tuukka M. Takala, Daniel Archer et al.
With the continuing development of affordable immersive virtual reality (VR) systems, there is now a growing market for consumer content. The current form of consumer systems is not dissimilar to the lab-based VR systems of the past 30 years: the primary input mechanism is a head-tracked display and one or two tracked hands with buttons and joysticks on hand-held controllers. Over those 30 years, a very diverse academic literature has emerged that covers design and ergonomics of 3D user interfaces (3DUIs). However, the growing consumer market has engaged a very broad range of creatives that have built a very diverse set of designs. Sometimes these designs adopt findings from the academic literature, but other times they experiment with completely novel or counter-intuitive mechanisms. In this paper and its online adjunct, we report on novel 3DUI design patterns that are interesting from both design and research perspectives: they are highly novel, potentially broadly re-usable and/or suggest interesting avenues for evaluation. The supplemental material, which is a living document, is a crowd-sourced repository of interesting patterns. This paper is a curated snapshot of those patterns that were considered to be the most fruitful for further elaboration.
HC · May 5, 2021
Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality
Daniele Giunchi, Alejandro Sztrajman, Stuart James et al.
Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. We design a challenging new database for sketch, composed of 3D chairs in which each of the components (arms, legs, seat, back) is independently colored. To overcome this difficulty, we implement a multimodal interface for querying 3D model databases within a virtual environment. We base the sketch component on the state of the art in 3D sketch retrieval, and use a Wizard-of-Oz style experiment to process the voice input. In this way, we avoid the complexities of natural language processing, which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities.
HC · Apr 12, 2021
Effectiveness of Social Virtual Reality
Lisa Izzouzi, Anthony Steed
A lot of work in social virtual reality, including our own group's, has focused on effectiveness of specific social behaviours such as eye-gaze, turn taking, gestures and other verbal and non-verbal cues. We have built upon these to look at emergent phenomena such as co-presence, leadership and trust. These give us good information about the usability issues of specific social VR systems, but they don't give us much information about the requirements for such systems going forward. In this short paper we discuss how we are broadening the scope of our work on social systems, to move out of the laboratory to more ecologically valid situations and to study groups using social VR for longer periods of time.
HC · Apr 12, 2021
Some Lessons Learned Running Virtual Reality Experiments Out of the Laboratory
Anthony Steed, Daniel Archer, Ben Congdon et al.
In the past twelve months, our team has had to move rapidly from conducting most of our user experiments in a laboratory setting, to running experiments in the wild away from the laboratory and without direct synchronous oversight from an experimenter. This has challenged us to think about what types of experiment we can run, and to improve our tools and methods to allow us to reliably capture the necessary data. It has also offered us an opportunity to engage with a more diverse population than we would normally engage with in the laboratory. In this position paper we elaborate on the challenges and opportunities, and give some lessons learned from our own experience.
HC · Apr 12, 2021
What We Measure in Mixed Reality Experiments
Anthony Steed
There are many potential measures that one might use when evaluating mixed-reality experiences. In this position paper I will argue that there are various stances to take for evaluation, depending on the framing of the experience within a larger body of work. I will draw upon various types of work that my team has been involved with in order to illustrate these different stances. I will then sketch out some directions for developing more robust measures that can help the field move forward.
HC · Oct 2, 2020
Real-time Collaboration Between Mixed Reality Users in Geo-referenced Virtual Environment
Shubham Singh, Zengou Ma, Daniele Giunchi et al.
Collaboration using mixed reality technology is an active area of research, with significant effort devoted to virtually bridging physical distances. A diverse set of platforms and devices can be used for mixed-reality collaboration, but existing work largely focuses on indoor scenarios, where stable tracking can be assumed. We focus on supporting collaboration between VR and AR users, where the AR user is mobile outdoors and the VR user is immersed in a true-sized digital twin. This cross-platform solution requires new user experiences for interaction, accurate modelling of the real world, and working with noisy outdoor tracking sensors such as GPS. In this paper, we present our results and observations of real-time collaboration between cross-platform users in the context of a geo-referenced virtual environment. We propose a solution for using GPS measurements in VSLAM to localize the AR user in an outdoor environment. The client applications enable VR and AR users to collaborate seamlessly across the heterogeneous platforms. Users can place or load dynamic content tagged to a geolocation and share their experience with remote users in real time.
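Geo-referencing of this kind generally needs GPS fixes expressed in the same metric frame as the virtual environment. The sketch below shows one common way to do that, a local equirectangular approximation around a chosen origin. It is an illustrative assumption, not the method described in the paper: the function name `gps_to_local`, the spherical Earth radius, and the choice of origin are all hypothetical, and the approximation is only reasonable over short distances.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def gps_to_local(lat, lon, origin_lat, origin_lon):
    """Convert a GPS fix (degrees) to (east, north) metres relative to an origin.

    Uses a local equirectangular approximation, which is adequate for
    site-scale scenes but degrades over long distances and near the poles.
    """
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(origin_lat))
    north = EARTH_RADIUS_M * d_lat
    return east, north
```

In a full system such metric positions would still carry GPS noise, which is why the abstract pairs them with visual SLAM rather than using them alone.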
HC · Feb 14, 2020
Docking Haptics: Extending the Reach of Haptics by Dynamic Combinations of Grounded and Worn Devices
Anthony Steed, Sebastian Friston, Vijay Pawar et al.
Grounded haptic devices can provide a variety of forces but have limited working volumes. Wearable haptic devices operate over a large volume but are relatively restricted in the types of stimuli they can generate. We propose the concept of docking haptics, in which different types of haptic devices are dynamically docked at run time. This creates a hybrid system, where the potential feedback depends on the user's location. We show a prototype docking haptic workspace, combining a grounded six degree-of-freedom force feedback arm with a hand exoskeleton. We are able to create the sensation of weight on the hand when it is within reach of the grounded device, but away from the grounded device, hand-referenced force feedback is still available. A user study demonstrates that users can successfully discriminate weight when using docking haptics, but not with the exoskeleton alone. Such hybrid systems would be able to change configuration further, for example docking two grounded devices to a hand in order to deliver twice the force, or extend the working volume. We suggest that the docking haptics concept can thus extend the practical utility of haptics in user interfaces.
HC · Oct 25, 2019
Mixing realities for sketch retrieval in Virtual Reality
Daniele Giunchi, Stuart James, Donald Degraen et al.
Drawing tools for Virtual Reality (VR) enable users to model 3D designs from within the virtual environment itself. These tools employ sketching and sculpting techniques known from desktop-based interfaces and apply them to hand-based controller interaction. While these techniques allow for mid-air sketching of basic shapes, it remains difficult for users to create detailed and comprehensive 3D models. In our work, we focus on supporting the user in designing the virtual environment around them by enhancing sketch-based interfaces with a supporting system for interactive model retrieval. Through sketching, an immersed user can query a database containing detailed 3D models and place them into the virtual environment. To understand supportive sketching within a virtual environment, we compare different methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. Our results show that 3D mid-air sketching is considered to be a more intuitive method to search a collection of models, while the addition of physical devices creates confusion due to the complications of their inclusion within a virtual environment. While we pose our work as a retrieval problem for 3D models of chairs, our results can be extrapolated to other sketching tasks for virtual environments.