Romain Vuillemot

h-index: 13

6 papers

3,499 citations

Novelty: 41%

AI Score: 30

Ranked #135,305 of 194,257 authors (top 70%) · #44,610 in CV (top 75%)

6 Papers

5.2 · HC · Jul 16

Authoring Narrative Visualization in Motion: Visual Storytelling in Swimming Videos

Junhao Zhao, Romain Vuillemot, Petra Isenberg et al.

We investigate how to support authoring narrative visualizations in motion in sports videos, drawing on automated data preparation, systematic analysis, technology probe design, and evaluation, using swimming races as a case study. Sports videos are widely broadcast and shared across social media, where content creators increasingly seek to present and explain complex events to general audiences. Visualization in motion has been explored as an efficient way to embed data into videos and to move with the data referents, providing additional information and helping audiences understand races. However, existing approaches primarily focus on embedding visualizations in videos, lacking exploration of how to support authoring narratives that coordinate views, data, and temporal progression to explain the unfolding races. To address this gap, we use swimming videos as an ideal case for exploration, as swimming is a sport with rich, dynamic data and visualizations in practice. We develop an automated pipeline that extracts structured data from videos, derive narrative constructs through observational analysis of sports broadcasts, and design a technology probe that supports authoring using data prepared by our pipeline and narrative constructs derived from our observations. We evaluate our approach with experienced content creators and/or graphic designers to examine the benefits and challenges of authoring narrative visualizations in motion. All supplemental materials are described in the Supplemental Material Pointers section and are on OSF: osf.io/bq47n/.

2.9 · HC · Feb 21, 2022 · Code

ReViVD: Exploration and Filtering of Trajectories in an Immersive Environment using 3D Shapes

François Homps, Yohan Beugin, Romain Vuillemot

We present ReViVD, a tool for exploring and filtering large trajectory-based datasets using virtual reality. ReViVD's novelty lies in using simple 3D shapes -- such as cuboids, spheres and cylinders -- as queries for users to select and filter groups of trajectories. Building on this simple paradigm, more complex queries can be created by combining previously made selection groups through a system of user-created Boolean operations. We demonstrate the use of ReViVD in different application domains, from GPS position tracking to simulated data (e.g., turbulent particle flows and traffic simulation). Our results show the ease of use and expressiveness of the 3D geometric shapes in a broad range of exploratory tasks. ReViVD was found to be particularly useful for progressively refining selections to isolate outlying behaviors. It also acts as a powerful communication tool for conveying the structure of normally abstract datasets to an audience.
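The shape-as-query idea and the Boolean combination of selection groups can be illustrated with a minimal Python sketch. It is not ReViVD's implementation. It assumes that trajectories are lists of sampled 3D points and that a trajectory is selected when at least one sample falls inside the shape. It also simplifies cuboids to axis-aligned boxes and cylinders to z-aligned ones, and it uses Python's set operators for the Boolean step. The names `Sphere`, `Cuboid`, `Cylinder`, and `select` and the toy data are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Protocol, Set, Tuple

Point = Tuple[float, float, float]


class Shape(Protocol):
    def contains(self, p: Point) -> bool: ...


@dataclass
class Sphere:
    center: Point
    radius: float

    def contains(self, p: Point) -> bool:
        return math.dist(p, self.center) <= self.radius


@dataclass
class Cuboid:
    # Axis-aligned box (simplifying assumption: no rotation).
    min_corner: Point
    max_corner: Point

    def contains(self, p: Point) -> bool:
        return all(lo <= v <= hi for lo, v, hi in zip(self.min_corner, p, self.max_corner))


@dataclass
class Cylinder:
    # Cylinder whose axis runs along z from its base center (simplifying assumption).
    base_center: Point
    radius: float
    height: float

    def contains(self, p: Point) -> bool:
        dx = p[0] - self.base_center[0]
        dy = p[1] - self.base_center[1]
        dz = p[2] - self.base_center[2]
        return dx * dx + dy * dy <= self.radius ** 2 and 0.0 <= dz <= self.height


def select(trajectories: Dict[str, List[Point]], shape: Shape) -> Set[str]:
    """Ids of trajectories with at least one sampled point inside `shape`."""
    return {tid for tid, pts in trajectories.items() if any(shape.contains(p) for p in pts)}


# Toy data: three short trajectories (hypothetical).
trajectories: Dict[str, List[Point]] = {
    "t1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "t2": [(0.0, 5.0, 0.0), (0.0, 6.0, 0.0)],
    "t3": [(2.0, 0.0, 0.0), (2.0, 5.0, 0.0)],
}

near_origin = select(trajectories, Sphere((0.0, 0.0, 0.0), 1.5))
through_box = select(trajectories, Cuboid((1.5, -1.0, -1.0), (2.5, 6.0, 1.0)))
in_tube = select(trajectories, Cylinder((0.0, 5.0, -1.0), 0.5, 2.0))

# Boolean combination of selection groups via set algebra.
both = near_origin & through_box        # {"t1"}
box_only = through_box - near_origin    # {"t3"}
either = near_origin | in_tube          # {"t1", "t2"}
```

Testing only sampled points keeps the sketch short, but a fast-moving trajectory could pass through a small shape between two samples undetected. A fuller version would also intersect the segments between consecutive samples with each shape.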

7.3 · RO · Sep 24, 2021 · Code

SIM2REALVIZ: Visualizing the Sim2Real Gap in Robot Ego-Pose Estimation

Theo Jaunet, Guillaume Bono, Romain Vuillemot et al.

The robotics community has started to rely heavily on increasingly realistic 3D simulators for large-scale training of robots on massive amounts of data. But once robots are deployed in the real world, the simulation gap, as well as changes in the real world (e.g., lighting, object displacements), leads to errors. In this paper, we introduce Sim2RealViz, a visual analytics tool to assist experts in understanding and reducing this gap for robot ego-pose estimation tasks, i.e., the estimation of a robot's position using trained models. Sim2RealViz displays details of a given model and the performance of its instances both in simulation and in the real world. Experts can identify environment differences that impact model predictions at a given location and, through direct interaction with the model, explore hypotheses to fix it. We detail the design of the tool and present case studies on how models exploit regression-to-the-mean bias and how this can be addressed, and on how models are perturbed by the disappearance of landmarks such as bikes.

12.1 · CV · Apr 8, 2021

How Transferable are Reasoning Patterns in VQA?

Corentin Kervadec, Theo Jaunet, Grigory Antipov et al.

Since its inception, Visual Question Answering (VQA) has been notorious as a task in which models are prone to exploiting biases in datasets to find shortcuts instead of performing high-level reasoning. Classical methods address this by removing biases from training data or by adding branches to models to detect and remove biases. In this paper, we argue that uncertainty in vision is a dominant factor preventing the successful learning of reasoning in vision and language problems. We train a visual oracle and, in a large-scale study, provide experimental evidence that it is much less prone to exploiting spurious dataset biases than standard models. We propose to study the attention mechanisms at work in the visual oracle and compare them with those of a SOTA Transformer-based model. We provide an in-depth analysis and visualizations of reasoning patterns obtained with an online visualization tool, which we make publicly available (https://reasoningpatterns.github.io). We exploit these insights by transferring, via fine-tuning, reasoning patterns from the oracle to a SOTA Transformer-based VQA model that takes standard noisy visual inputs. In experiments, we report higher overall accuracy, as well as higher accuracy on infrequent answers for each question type, which provides evidence of improved generalization and reduced dependency on dataset biases.

14.0 · CV · Apr 2, 2021 · Code

VisQA: X-raying Vision and Language Reasoning in Transformers

Theo Jaunet, Corentin Kervadec, Romain Vuillemot et al.

Visual Question Answering systems aim to answer open-ended textual questions about input images. They are a testbed for learning high-level reasoning, with a primary use in HCI, for instance assisting the visually impaired. Recent research has shown that state-of-the-art models tend to produce answers by exploiting biases and shortcuts in the training data, sometimes without even looking at the input image, instead of performing the required reasoning steps. We present VisQA, a visual analytics tool that explores this question of reasoning vs. bias exploitation. It exposes the key element of state-of-the-art neural models -- attention maps in transformers. Our working hypothesis is that reasoning steps leading to model predictions are observable from attention distributions, which are particularly well suited to visualization. The design process of VisQA was motivated by well-known bias examples from the fields of deep learning and vision-language reasoning, and VisQA was evaluated in two ways. First, as a result of a collaboration between three fields (machine learning, vision and language reasoning, and data analytics), the work led to a better understanding of bias exploitation by neural models for VQA, which eventually influenced their design and training through a proposed method for transferring reasoning patterns from an oracle model. Second, we report on the design of VisQA and on a goal-oriented evaluation of VisQA targeting the analysis of a model's decision process by multiple experts, providing evidence that it makes the inner workings of models accessible to users.
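An attention map, the object VisQA exposes, is the matrix of weights that a transformer layer computes with standard scaled dot-product attention, softmax(Q K^T / sqrt(d_k)). The minimal pure-Python sketch below computes one such map for toy vectors. It omits the learned query/key projections and the multiple heads of a real model, and the vectors are hypothetical. It only illustrates what each row of a visualized attention map represents. It does not show how VisQA extracts maps from trained models.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention weights softmax(Q K^T / sqrt(d_k)).

    Row i is a probability distribution over the keys for query i,
    which is what one row of an attention-map heatmap displays.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    return [
        softmax([sum(q * k for q, k in zip(q_vec, k_vec)) / scale for k_vec in keys])
        for q_vec in queries
    ]


# Toy example (hypothetical vectors): 2 question-word queries attending
# over 3 image-region keys.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
weights = attention_map(queries, keys)

for i, row in enumerate(weights):
    top = max(range(len(row)), key=row.__getitem__)
    print(f"query {i} attends most to region {top}")
```

Because every row sums to 1, a heatmap of `weights` shows how each query distributes its attention. A sharply peaked row is the kind of pattern the working hypothesis above ties to a reasoning step.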

10.7 · LG · Sep 6, 2019 · Code

DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning

Theo Jaunet, Romain Vuillemot, Christian Wolf

We present DRLViz, a visual analytics interface for interpreting the internal memory of an agent (e.g., a robot) trained using deep reinforcement learning. This memory is composed of large temporal vectors updated as the agent moves through an environment, and it is not trivial to understand due to the number of dimensions, dependencies on past vectors, spatial/temporal correlations, and co-correlations between dimensions. It is often referred to as a black box, as only inputs (images) and outputs (actions) are intelligible to humans. DRLViz assists experts in interpreting decisions through memory reduction interactions and in investigating the role of parts of the memory when errors have been made (e.g., taking a wrong direction). We report on DRLViz applied in the context of video game simulators (ViZDoom) for a navigation scenario with item-gathering tasks. We also report on an expert evaluation of DRLViz, on the applicability of DRLViz to other scenarios and navigation problems beyond simulation games, and on its contribution to the interpretability and explainability of black-box models in the field of visual analytics.