Tristan Braud

h-index: 25

21 papers

351 citations

Novelty: 40%

AI Score: 47

Ranked #33,755 of 194,257 authors (top 17%); #12,035 in CV (top 20%)

21 Papers

7.3 | SI | Mar 6, 2022 | Code

Twitter Dataset for 2022 Russo-Ukrainian Crisis

Ehsan-Ul Haq, Gareth Tyson, Lik-Hang Lee et al.

Online Social Networks (OSNs) play a significant role in information sharing during a crisis. The data collected during such a crisis can reflect large-scale public opinion and sentiment. In addition, OSN data can also be used to study different campaigns that are employed by various entities to engineer public opinions. Such information sharing campaigns can range from spreading factual information to propaganda and misinformation. We provide a Twitter dataset of the 2022 Russo-Ukrainian conflict. In the first release, we share over 1.6 million tweets posted during the first week of the crisis.

1.3 | CL | Oct 18, 2023

AI Nushu: An Exploration of Language Emergence in Sisterhood - Through the Lens of Computational Linguistics

Yuqian Sun, Yuying Tang, Ze Gao et al.

This paper presents "AI Nushu," an emerging language system inspired by Nushu (women's scripts), the unique language created and used exclusively by ancient Chinese women who were thought to be illiterate under a patriarchal society. In this interactive installation, two artificial intelligence (AI) agents are trained in the Chinese dictionary and the Nushu corpus. By continually observing their environment and communicating, these agents collaborate towards creating a standard writing system to encode Chinese. It offers an artistic interpretation of the creation of a non-western script from a computational linguistics perspective, integrating AI technology with Chinese cultural heritage and a feminist viewpoint.

21.8 | CV | Aug 20, 2024 | Code

GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting

Changkun Liu, Shuai Chen, Yash Bhalgat et al.

We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement (CPR) framework, GS-CPR. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences. GS-CPR obviates the need for training feature extractors or descriptors by operating directly on RGB images, utilizing the 3D foundation model, MASt3R, for precise 2D matching. To improve the robustness of our model in challenging outdoor environments, we incorporate an exposure-adaptive module within the 3DGS framework. Consequently, GS-CPR enables efficient one-shot pose refinement given a single RGB query and a coarse initial pose estimation. Our proposed approach surpasses leading NeRF-based optimization methods in both accuracy and runtime across indoor and outdoor visual localization benchmarks, achieving new state-of-the-art accuracy on two indoor datasets. The project page is available at https://xrim-lab.github.io/GS-CPR/.

16.1 | CV | Nov 29, 2023 | Code

360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

Huajian Huang, Changkun Liu, Yipeng Zhu et al.

Portable 360° cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, such an advantage is often overlooked due to the lack of valuable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360° images with ground truth poses for visual localization. We present a practical implementation of 360° mapping combining 360° images with lidar data to generate the ground truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual positioning, involving 360° reference frames, and query frames from pinhole, ultra-wide FoV fisheye, and 360° cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360° images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature matching-based and pose regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360-camera mapping and omnidirectional visual localization with cross-device queries.

7.9 | MM | Mar 16

Multimodal Cyber-physical Interaction in XR: Hybrid Doctoral Thesis Defense

Ahmad Alhilal, Kit Yung Lam, Lik-Hang Lee et al.

Academic events, such as a doctoral thesis defense, are typically limited to either physical co-location or flat video conferencing, resulting in rigid participation formats and fragmented presence. We present a multimodal framework that breaks this binary by supporting a spectrum of participation - from in-person attendance to immersive virtual reality (VR) or browser access - and report our findings from using it to organize the first ever hybrid doctoral thesis defense using extended reality (XR). The framework integrates full-body motion tracking to synchronize the user's avatar motions and gestures, enabling natural interaction with onsite participants as well as body language and gestures with remote attendees in the virtual world. It leverages WebXR to provide cross-platform and instant accessibility with easy setup. User feedback analysis reveals positive VR experiences and demonstrates the framework's effectiveness in supporting various hybrid event activities.

1.5 | CV | Aug 10, 2023

KS-APR: Keyframe Selection for Robust Absolute Pose Regression

Changkun Liu, Yukun Zhao, Tristan Braud

Markerless Mobile Augmented Reality (AR) aims to anchor digital content in the physical world without using specific 2D or 3D objects. Absolute Pose Regressors (APR) are end-to-end machine learning solutions that infer the device's pose from a single monocular image. Thanks to their low computation cost, they can be directly executed on the constrained hardware of mobile AR devices. However, APR methods tend to yield significant inaccuracies for input images that are too distant from the training set. This paper introduces KS-APR, a pipeline that assesses the reliability of an estimated pose with minimal overhead by combining the inference results of the APR and the prior images in the training set. Mobile AR systems tend to rely upon visual-inertial odometry to track the relative pose of the device during the experience. As such, KS-APR favours reliability over frequency, discarding unreliable poses. This pipeline can integrate most existing APR methods to improve accuracy by filtering unreliable images with their pose estimates. We implement the pipeline on three types of APR models on indoor and outdoor datasets. The median error on position and orientation is reduced for all models, and the proportion of large errors is minimized across datasets. Our method enables state-of-the-art APRs such as DFNetdm to outperform single-image and sequential APR methods. These results demonstrate the scalability and effectiveness of KS-APR for visual localization tasks that do not require one-shot decisions.

1.5 | CV | Aug 10, 2023

Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR

Changkun Liu, Yukun Zhao, Tristan Braud

Visual Inertial Odometry (VIO) is an essential component of modern Augmented Reality (AR) applications. However, VIO only tracks the relative pose of the device, leading to drift over time. Absolute pose estimation methods infer the device's absolute pose, but their accuracy depends on the input quality. This paper introduces VIO-APR, a new framework for markerless mobile AR that combines an absolute pose regressor (APR) with a local VIO tracking system. VIO-APR uses VIO to assess the reliability of the APR and the APR to identify and compensate for VIO drift. This feedback loop results in more accurate positioning and more stable AR experiences. To evaluate VIO-APR, we created a dataset that combines camera images with ARKit's VIO system output for six indoor and outdoor scenes of various scales. Over this dataset, VIO-APR improves the median accuracy of popular APRs by up to 36% in position and 29% in orientation, increases the percentage of frames in the high (0.25 m, 2°) accuracy level by up to 112%, and greatly reduces the percentage of frames falling below the low (5 m, 10°) accuracy level. We implement VIO-APR into a mobile AR application using Unity to demonstrate its capabilities. VIO-APR results in noticeably more accurate localization and a more stable overall experience.

16.4 | CV | Feb 22, 2024 | Code

HR-APR: APR-agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for Camera Relocalisation

Changkun Liu, Shuai Chen, Yukun Zhao et al.

Absolute Pose Regressors (APRs) directly estimate camera poses from monocular images, but their accuracy is unstable for different queries. Uncertainty-aware APRs provide uncertainty information on the estimated pose, alleviating the impact of these unreliable predictions. However, existing uncertainty modelling techniques are often coupled with a specific APR architecture, resulting in suboptimal performance compared to state-of-the-art (SOTA) APR methods. This work introduces a novel APR-agnostic framework, HR-APR, that formulates uncertainty estimation as cosine similarity estimation between the query and database features. It neither relies on nor affects the APR network architecture, making it flexible and computationally efficient. In addition, we take advantage of the uncertainty for pose refinement to enhance the performance of APR. Extensive experiments demonstrate the effectiveness of our framework, reducing computational overhead by 27.4% and 15.2% on the 7Scenes and Cambridge Landmarks datasets, respectively, while maintaining the SOTA accuracy in single-image APRs.
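
As a rough illustration of using cosine similarity between query and database features as a confidence signal, the sketch below scores a query feature against a handful of database features and flags it as reliable when the best similarity clears a threshold. It is a simplified reading of the abstract, not the HR-APR implementation: the feature vectors, the `is_reliable` helper, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as carrying no evidence
    return dot / (norm_a * norm_b)


def is_reliable(query_feat, db_feats, threshold=0.9):
    """Return (reliable, best_similarity) for a query feature.

    threshold is a hypothetical cut-off chosen for illustration only.
    """
    best = max(cosine_similarity(query_feat, f) for f in db_feats)
    return best >= threshold, best


if __name__ == "__main__":
    database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(is_reliable([0.9, 0.1, 0.0], database))  # close to the first entry
    print(is_reliable([0.0, 0.0, 1.0], database))  # orthogonal to both entries
```

A pose whose query is flagged as unreliable could then be passed to a refinement step rather than used directly, which is the role the abstract assigns to the uncertainty estimate.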

13.1 | CV | Feb 7, 2025

SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting

Huajian Huang, Yingshu Chen, Longwei Li et al.

360-degree cameras streamline data collection for radiance field 3D reconstruction by capturing comprehensive scene data. However, traditional radiance field methods do not address the specific challenges inherent to 360-degree images. We present SC-OmniGS, a novel self-calibrating omnidirectional Gaussian splatting system for fast and accurate omnidirectional radiance field reconstruction using 360-degree images. Rather than converting 360-degree images to cube maps and performing perspective image calibration, we treat 360-degree images as a whole sphere and derive a mathematical framework that enables direct omnidirectional camera pose calibration accompanied by 3D Gaussians optimization. Furthermore, we introduce a differentiable omnidirectional camera model in order to rectify the distortion of real-world data for performance enhancement. Overall, the omnidirectional camera intrinsic model, extrinsic poses, and 3D Gaussians are jointly optimized by minimizing weighted spherical photometric loss. Extensive experiments have demonstrated that our proposed SC-OmniGS is able to recover a high-quality radiance field from noisy camera poses or even no pose prior in challenging scenarios characterized by wide baselines and non-object-centric configurations. The noticeable performance gain in the real-world dataset captured by consumer-grade omnidirectional cameras verifies the effectiveness of our general omnidirectional camera model in reducing the distortion of 360-degree images.

3.7 | CV | Mar 27, 2024

AIR-HLoc: Adaptive Retrieved Images Selection for Efficient Visual Localisation

Changkun Liu, Jianhao Jiao, Huajian Huang et al.

State-of-the-art hierarchical localisation pipelines (HLoc) employ image retrieval (IR) to establish 2D-3D correspondences by selecting the top-k most similar images from a reference database. While increasing k improves localisation robustness, it also linearly increases computational cost and runtime, creating a significant bottleneck. This paper investigates the relationship between global and local descriptors, showing that greater similarity between the global descriptors of query and database images increases the proportion of feature matches. Low similarity queries significantly benefit from increasing k, while high similarity queries rapidly experience diminishing returns. Building on these observations, we propose an adaptive strategy that adjusts k based on the similarity between the query's global descriptor and those in the database, effectively mitigating the feature-matching bottleneck. Our approach optimizes processing time without sacrificing accuracy. Experiments on three indoor and outdoor datasets show that AIR-HLoc reduces feature matching time by up to 30%, while preserving state-of-the-art accuracy. The results demonstrate that AIR-HLoc facilitates a latency-sensitive localisation system.
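
The adaptive idea described above, retrieving fewer images when the query closely resembles the database and more when it does not, can be sketched as a simple mapping from similarity to k. This is only an illustrative sketch under assumed values: the `adaptive_k` function, the linear interpolation, and the bounds and thresholds (`k_min`, `k_max`, `low`, `high`) are hypothetical and are not taken from AIR-HLoc.

```python
def adaptive_k(top_similarity, k_min=5, k_max=20, low=0.3, high=0.8):
    """Choose how many database images to retrieve for one query.

    top_similarity: best global-descriptor similarity between the query
    and the database, assumed to lie in [0, 1].
    All default values are hypothetical and chosen for illustration.
    """
    if top_similarity >= high:
        return k_min  # confident match: few images suffice
    if top_similarity <= low:
        return k_max  # hard query: retrieve as many as allowed
    # In between, interpolate linearly from k_max (at low) down to k_min (at high).
    frac = (high - top_similarity) / (high - low)
    return k_min + round(frac * (k_max - k_min))


if __name__ == "__main__":
    for sim in (0.95, 0.7, 0.4, 0.1):
        print(sim, adaptive_k(sim))
```

Because feature matching runs once per retrieved image, lowering k for high-similarity queries is where the time saving would come from in a scheme like this.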

4.1 | HC | Dec 14, 2025

ORIBA: Exploring LLM-Driven Role-Play Chatbot as a Creativity Support Tool for Original Character Artists

Yuqian Sun, Xingyu Li, Shunyu Yao et al.

Recent advances in Generative AI (GAI) have led to new opportunities for creativity support. However, this technology has raised ethical concerns in the visual artists community. This paper explores how GAI can assist visual artists in developing original characters (OCs) while respecting their creative agency. We present ORIBA, an AI chatbot leveraging large language models (LLMs) to enable artists to role-play with their OCs, focusing on conceptualization (e.g., backstories) while leaving exposition (visual creation) to creators. Through a study with 14 artists, we found ORIBA motivated artists' imaginative engagement, developing multidimensional attributes and stronger bonds with OCs that inspire their creative process. Our contributions include design insights for AI systems that develop from artists' perspectives, demonstrating how LLMs can support cross-modal creativity while preserving creative agency in OC art. This paper highlights the potential of GAI as a neutral, non-visual support that strengthens existing creative practice, without infringing artistic exposition.

11.8 | CV | Oct 21, 2025

PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting

Changkun Liu, Bin Tan, Zeran Ke et al.

This paper addresses metric 3D reconstruction of indoor scenes by exploiting their inherent geometric regularities with compact representations. Using planar 3D primitives - a well-suited representation for man-made environments - we introduce PLANA3R, a pose-free framework for metric Planar 3D Reconstruction from unposed two-view images. Our approach employs Vision Transformers to extract a set of sparse planar primitives, estimate relative camera poses, and supervise geometry learning via planar splatting, where gradients are propagated through high-resolution rendered depth and normal maps of primitives. Unlike prior feedforward methods that require 3D plane annotations during training, PLANA3R learns planar 3D structures without explicit plane supervision, enabling scalable training on large-scale stereo datasets using only depth and normal annotations. We validate PLANA3R on multiple indoor-scene datasets with metric supervision and demonstrate strong generalization to out-of-domain indoor environments across diverse tasks under metric evaluation protocols, including 3D surface reconstruction, depth estimation, and relative pose estimation. Furthermore, by formulating with planar 3D representation, our method emerges with the ability for accurate plane segmentation. The project page is available at https://lck666666.github.io/plana3r

3.7 | CV | Jan 21, 2024

MobileARLoc: On-device Robust Absolute Localisation for Pervasive Markerless Mobile AR

Changkun Liu, Yukun Zhao, Tristan Braud

Recent years have seen significant improvement in absolute camera pose estimation, paving the way for pervasive markerless Augmented Reality (AR). However, accurate absolute pose estimation techniques are computation- and storage-heavy, requiring computation offloading. As such, AR systems rely on visual-inertial odometry (VIO) to track the device's relative pose between requests to the server. However, VIO suffers from drift, requiring frequent absolute repositioning. This paper introduces MobileARLoc, a new framework for on-device large-scale markerless mobile AR that combines an absolute pose regressor (APR) with a local VIO tracking system. Absolute pose regressors (APRs) provide fast on-device pose estimation at the cost of reduced accuracy. To address APR accuracy and reduce VIO drift, MobileARLoc creates a feedback loop where VIO pose estimations refine the APR predictions. The VIO system identifies reliable predictions of APR, which are then used to compensate for the VIO drift. We comprehensively evaluate MobileARLoc through dataset simulations. MobileARLoc halves the error compared to the underlying APR and achieves fast (80 ms) on-device inference speed.

2.9 | HC | Feb 23, 2022

From Digital Media to Empathic Reality: A Systematic Review of Empathy Research in Extended Reality Environments

Ville Paananen, Mohammad Sina Kiarostami, Lik-Hang Lee et al.

Recent advances in extended reality (XR) technologies have enabled new and increasingly realistic empathy tools and experiences. In XR, all interactions take place in different spatial contexts, all with different features, affordances, and constraints. We present a systematic literature survey of recent work on empathy in XR. As a result, we contribute a research roadmap with three future opportunities in XR-enabled empathy research across both physical and virtual spaces.

3.3 | MM | Jan 16, 2022

Nebula: Reliable Low-latency Video Transmission for Mobile Cloud Gaming

Ahmad Alhilal, Tristan Braud, Bo Han et al.

Mobile cloud gaming enables high-end games on constrained devices by streaming the game content from powerful servers through mobile networks. Mobile networks suffer from highly variable bandwidth, latency, and losses that affect the gaming experience. This paper introduces Nebula, an end-to-end cloud gaming framework to minimize the impact of network conditions on the user experience. Nebula relies on an end-to-end distortion model adapting the video source rate and the amount of frame-level redundancy based on the measured network conditions. As a result, it minimizes the motion-to-photon (MTP) latency while protecting the frames from losses. We fully implement Nebula and evaluate its performance against the state of the art techniques and latest research in real-time mobile cloud gaming transmission on a physical testbed over emulated and real wireless networks. Nebula consistently balances MTP latency (<140 ms) and visual quality (>31 dB) even in highly variable environments. A user experiment confirms that Nebula maximizes the user experience with high perceived video quality, playability, and low user load.

2.9 | HC | Jan 10, 2022

DiOS -- An Extended Reality Operating System for the Metaverse

Tristan Braud, Lik-Hang Lee, Ahmad Alhilal et al.

Driven by the recent improvements in device and network capabilities, Extended Reality (XR) is becoming more pervasive; industry and academia alike envision ambitious projects such as the metaverse. However, XR is still limited by the current architecture of mobile systems. This paper makes the case for an XR-specific operating system (XROS). Such an XROS integrates hardware support, computer vision algorithms, and XR-specific networking as the primitives supporting XR technology. These primitives represent the physical-digital world as a single shared resource among applications. Such an XROS allows for the development of coherent and system-wide interaction and display methods, systematic privacy preservation on sensor data, and performance improvement while simplifying application development.

6.4 | HC | Nov 9, 2021

EdgeXAR: A 6-DoF Camera Multi-target Interaction Framework for MAR with User-friendly Latency Compensation

Wenxiao Zhang, Sikun Lin, Farshid Hassani Bijarbooneh et al.

The computational capabilities of recent mobile devices enable the processing of natural features for Augmented Reality (AR), but the scalability is still limited by the devices' computation power and available resources. In this paper, we propose EdgeXAR, a mobile AR framework that utilizes the advantages of edge computing through task offloading to support flexible camera-based AR interaction. We propose a hybrid tracking system for mobile devices that provides lightweight tracking with 6 Degrees of Freedom and hides the offloading latency from users' perception. A practical, reliable and unreliable communication mechanism is used to achieve fast response and consistency of crucial information. We also propose a multi-object image retrieval pipeline that executes fast and accurate image recognition tasks on the cloud and edge servers. Extensive experiments are carried out to evaluate the performance of EdgeXAR by building mobile AR Apps upon it. Regarding the Quality of Experience (QoE), the mobile AR Apps powered by EdgeXAR framework run on average at the speed of 30 frames per second with precise tracking of only 1-2 pixel errors and accurate image recognition of at least 97% accuracy. As compared to Vuforia, one of the leading commercial AR frameworks, EdgeXAR transmits 87% less data while providing a stable 30 FPS performance and reducing the offloading latency by 50 to 70% depending on the transmission medium. Our work facilitates the large-scale deployment of AR as the next generation of ubiquitous interfaces.

5.9 | MM | Jan 14, 2021

AICP: Augmented Informative Cooperative Perception

Pengyuan Zhou, Pranvera Kortoci, Yui-Pan Yau et al.

Connected vehicles, whether equipped with advanced driver-assistance systems or fully autonomous, require human driver supervision and are currently constrained to visual information in their line-of-sight. A cooperative perception system among vehicles increases their situational awareness by extending their perception range. Existing solutions focus on improving perspective transformation and fast information collection. However, such solutions fail to filter out large amounts of less relevant data and thus impose significant network and computation load. Moreover, presenting all this less relevant data can overwhelm the driver and thus actually hinder them. To address such issues, we present Augmented Informative Cooperative Perception (AICP), the first fast-filtering system which optimizes the informativeness of shared data at vehicles to improve the fused presentation. To this end, an informativeness maximization problem is presented for vehicles to select a subset of data to display to their drivers. Specifically, we propose (i) a dedicated system design with custom data structure and lightweight routing protocol for convenient data encapsulation, fast interpretation and transmission, and (ii) a comprehensive problem formulation and efficient fitness-based sorting algorithm to select the most valuable data to display at the application layer. We implement a proof-of-concept prototype of AICP with a bandwidth-hungry, latency-constrained real-life augmented reality application. The prototype adds only 12.6 milliseconds of latency to a current informativeness-unaware system. Next, we test the networking performance of AICP at scale and show that AICP effectively filters out less relevant packets and decreases the channel busy time.

6.6 | MA | Sep 3, 2020 | Code

DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control in the IoV

Pengyuan Zhou, Xianfu Chen, Zhi Liu et al.

The Internet of Vehicles (IoV) enables real-time data exchange among vehicles and roadside units and thus provides a promising solution to alleviate traffic jams in the urban area. Meanwhile, better traffic management via efficient traffic light control can benefit the IoV as well by enabling a better communication environment and decreasing the network load. As such, IoV and efficient traffic light control can form a virtuous cycle. Edge computing, an emerging technology to provide low-latency computation capabilities at the edge of the network, can further improve the performance of this cycle. However, while the collected information is valuable, an efficient solution for better utilization and faster feedback has yet to be developed for edge-empowered IoV. To this end, we propose a Decentralized Reinforcement Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits the ubiquity of the IoV to accelerate the collection of traffic data and its interpretation towards alleviating congestion and providing better traffic light control. DRLE operates within the coverage of the edge servers and uses aggregated data from neighboring edge servers to provide city-scale traffic light control. DRLE decomposes the highly complex problem of large area control into a decentralized multi-agent problem. We prove its global optima with concrete mathematical reasoning. The proposed decentralized reinforcement learning algorithm running at each edge node adapts the traffic lights in real time. We conduct extensive evaluations and demonstrate the superiority of this approach over several state-of-the-art algorithms.

11.1 | HC | Jul 17, 2020

Towards Augmented Reality-driven Human-City Interaction: Current Research on Mobile Headsets and Future Challenges

Lik Hang Lee, Tristan Braud, Simo Hosio et al.

Interaction design for Augmented Reality (AR) is gaining increasing attention from both academia and industry. This survey discusses 260 articles (68.8% of articles published between 2015 - 2019) to review the field of human interaction in connected cities with emphasis on augmented reality-driven interaction. We provide an overview of Human-City Interaction and related technological approaches, followed by a review of the latest trends of information visualization, constrained interfaces, and embodied interaction for AR headsets. We highlight under-explored issues in interface design and input techniques that warrant further research, and conjecture that AR with complementary Conversational User Interfaces (CUIs) is a key enabler for ubiquitous interaction with immersive systems in smart cities. Our work helps researchers understand the current potential and future needs of AR in Human-City Interaction.

5.1 | CY | Mar 3, 2020

Marketplace for AI Models

Abhishek Kumar, Benjamin Finley, Tristan Braud et al.

Artificial intelligence shows promise for solving many practical societal problems in areas such as healthcare and transportation. However, the current mechanisms for AI model diffusion such as GitHub code repositories, academic project webpages, and commercial AI marketplaces have some limitations; for example, a lack of monetization methods, model traceability, and model auditability. In this work, we sketch guidelines for a new AI diffusion method based on a decentralized online marketplace. We consider the technical, economic, and regulatory aspects of such a marketplace including a discussion of solutions for problems in these areas. Finally, we include a comparative analysis of several current AI marketplaces that are already available or in development. We find that most of these marketplaces are centralized commercial marketplaces with relatively few models.