RO · Mar 21, 2024
Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation
Leyuan Sun, Asako Kanezaki, Guillaume Caron et al.
Object-goal navigation is a crucial engineering task for the embodied navigation community: it involves navigating to an instance of a specified object category within unseen environments. Although extensive investigations have been conducted on both end-to-end and modular, data-driven approaches, fully enabling an agent to comprehend the environment through perceptual knowledge and perform object-goal navigation as efficiently as humans remains a significant challenge. Recently, large language models have shown potential in this task, thanks to their powerful capabilities for knowledge extraction and integration. In this study, we propose a data-driven, modular approach, trained on a dataset that incorporates common-sense knowledge of object-to-room relationships extracted from a large language model. We use the multi-channel Swin-Unet architecture to conduct multi-task learning with multimodal inputs. The results in the Habitat simulator demonstrate that our framework outperforms the baseline by an average of 10.6% in the efficiency metric, Success weighted by Path Length (SPL). A real-world demonstration shows that the proposed approach can efficiently perform this task while traversing several rooms. For more details and real-world demonstrations, please check our project webpage (https://sunleyuan.github.io/ObjectNav).
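For readers unfamiliar with the efficiency metric above, SPL is commonly defined (Anderson et al., 2018) as SPL = (1/N) Σ_i S_i · l_i / max(p_i, l_i), where S_i is 1 if episode i succeeds and 0 otherwise, l_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually took. The sketch below computes it under that standard definition; the `Episode` type and `spl` function are hypothetical names for illustration, not code from the paper or from the Habitat simulator.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical container for illustration)."""
    success: bool          # S_i: whether the agent reached the goal
    shortest_path: float   # l_i: shortest-path distance from start to goal
    agent_path: float      # p_i: length of the path the agent actually took


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length over a set of episodes.

    Assumes shortest_path > 0 for every episode (start is not already at the goal).
    """
    if not episodes:
        raise ValueError("SPL needs at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.shortest_path <= 0:
            raise ValueError("shortest_path must be positive")
        if ep.success:
            # Successful episodes are weighted by path efficiency in (0, 1].
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
        # Failed episodes contribute 0 regardless of path length.
    return total / len(episodes)


if __name__ == "__main__":
    runs = [
        Episode(success=True, shortest_path=10.0, agent_path=10.0),   # optimal path
        Episode(success=True, shortest_path=10.0, agent_path=20.0),   # twice as long
        Episode(success=False, shortest_path=10.0, agent_path=5.0),   # failure
    ]
    print(spl(runs))
```

Because of the max(p_i, l_i) term, a successful episode scores at most 1, and every failure scores 0 however short its path, so SPL rewards agents that both succeed and take near-optimal routes.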
CV · Feb 2, 2024
Visual Gyroscope: Combination of Deep Learning Features and Direct Alignment for Panoramic Stabilization
Bruno Berenguel-Baeta, Antoine N. Andre, Guillaume Caron et al.
In this article, we present a visual gyroscope based on equirectangular panoramas. We propose a new pipeline that combines three different methods to obtain a robust and accurate estimate of the camera's attitude. We validate our method quantitatively and qualitatively on two image sequences taken with a 360° dual-fisheye camera mounted on different aerial vehicles.
CV · Oct 9, 2018
3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping
Guillaume Caron, Mounya Belghiti, Anthony Dessaux
Video-mapping is the process of coherently projecting images, animations or movies onto static objects or buildings for shows. This paper focuses on the dynamic video-mapping of the suit of a puppet being moved by its puppeteer on the theater stage. This may allow the costume to be changed dynamically, light interactions to be simulated, and more. Unlike common video-mapping, the image warping cannot be done once, offline, before the show. It must be done automatically, in real time, and on a non-flat projection surface, so that the video-projected suit always maps perfectly onto the puppet. Hence, we propose a new visual tracking method for articulated objects, applied to the puppet, that exploits the silhouette of its 3D model in the depth images of a Kinect v2. Then, thanks to the precise calibration we propose between the Kinect v2 and the video-projector, coherent dynamic video-mapping becomes possible, as the presented results show.