Guillaume Allibert

h-index17

7papers

254citations

Novelty48%

AI Score42

Ranked #58,813 of 194,257 authors (top 30%)#20,311 in CV (top 34%)

7 Papers

18.1CVJan 18, 2023

HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness

Zongwei Wu, Guillaume Allibert, Fabrice Meriaudeau et al.

RGB-D saliency detection aims to fuse multi-modal cues to accurately localize salient regions. Existing works often adopt attention modules for feature modeling, with few methods explicitly leveraging fine-grained details to merge with semantic cues. Thus, despite the auxiliary depth information, it is still challenging for existing models to distinguish objects with similar appearances but at distinct camera distances. In this paper, from a new perspective, we propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection. Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies. To realize multi-modal and multi-level fusion, we first use a granularity-based attention scheme to strengthen the discriminatory power of RGB and depth features separately. Then we introduce a unified cross dual-attention module for multi-modal and multi-level fusion in a coarse-to-fine manner. The encoded multi-modal features are gradually aggregated into a shared decoder. Further, we exploit a multi-scale loss to take full advantage of the hierarchical information. Extensive experiments on challenging benchmark datasets demonstrate that our HiDAnet performs favorably over the state-of-the-art methods by large margins.

12.7CVAug 2, 2022Code

Robust RGB-D Fusion for Saliency Detection

Zongwei Wu, Shriarulmozhivarman Gobichettipalayam, Brahim Tamadazte et al.

Efficiently exploiting multi-modal inputs for accurate RGB-D saliency detection is a topic of high interest. Most existing works leverage cross-modal interactions to fuse the two streams of RGB-D for intermediate features' enhancement. In this process, a practical aspect of the low quality of the available depths has not been fully considered yet. In this work, we aim for RGB-D saliency detection that is robust to the low-quality depths which primarily appear in two forms: inaccuracy due to noise and the misalignment to RGB. To this end, we propose a robust RGB-D fusion method that benefits from (1) layer-wise, and (2) trident spatial, attention mechanisms. On the one hand, layer-wise attention (LWA) learns the trade-off between early and late fusion of RGB and depth features, depending upon the depth accuracy. On the other hand, trident spatial attention (TSA) aggregates the features from a wider spatial context to address the depth misalignment problem. The proposed LWA and TSA mechanisms allow us to efficiently exploit the multi-modal inputs for saliency detection while being robust against low-quality depths. Our experiments on five benchmark datasets demonstrate that the proposed fusion method performs consistently better than the state-of-the-art fusion alternatives.

8.1CVJun 8, 2022

Depth-Adapted CNNs for RGB-D Semantic Segmentation

Zongwei Wu, Guillaume Allibert, Christophe Stolz et al.

Recent RGB-D semantic segmentation has motivated research interest thanks to the accessibility of complementary modalities from the input side. Existing works often adopt a two-stream architecture that processes photometric and geometric information in parallel, with few methods explicitly leveraging the contribution of depth cues to adjust the sampling position on RGB images. In this paper, we propose a novel framework to incorporate the depth information in the RGB convolutional neural network (CNN), termed Z-ACN (Depth-Adapted CNN). Specifically, our Z-ACN generates a 2D depth-adapted offset which is fully constrained by low-level features to guide the feature extraction on RGB images. With the generated offset, we introduce two intuitive and effective operations to replace basic CNN operators: depth-adapted convolution and depth-adapted average pooling. Extensive experiments on both indoor and outdoor semantic segmentation tasks demonstrate the effectiveness of our approach.

8.1CVApr 18

Adverse-to-the-eXtreme Panoptic Segmentation: URVIS 2026 Study and Benchmark

Yiting Wang, Nolwenn Peyratout, Tim Brodermann et al.

This paper presents the report of the URVIS 2026 challenge on adverse-to-extreme panoptic segmentation. As the first challenge of its kind, it attracted 17 registered participants and 47 submissions, with 4 teams reaching the final phase. The challenge is based on the MUSES dataset, a multi-sensor benchmark for panoptic segmentation in adverse-to-extreme weather, including RGB frame camera, LiDAR, radar, and event camera data. Weighted Panoptic Quality (wPQ) is designed and adopted as the official ranking metric for fair evaluation across weather conditions. In this report, we summarise the challenge setting and benchmark results, analyse the performance of the submitted methods, and discuss current progress and remaining challenges for robust multimodal panoptic segmentation. Link: https://urvis-workshop.github.io/challenge-Muses.html

6.5CVOct 10, 2021

Modality-Guided Subnetwork for Salient Object Detection

Zongwei Wu, Guillaume Allibert, Christophe Stolz et al.

Recent RGBD-based models for saliency detection have attracted research attention. The depth clues such as boundary clues, surface normal, shape attribute, etc., contribute to the identification of salient objects with complicated scenarios. However, most RGBD networks require multi-modalities from the input side and feed them separately through a two-stream design, which inevitably results in extra costs on depth sensors and computation. To tackle these inconveniences, we present in this paper a novel fusion design named modality-guided subnetwork (MGSnet). It has the following superior designs: 1) Our model works for both RGB and RGBD data, and dynamically estimating depth if not available. Taking the inner workings of depth-prediction networks into account, we propose to estimate the pseudo-geometry maps from RGB input - essentially mimicking the multi-modality input. 2) Our MGSnet for RGB SOD results in real-time inference but achieves state-of-the-art performance compared to other RGB models. 3) The flexible and lightweight design of MGS facilitates the integration into RGBD two-streaming models. The introduced fusion design enables a cross-modality interaction to enable further progress but with a minimal cost.

7.2CVSep 21, 2020

Depth-Adapted CNN for RGB-D cameras

Zongwei Wu, Guillaume Allibert, Christophe Stolz et al.

Conventional 2D Convolutional Neural Networks (CNN) extract features from an input image by applying linear filters. These filters compute the spatial coherence by weighting the photometric information on a fixed neighborhood without taking into account the geometric information. We tackle the problem of improving the classical RGB CNN methods by using the depth information provided by the RGB-D cameras. State-of-the-art approaches use depth as an additional channel or image (HHA) or pass from 2D CNN to 3D CNN. This paper proposes a novel and generic procedure to articulate both photometric and geometric information in CNN architecture. The depth data is represented as a 2D offset to adapt spatial sampling locations. The new model presented is invariant to scale and rotation around the X and the Y axis of the camera coordinate system. Moreover, when depth data is constant, our model is equivalent to a regular CNN. Experiments of benchmarks validate the effectiveness of our model.

9.4ROApr 18, 2020

Model Predictive Path Integral Control Framework for Partially Observable Navigation: A Quadrotor Case Study

Ihab S. Mohamed, Guillaume Allibert, Philippe Martinet

Recently, Model Predictive Path Integral (MPPI) control algorithm has been extensively applied to autonomous navigation tasks, where the cost map is mostly assumed to be known and the 2D navigation tasks are only performed. In this paper, we propose a generic MPPI control framework that can be used for 2D or 3D autonomous navigation tasks in either fully or partially observable environments, which are the most prevalent in robotics applications. This framework exploits directly the 3D-voxel grid acquired from an on-board sensing system for performing collision-free navigation. We test the framework, in realistic RotorS-based simulation, on goal-oriented quadrotor navigation tasks in a cluttered environment, for both fully and partially observable scenarios. Preliminary results demonstrate that the proposed framework works perfectly, under partial observability, in 2D and 3D cluttered environments.