CVOct 19, 2024Code
MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object DetectionYue Zhan, Zhihong Zeng, Haijun Liu et al.
The purpose of RGB-D Salient Object Detection (SOD) is to pinpoint the most visually conspicuous areas within images accurately. While conventional deep models heavily rely on CNN extractors and overlook the long-range contextual dependencies, subsequent transformer-based models have addressed the issue to some extent but introduce high computational complexity. Moreover, incorporating spatial information from depth maps has been proven effective for this task. A primary challenge of this issue is how to fuse the complementary information from RGB and depth effectively. In this paper, we propose a dual Mamba-driven cross-modal fusion network for RGB-D SOD, named MambaSOD. Specifically, we first employ a dual Mamba-driven feature extractor for both RGB and depth to model the long-range dependencies in multiple modality inputs with linear complexity. Then, we design a cross-modal fusion Mamba for the captured multi-modal features to fully utilize the complementary information between the RGB and depth features. To the best of our knowledge, this work is the first attempt to explore the potential of the Mamba in the RGB-D SOD task, offering a novel perspective. Numerous experiments conducted on six prevailing datasets demonstrate our method's superiority over sixteen state-of-the-art RGB-D SOD models. The source code will be released at https://github.com/YueZhan721/MambaSOD.
CVNov 1, 2023
Neural Implicit Field Editing Considering Object-environment InteractionZhihong Zeng, Zongji Wang, Yuanben Zhang et al.
The 3D scene editing method based on neural implicit field has gained wide attention. It has achieved excellent results in 3D editing tasks. However, existing methods often blend the interaction between objects and scene environment. The change of scene appearance like shadows is failed to be displayed in the rendering view. In this paper, we propose an Object and Scene environment Interaction aware (OSI-aware) system, which is a novel two-stream neural rendering system considering object and scene environment interaction. To obtain illuminating conditions from the mixture soup, the system successfully separates the interaction between objects and scene environment by intrinsic decomposition method. To study the corresponding changes to the scene appearance from object-level editing tasks, we introduce a depth map guided scene inpainting method and shadow rendering method by point matching strategy. Extensive experiments demonstrate that our novel pipeline produce reasonable appearance changes in scene editing tasks. It also achieve competitive performance for the rendering quality in novel-view synthesis tasks.
LGOct 22, 2024
Learning Load Balancing with GNN in MPTCP-Enabled Heterogeneous NetworksHan Ji, Xiping Wu, Zhihong Zeng et al.
Hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks are a promising paradigm of heterogeneous network (HetNet), attributed to the complementary physical properties of optical spectra and radio frequency. However, the current development of such HetNets is mostly bottlenecked by the existing transmission control protocol (TCP), which restricts the user equipment (UE) to connecting one access point (AP) at a time. While the ongoing investigation on multipath TCP (MPTCP) can bring significant benefits, it complicates the network topology of HetNets, making the existing load balancing (LB) learning models less effective. Driven by this, we propose a graph neural network (GNN)-based model to tackle the LB problem for MPTCP-enabled HetNets, which results in a partial mesh topology. Such a topology can be modeled as a graph, with the channel state information and data rate requirement embedded as node features, while the LB solutions are deemed as edge labels. Compared to the conventional deep neural network (DNN), the proposed GNN-based model exhibits two key strengths: i) it can better interpret a complex network topology; and ii) it can handle various numbers of APs and UEs with a single trained model. Simulation results show that against the traditional optimisation method, the proposed learning model can achieve near-optimal throughput within a gap of 11.5%, while reducing the inference time by 4 orders of magnitude. In contrast to the DNN model, the new method can improve the network throughput by up to 21.7%, at a similar inference time level.