CV · AI · Jul 23, 2024

FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network

arXiv:2407.16129v1 · 2 citations · h-index: 31 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses robust object detection in varying visual conditions for applications like autonomous systems, but it is incremental as it builds on existing multimodal detection methods.

The paper tackles the problem of multimodal object detection by proposing a low-rank adaptation model with a shared backbone to reduce parameters and improve consistency, achieving a 10.4% accuracy improvement and 149M-parameter reduction on the DroneVehicle dataset.

Multimodal object detection offers a promising prospect to facilitate robust detection in various visual conditions. However, existing two-stream backbone networks are challenged by complex fusion and substantial parameter increments. This is primarily due to large data distribution biases of multimodal homogeneous information. In this paper, we propose a novel multimodal object detector, named Low-rank Modal Adaptors (LMA), with a shared backbone. The shared parameters enhance the consistency of homogeneous information, while lightweight modal adaptors focus on modality-unique features. Furthermore, we design an adaptive rank allocation strategy to adapt to the varying heterogeneity at different feature levels. Experiments on two multimodal object detection datasets validate the effectiveness of our method. Notably, on DroneVehicle, LMA attains a 10.4% accuracy improvement over the state-of-the-art method with a 149M-parameter reduction. The code is available at https://github.com/zyszxhy/FoRA. Our work was submitted to ACM MM in April 2024 but was rejected. We will continue to refine the work and the paper, mainly by adding theoretical proofs and multi-task applications of FoRA.
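
To make the shared-backbone idea in the abstract concrete, here is a minimal NumPy sketch of one layer with shared weights plus a low-rank adaptor per modality. It is an illustration under stated assumptions, not the authors' implementation: the class name `SharedLinearWithModalAdaptors`, the modality keys `"rgb"` and `"ir"`, the layer sizes, and the zero initialisation of `B` (borrowed from common LoRA practice) are all hypothetical, a single linear layer stands in for a backbone layer, and the rank is fixed rather than adaptively allocated.

```python
import numpy as np


class SharedLinearWithModalAdaptors:
    """Shared weight W plus one low-rank update B_m @ A_m per modality (hypothetical sketch)."""

    def __init__(self, d_in, d_out, rank, modalities=("rgb", "ir"), seed=0):
        rng = np.random.default_rng(seed)
        # Weight shared by every modality.
        self.W = rng.normal(0.0, 0.02, size=(d_out, d_in))
        # Per-modality low-rank factors. B starts at zero (an assumption),
        # so each adaptor is initially a no-op.
        self.A = {m: rng.normal(0.0, 0.02, size=(rank, d_in)) for m in modalities}
        self.B = {m: np.zeros((d_out, rank)) for m in modalities}

    def forward(self, x, modality):
        # x has shape (batch, d_in). The effective weight is W + B_m @ A_m,
        # applied without materialising the full d_out x d_in update.
        shared = x @ self.W.T
        modal = (x @ self.A[modality].T) @ self.B[modality].T
        return shared + modal

    def num_params(self):
        adaptors = sum(self.A[m].size + self.B[m].size for m in self.A)
        return self.W.size + adaptors


layer = SharedLinearWithModalAdaptors(d_in=256, d_out=256, rank=8)
x = np.ones((4, 256))
y_rgb = layer.forward(x, "rgb")
y_ir = layer.forward(x, "ir")

# Parameter count: two independent 256x256 layers vs. one shared layer + adaptors.
two_stream_params = 2 * 256 * 256   # 131072
shared_params = layer.num_params()  # 65536 + 2 * (8*256 + 256*8) = 73728
```

In this toy setting the shared layer with two rank-8 adaptors uses 73,728 parameters against 131,072 for two separate layers. The numbers only illustrate why sharing the backbone can reduce parameters; they say nothing about the 149M figure reported for the actual detector.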

Code Implementations: 1 repo
