CV · AI · Apr 2

Harmonized Tabular-Image Fusion via Gradient-Aligned Alternating Learning

arXiv:2604.01579 · 62.9 · Has Code
Predicted impact: top 53% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses a key optimization problem in multimodal learning for domains that combine tabular and image data, though the contribution appears incremental because it builds on existing fusion methods.

The paper tackles gradient conflicts in multimodal tabular-image fusion by proposing a Gradient-Aligned Alternating Learning (GAAL) paradigm. GAAL aligns modality gradients and applies uncertainty-based cross-modal gradient surgery to boost fusion performance, and the authors report that it outperforms state-of-the-art baselines in empirical experiments.

Multimodal tabular-image fusion is an emerging task that has received increasing attention in various domains. However, existing methods may be hindered by gradient conflicts between modalities, misleading the optimization of the unimodal learner. In this paper, we propose a novel Gradient-Aligned Alternating Learning (GAAL) paradigm to address this issue by aligning modality gradients. Specifically, GAAL adopts alternating unimodal learning and a shared classifier to decouple the multimodal gradient and facilitate interaction. Furthermore, we design uncertainty-based cross-modal gradient surgery to selectively align cross-modal gradients, thereby steering the shared parameters to benefit all modalities. As a result, GAAL can provide effective unimodal assistance and help boost the overall fusion performance. Empirical experiments on widely used datasets reveal the superiority of our method through comparison with various state-of-the-art (SoTA) tabular-image fusion baselines and test-time tabular missing baselines. The source code is available at https://github.com/njustkmg/ICME26-GAAL.
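
To make the gradient-surgery idea in the abstract more concrete, here is a minimal sketch of how conflicting gradients on shared parameters might be aligned. The abstract does not give the exact rule, so this sketch assumes a PCGrad-style projection and uses predictive entropy as the uncertainty measure. Every function name below is hypothetical and is not taken from the ICME26-GAAL repository.

```python
import math

# Illustrative sketch only: the projection rule and the entropy-based
# uncertainty are assumptions, not the method as specified by the authors.

def entropy(probs):
    """Shannon entropy of a probability vector, used here as an uncertainty proxy."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def dot(u, v):
    """Dot product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_out(g, ref):
    """Remove from g its component along ref (PCGrad-style projection)."""
    denom = dot(ref, ref)
    if denom == 0.0:
        return list(g)
    scale = dot(g, ref) / denom
    return [a - scale * b for a, b in zip(g, ref)]

def align_gradients(g_tab, g_img, probs_tab, probs_img):
    """Return adjusted (tabular, image) gradients for the shared parameters.

    If the two gradients conflict (negative dot product), the gradient of the
    more uncertain modality is projected onto the normal plane of the other.
    Non-conflicting gradients are returned unchanged.
    """
    if dot(g_tab, g_img) >= 0.0:
        return list(g_tab), list(g_img)
    if entropy(probs_tab) > entropy(probs_img):
        return project_out(g_tab, g_img), list(g_img)
    return list(g_tab), project_out(g_img, g_tab)

# Hypothetical usage: the tabular prediction is less confident, so its
# gradient is the one that gets projected.
new_tab, new_img = align_gradients(
    [1.0, 0.0], [-1.0, 1.0], probs_tab=[0.5, 0.5], probs_img=[0.9, 0.1]
)
```

In an alternating scheme, a step like this would run each time one modality updates the shared classifier, with the other modality's most recent gradient as the reference. That scheduling detail is also an assumption here.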

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
