Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning
This work addresses robustness and generalization issues in molecular representation learning for drug discovery and related fields; it represents a strong, specific gain rather than a foundational advance.
The paper tackles unreliable 3D conformers and modality collapse in multimodal molecular models by proposing MuMo, a structured fusion framework that achieves an average improvement of 2.7% over the best-performing baseline on each of 29 benchmark tasks, including a 27% improvement on the LD50 task.
Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization. We propose MuMo, a structured multimodal fusion framework that addresses these challenges in molecular representation through two key strategies. To reduce the instability of conformer-dependent fusion, we design a Structured Fusion Pipeline (SFP) that combines 2D topology and 3D geometry into a unified and stable structural prior. To mitigate modality collapse caused by naive fusion, we introduce a Progressive Injection (PI) mechanism that asymmetrically integrates this prior into the sequence stream, preserving modality-specific modeling while enabling cross-modal enrichment. Built on a state space backbone, MuMo supports long-range dependency modeling and robust information propagation. Across 29 benchmark tasks from Therapeutics Data Commons (TDC) and MoleculeNet, MuMo achieves an average improvement of 2.7% over the best-performing baseline on each task, ranking first on 22 of them, including a 27% improvement on the LD50 task. These results validate its robustness to 3D conformer noise and the effectiveness of multimodal fusion in molecular representation. The code is available at: github.com/selmiss/MuMo.
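To make the idea of asymmetric, progressive injection concrete, the following is a minimal sketch and not the MuMo implementation. It assumes the structural prior has already been computed as a single vector, uses a `tanh` layer as a stand-in for the state space backbone, and ramps a gate linearly over the later layers. The function names, the `start_layer` parameter, and the gate schedule are all hypothetical choices made for illustration. The only properties it tries to capture from the abstract are that information flows one way (from the structural prior into the sequence stream) and that the sequence stream models its own modality before the prior is mixed in.

```python
import math


def sequence_layer(tokens):
    """Stand-in for one sequence-stream layer (hypothetical; a real model
    would use a state space block here)."""
    return [[math.tanh(x) for x in tok] for tok in tokens]


def inject(tokens, prior, gate):
    """Add a gated copy of the structural prior to every token embedding."""
    return [[x + gate * p for x, p in zip(tok, prior)] for tok in tokens]


def progressive_injection(tokens, prior, num_layers=4, start_layer=2):
    """Run the sequence stream and inject the prior from start_layer onward.

    The prior is read-only: nothing flows back from the sequence stream
    into the structural stream, which is the asymmetry being illustrated.
    """
    h = tokens
    for layer in range(num_layers):
        h = sequence_layer(h)
        if layer >= start_layer:
            # Assumed linear ramp: the gate grows to 1.0 at the last layer.
            gate = (layer - start_layer + 1) / (num_layers - start_layer)
            h = inject(h, prior, gate)
    return h


tokens = [[0.5, -0.2], [0.1, 0.3]]  # toy token embeddings (hypothetical)
prior = [1.0, -1.0]                 # toy structural prior (hypothetical)

fused = progressive_injection(tokens, prior)
# Setting start_layer equal to num_layers disables injection entirely.
seq_only = progressive_injection(tokens, prior, start_layer=4)
```

Comparing `fused` with `seq_only` shows what the prior contributes, while the early layers in both runs are identical because they see only the sequence input.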