Efficient Quantification of Multimodal Interaction at Sample Level
The proposed estimator provides a tool for analyzing fine-grained information dynamics in multimodal systems, enabling applications such as sample partitioning and model ensembling. The contribution appears incremental, however, as it is primarily an improved quantification method.
The paper tackles the challenge of quantifying multimodal interactions (redundancy, uniqueness, synergy) at the sample level. It introduces the LSMI estimator, which combines pointwise information theory with efficient entropy estimation to achieve precise and efficient measurement on synthetic and real-world datasets.
Interactions between modalities -- redundancy, uniqueness, and synergy -- collectively determine the composition of multimodal information. Understanding these interactions is crucial for analyzing information dynamics in multimodal systems, yet their accurate sample-level quantification presents significant theoretical and computational challenges. To address this, we introduce the Lightweight Sample-wise Multimodal Interaction (LSMI) estimator, rigorously grounded in pointwise information theory. We first develop a redundancy estimation framework, employing an appropriate pointwise information measure to quantify this most decomposable and measurable interaction. Building upon this, we propose a general interaction estimation method that employs efficient entropy estimation, specifically tailored for sample-wise estimation in continuous distributions. Extensive experiments on synthetic and real-world datasets validate LSMI's precision and efficiency. Crucially, our sample-wise approach reveals fine-grained sample- and category-level dynamics within multimodal data, enabling practical applications such as redundancy-informed sample partitioning, targeted knowledge distillation, and interaction-aware model ensembling. The code is available at https://github.com/GeWu-Lab/LSMI_Estimator.
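To make the idea of a sample-level decomposition concrete, the following is a minimal sketch on a discrete toy distribution. It is not the LSMI estimator: the abstract does not specify the redundancy measure or the entropy estimator, so the sketch assumes exact probabilities and uses `min(i1, i2)` as a placeholder pointwise redundancy. The function names (`marginal`, `pmi`, `decompose`) and the decomposition identities used here are illustrative assumptions. Note also that pointwise information can be negative, which a real redundancy measure has to handle more carefully than a plain minimum does.

```python
import math
from collections import defaultdict


def marginal(joint, idx):
    """Marginalize a joint pmf {outcome_tuple: prob} onto the positions in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out


def pmi(joint, a_idx, b_idx, outcome):
    """Pointwise mutual information i(a; b) in bits for a single outcome."""
    ab_idx = a_idx + b_idx
    p_ab = marginal(joint, ab_idx)[tuple(outcome[i] for i in ab_idx)]
    p_a = marginal(joint, a_idx)[tuple(outcome[i] for i in a_idx)]
    p_b = marginal(joint, b_idx)[tuple(outcome[i] for i in b_idx)]
    return math.log2(p_ab / (p_a * p_b))


def decompose(joint, outcome):
    """Split pointwise information about y for one sample (x1, x2, y).

    Assumed identities (illustrative, not taken from the paper):
        i(x1; y)     = r + u1
        i(x2; y)     = r + u2
        i(x1, x2; y) = r + u1 + u2 + s
    """
    i1 = pmi(joint, (0,), (2,), outcome)
    i2 = pmi(joint, (1,), (2,), outcome)
    i12 = pmi(joint, (0, 1), (2,), outcome)
    r = min(i1, i2)  # placeholder redundancy, NOT the LSMI measure
    u1 = i1 - r
    u2 = i2 - r
    s = i12 - r - u1 - u2
    return {"redundancy": r, "unique_1": u1, "unique_2": u2, "synergy": s}


# Toy case 1: y = x1 XOR x2 with uniform inputs (information is purely joint).
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
xor_result = decompose(xor, (0, 1, 1))

# Toy case 2: x1 = x2 = y, a single uniform bit copied to both modalities.
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
copy_result = decompose(copy, (1, 1, 1))

print(xor_result)
print(copy_result)
```

In the XOR case neither modality alone is informative about the label, so the sample's one bit of information is attributed to synergy. In the copy case both modalities carry the same bit, so it is attributed to redundancy. With continuous features the probabilities are not available in closed form, which is where an entropy estimator of the kind the abstract describes would be needed.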