FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments
This work addresses the challenge of trustworthy AI for systems operating in real-world environments with multi-modal data, though it appears incremental as it builds on existing federated and multi-modal learning approaches.
The paper tackles the problem of achieving trustworthy intelligence in decentralized, dynamic environments by integrating federated learning with explainable multi-modal reasoning, resulting in improved accuracy and interpretability while reducing vulnerabilities to adversarial and spurious correlations.
As artificial intelligence systems increasingly operate in Real-world environments, the integration of multi-modal data sources such as vision, language, and audio presents both unprecedented opportunities and critical challenges for achieving trustworthy intelligence. In this paper, we propose a novel framework that unifies federated learning with explainable multi-modal reasoning to ensure trustworthiness in decentralized, dynamic settings. Our approach, called FedMM-X (Federated Multi-Modal Explainable Intelligence), leverages cross-modal consistency checks, client-level interpretability mechanisms, and dynamic trust calibration to address challenges posed by data heterogeneity, modality imbalance, and out-of-distribution generalization. Through rigorous evaluation across federated multi-modal benchmarks involving vision-language tasks, we demonstrate improved performance in both accuracy and interpretability while reducing vulnerabilities to adversarial and spurious correlations. Further, we introduce a novel trust score aggregation method to quantify global model reliability under dynamic client participation. Our findings pave the way toward developing robust, interpretable, and socially responsible AI systems in Real-world environments.