LGMar 28Code
Recent Advances of Multimodal Continual Learning: A Comprehensive SurveyDianzhi Yu, Xinni Zhang, Yankai Chen et al. · tsinghua
Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary complexity of MMCL is that it extends beyond a simple stacking of unimodal CL methods. Such straightforward approaches often suffer from multimodal catastrophic forgetting, yielding unsatisfactory performance. In addition, MMCL introduces new challenges that unimodal CL methods fail to adequately address, including modality imbalance, complex modality interaction, high computational costs, and degradation of pre-trained zero-shot capability of multimodal backbones. In this work, we present the first comprehensive survey on MMCL. We provide essential background knowledge and MMCL settings, as well as a structured taxonomy of MMCL methods. We categorize MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based methods, explaining their methodologies and highlighting their key innovations. Additionally, to prompt further research in this field, we summarize open MMCL datasets and benchmarks, provide an in-depth discussion, and discuss several promising future directions. We have also created a GitHub repository for indexing relevant MMCL papers and open resources available at https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning.
IRFeb 18, 2025Code
G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable RecommendationYuhan Li, Xinni Zhang, Linhao Luo et al.
Explainable recommendation has demonstrated significant advantages in informing users about the logic behind recommendations, thereby increasing system transparency, effectiveness, and trustworthiness. To provide personalized and interpretable explanations, existing works often combine the generation capabilities of large language models (LLMs) with collaborative filtering (CF) information. CF information extracted from the user-item interaction graph captures the user behaviors and preferences, which is crucial for providing informative explanations. However, due to the complexity of graph structure, effectively extracting the CF information from graphs still remains a challenge. Moreover, existing methods often struggle with the integration of extracted CF information with LLMs due to its implicit representation and the modality gap between graph structures and natural language explanations. To address these challenges, we propose G-Refer, a framework using graph retrieval-augmented large language models (LLMs) for explainable recommendation. Specifically, we first employ a hybrid graph retrieval mechanism to retrieve explicit CF signals from both structural and semantic perspectives. The retrieved CF information is explicitly formulated as human-understandable text by the proposed graph translation and accounts for the explanations generated by LLMs. To bridge the modality gap, we introduce knowledge pruning and retrieval-augmented fine-tuning to enhance the ability of LLMs to process and utilize the retrieved CF information to generate explanations. Extensive experiments show that G-Refer achieves superior performance compared with existing methods in both explainability and stability. Codes and data are available at https://github.com/Yuhan1i/G-Refer.
LGNov 13, 2025
ConSurv: Multimodal Continual Learning for Survival AnalysisDianzhi Yu, Conghao Xiong, Yankai Chen et al.
Survival prediction of cancers is crucial for clinical practice, as it informs mortality risks and influences treatment plans. However, a static model trained on a single dataset fails to adapt to the dynamically evolving clinical environment and continuous data streams, limiting its practical utility. While continual learning (CL) offers a solution to learn dynamically from new datasets, existing CL methods primarily focus on unimodal inputs and suffer from severe catastrophic forgetting in survival prediction. In real-world scenarios, multimodal inputs often provide comprehensive and complementary information, such as whole slide images and genomics; and neglecting inter-modal correlations negatively impacts the performance. To address the two challenges of catastrophic forgetting and complex inter-modal interactions between gigapixel whole slide images and genomics, we propose ConSurv, the first multimodal continual learning (MMCL) method for survival analysis. ConSurv incorporates two key components: Multi-staged Mixture of Experts (MS-MoE) and Feature Constrained Replay (FCR). MS-MoE captures both task-shared and task-specific knowledge at different learning stages of the network, including two modality encoders and the modality fusion component, learning inter-modal relationships. FCR further enhances learned knowledge and mitigates forgetting by restricting feature deviation of previous data at different levels, including encoder-level features of two modalities and the fusion-level representations. Additionally, we introduce a new benchmark integrating four datasets, Multimodal Survival Analysis Incremental Learning (MSAIL), for comprehensive evaluation in the CL setting. Extensive experiments demonstrate that ConSurv outperforms competing methods across multiple metrics.
CLOct 7, 2025
RECODE-H: A Benchmark for Research Code Development with Interactive Human FeedbackChunyu Miao, Henry Peng Zou, Yangning Li et al.
Large language models (LLMs) show the promise in supporting scientific research implementation, yet their ability to generate correct and executable code remains limited. Existing works largely adopt one-shot settings, ignoring the iterative and feedback-driven nature of realistic workflows of scientific research development. To address this gap, we present RECODE-H, a benchmark of 102 tasks from research papers and repositories that evaluates LLM agents through multi-turn interactions with LLM-simulated human feedback. It includes structured instructions,unit tests, and a five-level feedback hierarchy to reflect realistic researcher-agent collaboration. We further present ReCodeAgent, a framework that integrates feedback into iterative code generation. Experiments with leading LLMs, including GPT-5, Claude-Sonnet-4, DeepSeek-V3.1, and Gemini 2.5, show substantial performance gains with richer feedback, while also highlighting ongoing challenges in the generation of complex research code. RECODE-H establishes a foundation for developing adaptive, feedback-driven LLM agents in scientific research implementation
AIOct 25, 2025
Embracing Trustworthy Brain-Agent Collaboration as Paradigm Extension for Intelligent Assistive TechnologiesYankai Chen, Xinni Zhang, Yifei Zhang et al.
Brain-Computer Interfaces (BCIs) offer a direct communication pathway between the human brain and external devices, holding significant promise for individuals with severe neurological impairments. However, their widespread adoption is hindered by critical limitations, such as low information transfer rates and extensive user-specific calibration. To overcome these challenges, recent research has explored the integration of Large Language Models (LLMs), extending the focus from simple command decoding to understanding complex cognitive states. Despite these advancements, deploying agentic AI faces technical hurdles and ethical concerns. Due to the lack of comprehensive discussion on this emerging direction, this position paper argues that the field is poised for a paradigm extension from BCI to Brain-Agent Collaboration (BAC). We emphasize reframing agents as active and collaborative partners for intelligent assistance rather than passive brain signal data processors, demanding a focus on ethical data handling, model reliability, and a robust human-agent collaboration framework to ensure these systems are safe, trustworthy, and effective.
CHEM-PHSep 2, 2025
BioMD: All-atom Generative Model for Biomolecular Dynamics SimulationBin Feng, Jiying Zhang, Xinni Zhang et al.
Molecular dynamics (MD) simulations are essential tools in computational chemistry and drug discovery, offering crucial insights into dynamic molecular behavior. However, their utility is significantly limited by substantial computational costs, which severely restrict accessible timescales for many biologically relevant processes. Despite the encouraging performance of existing machine learning (ML) methods, they struggle to generate extended biomolecular system trajectories, primarily due to the lack of MD datasets and the large computational demands of modeling long historical trajectories. Here, we introduce BioMD, the first all-atom generative model to simulate long-timescale protein-ligand dynamics using a hierarchical framework of forecasting and interpolation. We demonstrate the effectiveness and versatility of BioMD on the DD-13M (ligand unbinding) and MISATO datasets. For both datasets, BioMD generates highly realistic conformations, showing high physical plausibility and low reconstruction errors. Besides, BioMD successfully generates ligand unbinding paths for 97.1% of the protein-ligand systems within ten attempts, demonstrating its ability to explore critical unbinding pathways. Collectively, these results establish BioMD as a tool for simulating complex biomolecular processes, offering broad applicability for computational chemistry and drug discovery.
IRMay 30, 2021
DAGNN: Demand-aware Graph Neural Networks for Session-based RecommendationLiqi Yang, Linhan Luo, Lifeng Xin et al.
Session-based recommendations have been widely adopted for various online video and E-commerce Websites. Most existing approaches are intuitively proposed to discover underlying interests or preferences out of the anonymous session data. This apparently ignores the fact these sequential behaviors usually reflect session user's potential demand, i.e., a semantic level factor, and therefore how to estimate underlying demands from a session is challenging. To address aforementioned issue, this paper proposes a demand-aware graph neural networks (DAGNN). Particularly, a demand modeling component is designed to first extract session demand and the underlying multiple demands of each session is estimated using the global demand matrix. Then, the demand-aware graph neural network is designed to extract session demand graph to learn the demand-aware item embedddings for the later recommendations. The mutual information loss is further designed to enhance the quality of the learnt embeddings. Extensive experiments are evaluated on several real-world datasets and the proposed model achieves the SOTA model performance.
LGApr 13, 2021
Probing Negative Sampling Strategies to Learn GraphRepresentations via Unsupervised Contrastive LearningShiyi Chen, Ziao Wang, Xinni Zhang et al.
Graph representation learning has long been an important yet challenging task for various real-world applications. However, their downstream tasks are mainly performed in the settings of supervised or semi-supervised learning. Inspired by recent advances in unsupervised contrastive learning, this paper is thus motivated to investigate how the node-wise contrastive learning could be performed. Particularly, we respectively resolve the class collision issue and the imbalanced negative data distribution issue. Extensive experiments are performed on three real-world datasets and the proposed approach achieves the SOTA model performance.
LGSep 23, 2019
Heterogeneous-Temporal Graph Convolutional Networks: Make the Community Detection Much BetterYaping Zheng, Shiyi Chen, Xinni Zhang et al.
Community detection has long been an important yet challenging task to analyze complex networks with a focus on detecting topological structures of graph data. Essentially, real-world graph data contains various features, node and edge types which dynamically vary over time, and this invalidates most existing community detection approaches. To cope with these issues, this paper proposes the heterogeneous-temporal graph convolutional networks (HTGCN) to detect communities from hetergeneous and temporal graphs. Particularly, we first design a heterogeneous GCN component to acquire feature representations for each heterogeneous graph at each time step. Then, a residual compressed aggregation component is proposed to represent "dynamic" features for "varying" communities, which are then aggregated with "static" features extracted from current graph. Extensive experiments are evaluated on two real-world datasets, i.e., DBLP and IMDB. The promising results demonstrate that the proposed HTGCN is superior to both benchmark and the state-of-the-art approaches, e.g., GCN, GAT, GNN, LGNN, HAN and STAR, with respect to a number of evaluation criteria.