Xianbing Zhao

CV
h-index: 44
Papers: 6
Citations: 27
Novelty: 53%
AI Score: 38

6 Papers

LG · Sep 9, 2023
SHAPE: A Sample-adaptive Hierarchical Prediction Network for Medication Recommendation

Sicen Liu, Xiaolong Wang, Jingcheng Du et al.

Effective medication recommendation for patients with complex multimorbidity conditions is a critical task in healthcare. Most existing works predict medications from longitudinal records, assuming that the information transmission patterns learned from longitudinal sequence data are stable and that intra-visit medical events are serialized. However, two issues may have been overlooked: 1) a more compact encoder for the relationships among intra-visit medical events is urgently needed; 2) the strategies for learning accurate representations of patients' variable-length longitudinal sequences differ. In this paper, we propose a novel Sample-adaptive Hierarchical medicAtion Prediction nEtwork, termed SHAPE, to tackle these challenges in the medication recommendation task. Specifically, we design a compact intra-visit set encoder that encodes the relationships among medical events to obtain a visit-level representation, and then develop an inter-visit longitudinal encoder to learn the patient-level longitudinal representation efficiently. To endow the model with the capability of modeling variable visit lengths, we introduce a soft curriculum learning method that automatically assigns the difficulty of each sample according to its visit length. Extensive experiments on a benchmark dataset verify the superiority of our model compared with several state-of-the-art baselines.

LG · Sep 5, 2024
Learning in Order! A Sequential Strategy to Learn Invariant Features for Multimodal Sentiment Analysis

Xianbing Zhao, Lizhen Qu, Tao Feng et al.

This work proposes a novel and simple sequential learning strategy to train models on videos and texts for multimodal sentiment analysis. To estimate sentiment polarities on unseen out-of-distribution data, we introduce a multimodal model that is trained either in a single source domain or in multiple source domains using our learning strategy. This strategy starts with learning domain-invariant features from text, followed by learning sparse domain-agnostic features from videos, assisted by the selected features learned from text. Our experimental results demonstrate that our model achieves significantly better performance than the state-of-the-art approaches on average in both single-source and multi-source settings. Our feature selection procedure favors features that are independent of each other and strongly correlated with their polarity labels. To facilitate research on this topic, the source code of this work will be publicly available upon acceptance.

AI · May 31, 2023 · Code
DKINet: Medication Recommendation via Domain Knowledge Informed Deep Learning

Sicen Liu, Xiaolong Wang, Xianbing Zhao et al.

Medication recommendation is a fundamental yet crucial branch of healthcare that presents opportunities to assist physicians in making more accurate medication prescriptions for patients with complex health conditions. Previous studies have primarily focused on learning patient representation from electronic health records (EHR). While considering the clinical manifestations of the patient is important, incorporating domain-specific prior knowledge is equally significant in diagnosing the patient's health conditions. However, effectively integrating domain knowledge with the patient's clinical manifestations can be challenging, particularly when dealing with complex clinical manifestations. Therefore, in this paper, we first identify a comprehensive source of domain-specific prior knowledge, namely the Unified Medical Language System (UMLS), a repository of biomedical vocabularies and standards, for knowledge extraction. Subsequently, we propose a knowledge injection module that addresses the effective integration of domain knowledge with complex clinical manifestations, enabling an effective characterization of the health conditions of the patient. Furthermore, considering the significant impact of a patient's medication history on their current medication, we introduce a historical medication-aware patient representation module to capture the longitudinal influence of historical medication information on the representation of current patients. Extensive experiments on three public benchmark datasets verify the superiority of our proposed method, which outperforms other methods by a significant margin. The code is available at: https://github.com/sherry6247/DKINet.

CV · Jul 9, 2025
Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation

Tao Feng, Xianbing Zhao, Zhenhua Chen et al.

Recent advances in diffusion-based and autoregressive video generation models have achieved remarkable visual realism. However, these models typically lack accurate physical alignment, failing to replicate real-world dynamics in object motion. This limitation arises primarily from their reliance on learned statistical correlations rather than on mechanisms that adhere to physical laws. To address this issue, we introduce a novel framework that integrates symbolic regression (SR) and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories. These trajectories then guide video generation without requiring fine-tuning of existing models. Evaluated on classical mechanics scenarios, including spring-mass, pendulum, and projectile motion, our method successfully recovers ground-truth analytical equations and improves the physical alignment of generated videos over baseline methods.

CL · Feb 16, 2025
Predicting Depression in Screening Interviews from Interactive Multi-Theme Collaboration

Xianbing Zhao, Yiqing Lyu, Di Wang et al.

Automatic depression detection provides cues for early clinical intervention by clinicians. Clinical interviews for depression detection involve dialogues centered around multiple themes. Existing studies primarily design end-to-end neural network models to capture the hierarchical structure of clinical interview dialogues. However, these methods exhibit defects in modeling the thematic content of clinical interviews: 1) they fail to capture intra-theme and inter-theme correlation explicitly, and 2) they do not allow clinicians to intervene and focus on themes of interest. To address these issues, this paper introduces an interactive depression detection framework. This framework leverages in-context learning techniques to identify themes in clinical interviews and then models both intra-theme and inter-theme correlation. Additionally, it employs AI-driven feedback to simulate the interests of clinicians, enabling interactive adjustment of theme importance. PDIMC achieves absolute improvements of 35% and 12% compared to the state-of-the-art on the depression detection dataset DAIC-WOZ, which demonstrates the effectiveness of modeling theme correlation and incorporating interactive external feedback.

CV · Jan 20, 2024
Toward Robust Multimodal Learning using Multimodal Foundational Models

Xianbing Zhao, Soujanya Poria, Xuejiao Li et al.

Existing multimodal sentiment analysis tasks rely heavily on the assumption that the training and test sets consist of complete multimodal data, but this assumption is difficult to satisfy: multimodal data are often incomplete in real-world scenarios. Therefore, a multimodal model that is robust in scenarios with randomly missing modalities is highly preferred. Recently, CLIP-based multimodal foundational models have demonstrated impressive performance on numerous multimodal tasks by learning the aligned cross-modal semantics of image and text pairs, but these models are also unable to directly address scenarios involving modality absence. To alleviate this issue, we propose a simple and effective framework, namely TRML, Toward Robust Multimodal Learning using Multimodal Foundational Models. TRML employs generated virtual modalities to replace missing modalities, and aligns the semantic spaces between the generated and missing modalities. Concretely, we design a missing modality inference module to generate virtual modalities and replace missing modalities. We also design a semantic matching learning module to align the semantic spaces of the generated and missing modalities. Under the prompt of the complete modality, our model captures the semantics of missing modalities by leveraging the aligned cross-modal semantic space. Experiments demonstrate the superiority of our approach on three multimodal sentiment analysis benchmark datasets: CMU-MOSI, CMU-MOSEI, and MELD.