Haolin Li

CV
h-index28
22papers
112citations
Novelty54%
AI Score56

22 Papers

LGApr 13Code
Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach

Haolin Li, Shuyang Jiang, Ruipeng Zhang et al.

While large language models hold promise for complex medical applications, their development is hindered by the scarcity of high-quality reasoning data. To address this issue, existing approaches typically distill chain-of-thought reasoning traces from large proprietary models via supervised fine-tuning, then conduct reinforcement learning (RL). These methods exhibit limited improvement on underrepresented domains like rare diseases while incurring substantial costs from generating complex reasoning chains. To efficiently enhance medical reasoning, we propose MedSSR, a Medical Knowledge-enhanced data Synthesis and Semi-supervised Reinforcement learning framework. Our framework first employs rare disease knowledge to synthesize distribution-controllable reasoning questions. We then utilize the policy model itself to generate high-quality pseudo-labels. This enables a two-stage, intrinsic-to-extrinsic training paradigm: self-supervised RL on the pseudo-labeled synthetic data, followed by supervised RL on the human-annotated real data. MedSSR scales model training efficiently without relying on costly trace distillation. Extensive experiments on Qwen and Llama demonstrate that our method outperforms existing methods across ten medical benchmarks, achieving up to +5.93% gain on rare-disease tasks. Our code is available at https://github.com/tdlhl/MedSSR.

CVSep 29, 2024Code
LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models

Haolin Li, Yuhang Zhou, Ziheng Zhao et al.

The widespread adoption of large-scale pre-training techniques has significantly advanced the development of medical foundation models, enabling them to serve as versatile tools across a broad range of medical tasks. However, despite their strong generalization capabilities, medical foundation models pre-trained on large-scale datasets tend to suffer from domain gaps between heterogeneous data, leading to suboptimal performance on specific tasks compared to specialist models, as evidenced by previous studies. In this paper, we explore a new perspective called "Knowledge Decomposition" to improve the performance on specific medical tasks, which deconstructs the foundation model into multiple lightweight expert models, each dedicated to a particular anatomical region, with the aim of enhancing specialization and simultaneously reducing resource consumption. To accomplish the above objective, we propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution. The low-rank expert modules resolve gradient conflicts between heterogeneous data from different anatomical regions, providing strong specialization at lower costs. The efficient knowledge separation convolution significantly improves algorithm efficiency by achieving knowledge separation within a single forward propagation. Extensive experimental results on segmentation and classification tasks demonstrate that our decomposed models not only achieve state-of-the-art performance but also exhibit superior transferability on downstream tasks, even surpassing the original foundation models in task-specific evaluations. The code is available at here.

CVJul 9, 2024
Reprogramming Distillation for Medical Foundation Models

Yuhang Zhou, Siyuan Du, Haolin Li et al.

Medical foundation models pre-trained on large-scale datasets have demonstrated powerful versatile capabilities for various tasks. However, due to the gap between pre-training tasks (or modalities) and downstream tasks (or modalities), the real-world computation and speed constraints, it might not be straightforward to apply medical foundation models in the downstream scenarios. Previous methods, such as parameter efficient fine-tuning (PEFT) methods and knowledge distillation (KD) methods, are unable to simultaneously address the task (or modality) inconsistency and achieve personalized lightweight deployment under diverse real-world demands. To address the above issues, we propose a novel framework called Reprogramming Distillation (RD). On one hand, RD reprograms the original feature space of the foundation model so that it is more relevant to downstream scenarios, aligning tasks and modalities. On the other hand, through a co-training mechanism and a shared classifier, connections are established between the reprogrammed knowledge and the knowledge of student models, ensuring that the reprogrammed feature space can be smoothly mimic by the student model of different structures. Further, to reduce the randomness under different training conditions, we design a Centered Kernel Alignment (CKA) distillation to promote robust knowledge transfer. Empirically, we show that on extensive datasets, RD consistently achieve superior performance compared with previous PEFT and KD methods.

NCMar 31
Predicting Neuromodulation Outcome for Parkinson's Disease with Generative Virtual Brain Model

Siyuan Du, Siyi Li, Shuwei Bai et al.

Parkinson's disease (PD) affects over ten million people worldwide. Although temporal interference (TI) and deep brain stimulation (DBS) are promising therapies, inter-individual variability limits empirical treatment selection, increasing non-negligible surgical risk and cost. Previous explorations either resort to limited statistical biomarkers that are insufficient to characterize variability, or employ AI-driven methods which is prone to overfitting and opacity. We bridge this gap with a pretraining-finetuning framework to predict outcomes directly from resting-state fMRI. Critically, a generative virtual brain foundation model, pretrained on a collective dataset (2707 subjects, 5621 sessions) to capture universal disorder patterns, was finetuned on PD cohorts receiving TI (n=51) or DBS (n=55) to yield individualized virtual brains with high fidelity to empirical functional connectivity (r=0.935). By constructing counterfactual estimations between pathological and healthy neural states within these personalized models, we predicted clinical responses (TI: AUPR=0.853; DBS: AUPR=0.915), substantially outperforming baselines. External and prospective validations (n=14, n=11) highlight the feasibility of clinical translation. Moreover, our framework provides state-dependent regional patterns linked to response, offering hypothesis-generating mechanistic insights.

LGJan 14
Discrete Solution Operator Learning for Geometry-Dependent PDEs

Jinshuai Bai, Haolin Li, Zahra Sharif Khodaei et al.

Neural operator learning accelerates PDE solution by approximating operators as mappings between continuous function spaces. Yet in many engineering settings, varying geometry induces discrete structural changes, including topological changes, abrupt changes in boundary conditions or boundary types, and changes in the computational domain, which break the smooth-variation premise. Here we introduce Discrete Solution Operator Learning (DiSOL), a complementary paradigm that learns discrete solution procedures rather than continuous function-space operators. DiSOL factorizes the solver into learnable stages that mirror classical discretizations: local contribution encoding, multiscale assembly, and implicit solution reconstruction on an embedded grid, thereby preserving procedure-level consistency while adapting to geometry-dependent discrete structures. Across geometry-dependent Poisson, advection-diffusion, linear elasticity, as well as spatiotemporal heat conduction problems, DiSOL produces stable and accurate predictions under both in-distribution and strongly out-of-distribution geometries, including discontinuous boundaries and topological changes. These results highlight the need for procedural operator representations in geometry-dominated problems and position discrete solution operator learning as a distinct, complementary direction in scientific machine learning.

LGJun 16, 2023
GPINN: Physics-informed Neural Network with Graph Embedding

Yuyang Miao, Haolin Li

This work proposes a Physics-informed Neural Network framework with Graph Embedding (GPINN) to perform PINN in graph, i.e. topological space instead of traditional Euclidean space, for improved problem-solving efficiency. The method integrates topological data into the neural network's computations, which significantly boosts the performance of the Physics-Informed Neural Network (PINN). The graph embedding technique infuses extra dimensions into the input space to encapsulate the spatial characteristics of a graph while preserving the properties of the original space. The selection of these extra dimensions is guided by the Fiedler vector, offering an optimised pathologic notation of the graph. Two case studies are conducted, which demonstrate significant improvement in the performance of GPINN in comparison to traditional PINN, particularly in its superior ability to capture physical features of the solution.

CVFeb 3
Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Bozhou Li, Yushuo Guan, Haolin Li et al.

Recent DiT-based text-to-image models increasingly adopt LLMs as text encoders, yet text conditioning remains largely static and often utilizes only a single LLM layer, despite pronounced semantic hierarchy across LLM layers and non-stationary denoising dynamics over both diffusion time and network depth. To better match the dynamic process of DiT generation and thereby enhance the diffusion model's generative capability, we introduce a unified normalized convex fusion framework equipped with lightweight gates to systematically organize multi-layer LLM hidden states via time-wise, depth-wise, and joint fusion. Experiments establish Depth-wise Semantic Routing as the superior conditioning strategy, consistently improving text-image alignment and compositional generation (e.g., +9.97 on the GenAI-Bench Counting task). Conversely, we find that purely time-wise fusion can paradoxically degrade visual generation fidelity. We attribute this to a train-inference trajectory mismatch: under classifier-free guidance, nominal timesteps fail to track the effective SNR, causing semantically mistimed feature injection during inference. Overall, our results position depth-wise routing as a strong and effective baseline and highlight the critical need for trajectory-aware signals to enable robust time-dependent conditioning.

LGSep 24, 2025Code
RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis

Haolin Li, Tianjie Dai, Zhe Chen et al.

Clinical diagnosis is a highly specialized discipline requiring both domain expertise and strict adherence to rigorous guidelines. While current AI-driven medical research predominantly focuses on knowledge graphs or natural text pretraining paradigms to incorporate medical knowledge, these approaches primarily rely on implicitly encoded knowledge within model parameters, neglecting task-specific knowledge required by diverse downstream tasks. To address this limitation, we propose Retrieval-Augmented Diagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Specifically, RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss that constrains the latent distance between multi-modal features and guideline knowledge, and the dual transformer decoder that employs guidelines as queries to steer cross-modal fusion, aligning the models with clinical diagnostic workflows from guideline acquisition to feature extraction and decision-making. Moreover, recognizing the lack of quantitative evaluation of interpretability for multimodal diagnostic models, we introduce a set of criteria to assess the interpretability from both image and text perspectives. Extensive evaluations across four datasets with different anatomies demonstrate RAD's generalizability, achieving state-of-the-art performance. Furthermore, RAD enables the model to concentrate more precisely on abnormal regions and critical indicators, ensuring evidence-based, trustworthy diagnosis. Our code is available at https://github.com/tdlhl/RAD.

CVJun 14, 2024Code
Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

Yuhang Zhou, Zihua Zhao, Haolin Li et al.

Training a unified model to take multiple targets into account is a trend towards artificial general intelligence. However, how to efficiently mitigate the training conflicts among heterogeneous data collected from different domains or tasks remains under-explored. In this study, we explore to leverage Mixture of Low-rank Adapters (MoLA) to mitigate conflicts in heterogeneous data training, which requires to jointly train the multiple low-rank adapters and their shared backbone. Specifically, we introduce two variants of MoLA, namely, MoLA-Grad and MoLA-Router, to respectively handle the target-aware and target-agnostic scenarios during inference. The former uses task identifiers to assign personalized low-rank adapters to each task, disentangling task-specific knowledge towards their adapters, thereby mitigating heterogeneity conflicts. The latter uses a novel Task-wise Decorrelation (TwD) loss to intervene the router to learn oriented weight combinations of adapters to homogeneous tasks, achieving similar effects. We conduct comprehensive experiments to verify the superiority of MoLA over previous state-of-the-art methods and present in-depth analysis on its working mechanism. Source code is available at: https://github.com/MediaBrain-SJTU/MoLA

LGJan 29
Learning the Mechanism of Catastrophic Forgetting: A Perspective from Gradient Similarity

Mutian Yang, Zisen Zhan, Yutong Chen et al.

Catastrophic forgetting during knowledge injection severely undermines the continual learning capability of large language models (LLMs). Although existing methods attempt to mitigate this issue, they often lack a foundational theoretical explanation. We establish a gradient-based theoretical framework to explain catastrophic forgetting. We first prove that strongly negative gradient similarity is a fundamental cause of forgetting. We then use gradient similarity to identify two types of neurons: conflicting neurons that induce forgetting and account for 50%-75% of neurons, and collaborative neurons that mitigate forgetting and account for 25%-50%. Based on this analysis, we propose a knowledge injection method, Collaborative Neural Learning (CNL). By freezing conflicting neurons and updating only collaborative neurons, CNL theoretically eliminates catastrophic forgetting under an infinitesimal learning rate eta and an exactly known mastered set. Experiments on five LLMs, four datasets, and four optimizers show that CNL achieves zero forgetting in in-set settings and reduces forgetting by 59.1%-81.7% in out-of-set settings.

CVMay 6
Deep Reprogramming Distillation for Medical Foundation Models

Siyuan Du, Yuhang Zhou, Haolin Li et al.

Medical foundation models pre-trained on large-scale datasets have shown powerful versatile performance. However, when adapting medical foundation models for specific medical scenarios, it remains the inevitable challenge due to the gap induced by the discrepancy between pre-training and downstream tasks, the real-world computation, and speed constraints. Relevant techniques that probably handle this challenge more or less suffer from some intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized and lightweight deployment. Even the combination of PEFT and KD still struggles to resolve model structures and training strategies inconsistencies between teacher and student models, leading to inefficient knowledge transfer. In this study, we propose a novel framework called Deep Reprogramming Distillation (DRD) to combat the general adaptation challenge. Specifically, DRD introduces the novel reprogramming module that on the one side overcomes the domain and task discrepancy between pretraining and downstream scenarios, and on the other side builds the student-friendly efficient distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability under different training conditions, we design a centered kernel alignment (CKA) distillation method to promote robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods across 18 medical downstream tasks under different foundation models, covering various scenarios including 2D/3D classification and 2D/3D segmentation.

CVApr 26, 2024
Low-Rank Knowledge Decomposition for Medical Foundation Models

Yuhang Zhou, Haolin Li, Siyuan Du et al.

The popularity of large-scale pre-training has promoted the development of medical foundation models. However, some studies have shown that although foundation models exhibit strong general feature extraction capabilities, their performance on specific tasks is still inferior to task-specific methods. In this paper, we explore a new perspective called ``Knowledge Decomposition'' to improve the performance on specific medical tasks, which deconstruct the foundation model into multiple lightweight expert models, each dedicated to a particular task, with the goal of improving specialization while concurrently mitigating resource expenditure. To accomplish the above objective, we design a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates graidents by incorporating low-rank expert modules and the efficient knowledge separation convolution. Extensive experimental results demonstrate that the decomposed models perform well in terms of performance and transferability, even surpassing the original foundation models.

CEDec 12, 2024
Finite-PINN: A Physics-Informed Neural Network with Finite Geometric Encoding for Solid Mechanics

Haolin Li, Yuyang Miao, Zahra Sharif Khodaei et al.

PINN models have demonstrated capabilities in addressing fluid PDE problems, and their potential in solid mechanics is beginning to emerge. This study identifies two key challenges when using PINN to solve general solid mechanics problems. These challenges become evident when comparing the limitations of PINN with the well-established numerical methods commonly used in solid mechanics, such as the finite element method (FEM). Specifically: a) PINN models generate solutions over an infinite domain, which conflicts with the finite boundaries typical of most solid structures; and b) the solution space utilised by PINN is Euclidean, which is inadequate for addressing the complex geometries often present in solid structures. This work presents a PINN architecture for general solid mechanics problems, referred to as the Finite-PINN model. The model is designed to effectively tackle two key challenges, while retaining as much of the original PINN framework as possible. To this end, the Finite-PINN incorporates finite geometric encoding into the neural network inputs, thereby transforming the solution space from a conventional Euclidean space into a hybrid Euclidean-topological space. The model is comprehensively trained using both strong-form and weak-form loss formulations, enabling its application to a wide range of forward and inverse problems in solid mechanics. For forward problems, the Finite-PINN model efficiently approximates solutions to solid mechanics problems when the geometric information of a given structure has been preprocessed. For inverse problems, it effectively reconstructs full-field solutions from very sparse observations by embedding both physical laws and geometric information within its architecture.

SPOct 22, 2024
Graph Sampling for Scalable and Expressive Graph Neural Networks on Homophilic Graphs

Haolin Li, Haoyu Wang, Luana Ruiz

Graph Neural Networks (GNNs) excel in many graph machine learning tasks but face challenges when scaling to large networks. GNN transferability allows training on smaller graphs and applying the model to larger ones, but existing methods often rely on random subsampling, leading to disconnected subgraphs and reduced model expressivity. We propose a novel graph sampling algorithm that leverages feature homophily to preserve graph structure. By minimizing the trace of the data correlation matrix, our method better preserves the graph Laplacian trace -- a proxy for the graph connectivity -- than random sampling, while achieving lower complexity than spectral methods. Experiments on citation networks show improved performance in preserving Laplacian trace and GNN transferability compared to random sampling.

AIOct 27, 2025
Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction

Jin Hu, Jiakai Wang, Linna Jing et al.

Recently, semantically constrained adversarial examples (SemanticAE), which are directly generated from natural language instructions, have become a promising avenue for future research due to their flexible attacking forms. To generate SemanticAEs, current methods fall short of satisfactory attacking ability as the key underlying factors of semantic uncertainty in human instructions, such as referring diversity, descriptive incompleteness, and boundary ambiguity, have not been fully investigated. To tackle the issues, this paper develops a multi-dimensional instruction uncertainty reduction (InSUR) framework to generate more satisfactory SemanticAE, i.e., transferable, adaptive, and effective. Specifically, in the dimension of the sampling method, we propose the residual-driven attacking direction stabilization to alleviate the unstable adversarial optimization caused by the diversity of language references. By coarsely predicting the language-guided sampling process, the optimization process will be stabilized by the designed ResAdv-DDIM sampler, therefore releasing the transferable and robust adversarial capability of multi-step diffusion models. In task modeling, we propose the context-encoded attacking scenario constraint to supplement the missing knowledge from incomplete human instructions. Guidance masking and renderer integration are proposed to regulate the constraints of 2D/3D SemanticAE, activating stronger scenario-adapted attacks. Moreover, in the dimension of generator evaluation, we propose the semantic-abstracted attacking evaluation enhancement by clarifying the evaluation boundary, facilitating the development of more effective SemanticAE generators. Extensive experiments demonstrate the superiority of the transfer attack performance of InSUR. Moreover, we realize the reference-free generation of semantically constrained 3D adversarial examples for the first time.

CLOct 16, 2025
LiRA: Linguistic Robust Anchoring for Cross-lingual Large Language Models

Haolin Li, Haipeng Zhang, Mang Li et al.

As large language models (LLMs) rapidly advance, performance on high-resource languages (e.g., English, Chinese) is nearing saturation, yet remains substantially lower for low-resource languages (e.g., Urdu, Thai) due to limited training data, machine-translation noise, and unstable cross-lingual alignment. We introduce LiRA (Linguistic Robust Anchoring for Large Language Models), a training framework that robustly improves cross-lingual representations under low-resource conditions while jointly strengthening retrieval and reasoning. LiRA comprises two modules: (i) Arca (Anchored Representation Composition Architecture), which anchors low-resource languages to an English semantic space via anchor-based alignment and multi-agent collaborative encoding, preserving geometric stability in a shared embedding space; and (ii) LaSR (Language-coupled Semantic Reasoner), which adds a language-aware lightweight reasoning head with consistency regularization on top of Arca's multilingual representations, unifying the training objective to enhance cross-lingual understanding, retrieval, and reasoning robustness. We further construct and release a multilingual product retrieval dataset covering five Southeast Asian and two South Asian languages. Experiments across low-resource benchmarks (cross-lingual retrieval, semantic similarity, and reasoning) show consistent gains and robustness under few-shot and noise-amplified settings; ablations validate the contribution of both Arca and LaSR. Code will be released on GitHub and the dataset on Hugging Face.

LGOct 14, 2025
FedGTEA: Federated Class-Incremental Learning with Gaussian Task Embedding and Alignment

Haolin Li, Hoda Bidkhori

We introduce a novel framework for Federated Class Incremental Learning, called Federated Gaussian Task Embedding and Alignment (FedGTEA). FedGTEA is designed to capture task-specific knowledge and model uncertainty in a scalable and communication-efficient manner. At the client side, the Cardinality-Agnostic Task Encoder (CATE) produces Gaussian-distributed task embeddings that encode task knowledge, address statistical heterogeneity, and quantify data uncertainty. Importantly, CATE maintains a fixed parameter size regardless of the number of tasks, which ensures scalability across long task sequences. On the server side, FedGTEA utilizes the 2-Wasserstein distance to measure inter-task gaps between Gaussian embeddings. We formulate the Wasserstein loss to enforce inter-task separation. This probabilistic formulation not only enhances representation learning but also preserves task-level privacy by avoiding the direct transmission of latent embeddings, aligning with the privacy constraints in federated learning. Extensive empirical evaluations on popular datasets demonstrate that FedGTEA achieves superior classification performance and significantly mitigates forgetting, consistently outperforming strong existing baselines.

CLAug 18, 2025
HeteroRAG: A Heterogeneous Retrieval-Augmented Generation Framework for Medical Vision Language Tasks

Zhe Chen, Yusheng Liao, Shuyang Jiang et al.

Medical large vision-language Models (Med-LVLMs) have shown promise in clinical applications but suffer from factual inaccuracies and unreliable outputs, posing risks in real-world diagnostics. While retrieval-augmented generation has emerged as a potential solution, current medical multimodal RAG systems are unable to perform effective retrieval across heterogeneous sources. The irrelevance of retrieved reports affects the factuality of analysis, while insufficient knowledge affects the credibility of clinical decision-making. To bridge the gap, we construct MedAtlas, which includes extensive multimodal report repositories and diverse text corpora. Based on it, we present HeteroRAG, a novel framework that enhances Med-LVLMs through heterogeneous knowledge sources. The framework introduces Modality-specific CLIPs for effective report retrieval and a Multi-corpora Query Generator for dynamically constructing queries for diverse corpora. Incorporating knowledge from such multifaceted sources, Med-LVLM is then trained with Heterogeneous Knowledge Preference Tuning to achieve cross-modality and multi-source knowledge alignment. Extensive experiments across 12 datasets and 3 modalities demonstrate that the proposed HeteroRAG achieves state-of-the-art performance in most medical vision language benchmarks, significantly improving factual accuracy and reliability of Med-LVLMs.

CVJul 11, 2025
Cross-Resolution SAR Target Detection Using Structural Hierarchy Adaptation and Reliable Adjacency Alignment

Jiang Qin, Bin Zou, Haolin Li et al.

In recent years, continuous improvements in SAR resolution have significantly benefited applications such as urban monitoring and target detection. However, the improvement in resolution leads to increased discrepancies in scattering characteristics, posing challenges to the generalization ability of target detection models. While domain adaptation technology is a potential solution, the inevitable discrepancies caused by resolution differences often lead to blind feature adaptation and unreliable semantic propagation, ultimately degrading the domain adaptation performance. To address these challenges, this paper proposes a novel SAR target detection method (termed CR-Net), that incorporates structure priors and evidential learning theory into the detection model, enabling reliable domain adaptation for cross-resolution detection. To be specific, CR-Net integrates Structure-induced Hierarchical Feature Adaptation (SHFA) and Reliable Structural Adjacency Alignment (RSAA). SHFA module is introduced to establish structural correlations between targets and achieve structure-aware feature adaptation, thereby enhancing the interpretability of the feature adaptation process. Afterwards, the RSAA module is proposed to enhance reliable semantic alignment, by leveraging the secure adjacency set to transfer valuable discriminative knowledge from the source domain to the target domain. This further improves the discriminability of the detection model in the target domain. Based on experimental results from different-resolution datasets,the proposed CR-Net significantly enhances cross-resolution adaptation by preserving intra-domain structures and improving discriminability. It achieves state-of-the-art (SOTA) performance in cross-resolution SAR target detection.

CVJun 4, 2024
3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

Photo-realistic image rendering from 3D scene reconstruction has advanced significantly with neural rendering techniques. Among these, 3D Gaussian Splatting (3D-GS) outperforms Neural Radiance Fields (NeRFs) in quality and speed but struggles with shape and color discontinuities. We propose 3D Half-Gaussian (3D-HGS) kernels as a plug-and-play solution to address these limitations. Our experiments show that 3D-HGS enhances existing 3D-GS methods, achieving state-of-the-art rendering quality without compromising speed.

LGFeb 27, 2024
RAGFormer: Learning Semantic Attributes and Topological Structure for Fraud Detection

Haolin Li, Shuyang Jiang, Lifeng Zhang et al.

Fraud detection remains a challenging task due to the complex and deceptive nature of fraudulent activities. Current approaches primarily concentrate on learning only one perspective of the graph: either the topological structure of the graph or the attributes of individual nodes. However, we conduct empirical studies to reveal that these two types of features, while nearly orthogonal, are each independently effective. As a result, previous methods can not fully capture the comprehensive characteristics of the fraud graph. To address this dilemma, we present a novel framework called Relation-Aware GNN with transFormer~(RAGFormer) which simultaneously embeds both semantic and topological features into a target node. The simple yet effective network consists of a semantic encoder, a topology encoder, and an attention fusion module. The semantic encoder utilizes Transformer to learn semantic features and node interactions across different relations. We introduce Relation-Aware GNN as the topology encoder to learn topological features and node interactions within each relation. These two complementary features are interleaved through an attention fusion module to support prediction by both orthogonal features. Extensive experiments on two popular public datasets demonstrate that RAGFormer achieves state-of-the-art performance. The significant improvement of RAGFormer in an industrial credit card fraud detection dataset further validates the applicability of our method in real-world business scenarios.

CVOct 1, 2021
Self-Supervised Decomposition, Disentanglement and Prediction of Video Sequences while Interpreting Dynamics: A Koopman Perspective

Armand Comas, Sandesh Ghimire, Haolin Li et al.

Human interpretation of the world encompasses the use of symbols to categorize sensory inputs and compose them in a hierarchical manner. One of the long-term objectives of Computer Vision and Artificial Intelligence is to endow machines with the capacity of structuring and interpreting the world as we do. Towards this goal, recent methods have successfully been able to decompose and disentangle video sequences into their composing objects and dynamics, in a self-supervised fashion. However, there has been a scarce effort in giving interpretation to the dynamics of the scene. We propose a method to decompose a video into moving objects and their attributes, and model each object's dynamics with linear system identification tools, by means of a Koopman embedding. This allows interpretation, manipulation and extrapolation of the dynamics of the different objects by employing the Koopman operator K. We test our method in various synthetic datasets and successfully forecast challenging trajectories while interpreting them.