Quan Z. Sheng

LG
h-index80
80papers
4,816citations
Novelty34%
AI Score54

80 Papers

LGMay 31, 2022Code
Graph-level Neural Networks: Current Progress and Future Directions

Ge Zhang, Jia Wu, Jian Yang et al. · allen-ai

Graph-structured data consisting of objects (i.e., nodes) and relationships among objects (i.e., edges) are ubiquitous. Graph-level learning is a matter of studying a collection of graphs instead of a single graph. Traditional graph-level learning methods used to be the mainstream. However, with the increasing scale and complexity of graphs, Graph-level Neural Networks (GLNNs, deep learning-based graph-level learning methods) have been attractive due to their superiority in modeling high-dimensional data. Thus, a survey on GLNNs is necessary. To frame this survey, we propose a systematic taxonomy covering GLNNs upon deep neural networks, graph neural networks, and graph pooling. The representative and state-of-the-art models in each category are focused on this survey. We also investigate the reproducibility, benchmarks, and new graph datasets of GLNNs. Finally, we conclude future directions to further push forward GLNNs. The repository of this survey is available at https://github.com/GeZhangMQ/Awesome-Graph-level-Neural-Networks.

LGFeb 13, 2023Code
A Comprehensive Survey on Graph Summarization with Graph Neural Networks

Nasrin Shabani, Jia Wu, Amin Beheshti et al.

As large-scale graphs become more widespread, more and more computational challenges with extracting, processing, and interpreting large graph data are being exposed. It is therefore natural to search for ways to summarize these expansive graphs while preserving their key characteristics. In the past, most graph summarization techniques sought to capture the most important part of a graph statistically. However, today, the high dimensionality and complexity of modern graph data are making deep learning techniques more popular. Hence, this paper presents a comprehensive survey of progress in deep learning summarization techniques that rely on graph neural networks (GNNs). Our investigation includes a review of the current state-of-the-art approaches, including recurrent GNNs, convolutional GNNs, graph autoencoders, and graph attention networks. A new burgeoning line of research is also discussed where graph reinforcement learning is being used to evaluate and improve the quality of graph summaries. Additionally, the survey provides details of benchmark datasets, evaluation metrics, and open-source tools that are often employed in experimentation settings, along with a detailed comparison, discussion, and takeaways for the research community focused on graph summarization. Finally, the survey concludes with a number of open research challenges to motivate further study in this area.

LGOct 18, 2022
DAGAD: Data Augmentation for Graph Anomaly Detection

Fanzhen Liu, Xiaoxiao Ma, Jia Wu et al.

Graph anomaly detection in this paper aims to distinguish abnormal nodes that behave differently from the benign ones accounting for the majority of graph-structured instances. Receiving increasing attention from both academia and industry, yet existing research on this task still suffers from two critical issues when learning informative anomalous behavior from graph data. For one thing, anomalies are usually hard to capture because of their subtle abnormal behavior and the shortage of background knowledge about them, which causes severe anomalous sample scarcity. Meanwhile, the overwhelming majority of objects in real-world graphs are normal, bringing the class imbalance problem as well. To bridge the gaps, this paper devises a novel Data Augmentation-based Graph Anomaly Detection (DAGAD) framework for attributed graphs, equipped with three specially designed modules: 1) an information fusion module employing graph neural network encoders to learn representations, 2) a graph data augmentation module that fertilizes the training set with generated samples, and 3) an imbalance-tailored learning module to discriminate the distributions of the minority (anomalous) and majority (normal) classes. A series of experiments on three datasets prove that DAGAD outperforms ten state-of-the-art baseline detectors concerning various mostly-used metrics, together with an extensive ablation study validating the strength of our proposed modules.

LGJan 14, 2023
State of the Art and Potentialities of Graph-level Learning

Zhenyu Yang, Ge Zhang, Jia Wu et al.

Graphs have a superior ability to represent relational data, like chemical compounds, proteins, and social networks. Hence, graph-level learning, which takes a set of graphs as input, has been applied to many tasks including comparison, regression, classification, and more. Traditional approaches to learning a set of graphs heavily rely on hand-crafted features, such as substructures. But while these methods benefit from good interpretability, they often suffer from computational bottlenecks as they cannot skirt the graph isomorphism problem. Conversely, deep learning has helped graph-level learning adapt to the growing scale of graphs by extracting features automatically and encoding graphs into low-dimensional representations. As a result, these deep graph learning methods have been responsible for many successes. Yet, there is no comprehensive survey that reviews graph-level learning starting with traditional learning and moving through to the deep learning approaches. This article fills this gap and frames the representative algorithms into a systematic taxonomy covering traditional learning, graph-level deep neural networks, graph-level graph neural networks, and graph pooling. To ensure a thoroughly comprehensive survey, the evolutions, interactions, and communications between methods from four different branches of development are also examined. This is followed by a brief review of the benchmark data sets, evaluation metrics, and common downstream applications. The survey concludes with a broad overview of 12 current and future directions in this booming field.

LGAug 28, 2023
Reinforcement Learning for Generative AI: A Survey

Yuanjiang Cao, Quan Z. Sheng, Julian McAuley et al.

Deep Generative AI has been a long-standing essential topic in the machine learning community, which can impact a number of application areas like text generation and computer vision. The major paradigm to train a generative model is maximum likelihood estimation, which pushes the learner to capture and approximate the target data distribution by decreasing the divergence between the model distribution and the target distribution. This formulation successfully establishes the objective of generative tasks, while it is incapable of satisfying all the requirements that a user might expect from a generative model. Reinforcement learning, serving as a competitive option to inject new training signals by creating new objectives that exploit novel signals, has demonstrated its power and flexibility to incorporate human inductive bias from multiple angles, such as adversarial learning, hand-designed rules and learned reward model to build a performant model. Thereby, reinforcement learning has become a trending research field and has stretched the limits of generative AI in both model design and application. It is reasonable to summarize and conclude advances in recent years with a comprehensive review. Although there are surveys in different application areas recently, this survey aims to shed light on a high-level review that spans a range of application areas. We provide a rigorous taxonomy in this area and make sufficient coverage on various models and applications. Notably, we also surveyed the fast-developing large language model area. We conclude this survey by showing the potential directions that might tackle the limit of current models and expand the frontiers for generative AI.

CVMar 15, 2023
Uncertainty-Aware Pedestrian Trajectory Prediction via Distributional Diffusion

Yao Liu, Zesheng Ye, Rui Wang et al.

Tremendous efforts have been put forth on predicting pedestrian trajectory with generative models to accommodate uncertainty and multi-modality in human behaviors. An individual's inherent uncertainty, e.g., change of destination, can be masked by complex patterns resulting from the movements of interacting pedestrians. However, latent variable-based generative models often entangle such uncertainty with complexity, leading to limited either latent expressivity or predictive diversity. In this work, we propose to separately model these two factors by implicitly deriving a flexible latent representation to capture intricate pedestrian movements, while integrating predictive uncertainty of individuals with explicit bivariate Gaussian mixture densities over their future locations. More specifically, we present a model-agnostic uncertainty-aware pedestrian trajectory prediction framework, parameterizing sufficient statistics for the mixture of Gaussians that jointly comprise the multi-modal trajectories. We further estimate these parameters of interest by approximating a denoising process that progressively recovers pedestrian movements from noise. Unlike previous studies, we translate the predictive stochasticity to explicit distributions, allowing it to readily generate plausible future trajectories indicating individuals' self-uncertainty. Moreover, our framework is compatible with different neural net architectures. We empirically show the performance gains over state-of-the-art even with lighter backbones, across most scenes on two public benchmarks.

IVJul 29, 2022
Beyond CNNs: Exploiting Further Inherent Symmetries in Medical Image Segmentation

Shuchao Pang, Anan Du, Mehmet A. Orgun et al.

Automatic tumor or lesion segmentation is a crucial step in medical image analysis for computer-aided diagnosis. Although the existing methods based on Convolutional Neural Networks (CNNs) have achieved the state-of-the-art performance, many challenges still remain in medical tumor segmentation. This is because, although the human visual system can detect symmetries in 2D images effectively, regular CNNs can only exploit translation invariance, overlooking further inherent symmetries existing in medical images such as rotations and reflections. To solve this problem, we propose a novel group equivariant segmentation framework by encoding those inherent symmetries for learning more precise representations. First, kernel-based equivariant operations are devised on each orientation, which allows it to effectively address the gaps of learning symmetries in existing approaches. Then, to keep segmentation networks globally equivariant, we design distinctive group layers with layer-wise symmetry constraints. Finally, based on our novel framework, extensive experiments conducted on real-world clinical data demonstrate that a Group Equivariant Res-UNet (named GER-UNet) outperforms its regular CNN-based counterpart and the state-of-the-art segmentation methods in the tasks of hepatic tumor segmentation, COVID-19 lung infection segmentation and retinal vessel detection. More importantly, the newly built GER-UNet also shows potential in reducing the sample complexity and the redundancy of filters, upgrading current segmentation CNNs and delineating organs on other medical imaging modalities.

DLSep 18, 2023
When Large Language Models Meet Citation: A Survey

Yang Zhang, Yufei Wang, Kai Wang et al.

Citations in scholarly work serve the essential purpose of acknowledging and crediting the original sources of knowledge that have been incorporated or referenced. Depending on their surrounding textual context, these citations are used for different motivations and purposes. Large Language Models (LLMs) could be helpful in capturing these fine-grained citation information via the corresponding textual context, thereby enabling a better understanding towards the literature. Furthermore, these citations also establish connections among scientific papers, providing high-quality inter-document relationships and human-constructed knowledge. Such information could be incorporated into LLMs pre-training and improve the text representation in LLMs. Therefore, in this paper, we offer a preliminary review of the mutually beneficial relationship between LLMs and citation analysis. Specifically, we review the application of LLMs for in-text citation analysis tasks, including citation classification, citation-based summarization, and citation recommendation. We then summarize the research pertinent to leveraging citation linkage knowledge to improve text representations of LLMs via citation prediction, network structure information, and inter-document relationship. We finally provide an overview of these contemporary methods and put forth potential promising avenues in combining LLMs and citation analysis for further investigation.

CLAug 4, 2023
Learning to Select the Relevant History Turns in Conversational Question Answering

Munazza Zaib, Wei Emma Zhang, Quan Z. Sheng et al.

The increasing demand for the web-based digital assistants has given a rapid rise in the interest of the Information Retrieval (IR) community towards the field of conversational question answering (ConvQA). However, one of the critical aspects of ConvQA is the effective selection of conversational history turns to answer the question at hand. The dependency between relevant history selection and correct answer prediction is an intriguing but under-explored area. The selected relevant context can better guide the system so as to where exactly in the passage to look for an answer. Irrelevant context, on the other hand, brings noise to the system, thereby resulting in a decline in the model's performance. In this paper, we propose a framework, DHS-ConvQA (Dynamic History Selection in Conversational Question Answering), that first generates the context and question entities for all the history turns, which are then pruned on the basis of similarity they share in common with the question at hand. We also propose an attention-based mechanism to re-rank the pruned terms based on their calculated weights of how useful they are in answering the question. In the end, we further aid the model by highlighting the terms in the re-ranked conversational history using a binary classification task and keeping the useful terms (predicted as 1) and ignoring the irrelevant terms (predicted as 0). We demonstrate the efficacy of our proposed framework with extensive experimental results on CANARD and QuAC -- the two popularly utilized datasets in ConvQA. We demonstrate that selecting relevant turns works better than rewriting the original question. We also investigate how adding the irrelevant history turns negatively impacts the model's performance and discuss the research challenges that demand more attention from the IR community.

CLApr 14, 2023
Keeping the Questions Conversational: Using Structured Representations to Resolve Dependency in Conversational Question Answering

Munazza Zaib, Quan Z. Sheng, Wei Emma Zhang et al.

Having an intelligent dialogue agent that can engage in conversational question answering (ConvQA) is now no longer limited to Sci-Fi movies only and has, in fact, turned into a reality. These intelligent agents are required to understand and correctly interpret the sequential turns provided as the context of the given question. However, these sequential questions are sometimes left implicit and thus require the resolution of some natural language phenomena such as anaphora and ellipsis. The task of question rewriting has the potential to address the challenges of resolving dependencies amongst the contextual turns by transforming them into intent-explicit questions. Nonetheless, the solution of rewriting the implicit questions comes with some potential challenges such as resulting in verbose questions and taking conversational aspect out of the scenario by generating self-contained questions. In this paper, we propose a novel framework, CONVSR (CONVQA using Structured Representations) for capturing and generating intermediate representations as conversational cues to enhance the capability of the QA model to better interpret the incomplete questions. We also deliberate how the strengths of this task could be leveraged in a bid to design more engaging and eloquent conversational agents. We test our model on the QuAC and CANARD datasets and illustrate by experimental results that our proposed framework achieves a better F1 score than the standard question rewriting model.

DCJul 8, 2022
A Survey on Participant Selection for Federated Learning in Mobile Networks

Behnaz Soltani, Venus Haghighi, Adnan Mahmood et al.

Federated Learning (FL) is an efficient distributed machine learning paradigm that employs private datasets in a privacy-preserving manner. The main challenges of FL is that end devices usually possess various computation and communication capabilities and their training data are not independent and identically distributed (non-IID). Due to limited communication bandwidth and unstable availability of such devices in a mobile network, only a fraction of end devices (also referred to as the participants or clients in a FL process) can be selected in each round. Hence, it is of paramount importance to utilize an efficient participant selection scheme to maximize the performance of FL including final model accuracy and training time. In this paper, we provide a review of participant selection techniques for FL. First, we introduce FL and highlight the main challenges during participant selection. Then, we review the existing studies and categorize them based on their solutions. Finally, we provide some future directions on participant selection for FL based on our analysis of the state-of-the-art in this topic area.

LGAug 26, 2023
Uncovering Promises and Challenges of Federated Learning to Detect Cardiovascular Diseases: A Scoping Literature Review

Sricharan Donkada, Seyedamin Pouriyeh, Reza M. Parizi et al.

Cardiovascular diseases (CVD) are the leading cause of death globally, and early detection can significantly improve outcomes for patients. Machine learning (ML) models can help diagnose CVDs early, but their performance is limited by the data available for model training. Privacy concerns in healthcare make it harder to acquire data to train accurate ML models. Federated learning (FL) is an emerging approach to machine learning that allows models to be trained on data from multiple sources without compromising the privacy of the individual data owners. This survey paper provides an overview of the current state-of-the-art in FL for CVD detection. We review the different FL models proposed in various papers and discuss their advantages and challenges. We also compare FL with traditional centralized learning approaches and highlight the differences in terms of model accuracy, privacy, and data distribution handling capacity. Finally, we provide a critical analysis of FL's current challenges and limitations for CVD detection and discuss potential avenues for future research. Overall, this survey paper aims to provide a comprehensive overview of the current state-of-the-art in FL for CVD detection and to highlight its potential for improving the accuracy and privacy of CVD detection models.

IRApr 17, 2023
Causal Disentangled Variational Auto-Encoder for Preference Understanding in Recommendation

Siyu Wang, Xiaocong Chen, Quan Z. Sheng et al.

Recommendation models are typically trained on observational user interaction data, but the interactions between latent factors in users' decision-making processes lead to complex and entangled data. Disentangling these latent factors to uncover their underlying representation can improve the robustness, interpretability, and controllability of recommendation models. This paper introduces the Causal Disentangled Variational Auto-Encoder (CaD-VAE), a novel approach for learning causal disentangled representations from interaction data in recommender systems. The CaD-VAE method considers the causal relationships between semantically related factors in real-world recommendation scenarios, rather than enforcing independence as in existing disentanglement methods. The approach utilizes structural causal models to generate causal representations that describe the causal relationship between latent factors. The results demonstrate that CaD-VAE outperforms existing methods, offering a promising solution for disentangling complex user behavior data in recommendation systems.

CLDec 5, 2022
Building Metadata Inference Using a Transducer Based Language Model

David Waterworth, Subbu Sethuvenkatraman, Quan Z. Sheng

Solving the challenges of automatic machine translation of Building Automation System text metadata is a crucial first step in efficiently deploying smart building applications. The vocabulary used to describe building metadata appears small compared to general natural languages, but each term has multiple commonly used abbreviations. Conventional machine learning techniques are inefficient since they need to learn many different forms for the same word, and large amounts of data must be used to train these models. It is also difficult to apply standard techniques such as tokenisation since this commonly results in multiple output tags being associated with a single input token, something traditional sequence labelling models do not allow. Finite State Transducers can model sequence-to-sequence tasks where the input and output sequences are different lengths, and they can be combined with language models to ensure a valid output sequence is generated. We perform a preliminary analysis into the use of transducer-based language models to parse and normalise building point metadata.

CVMar 7, 2023
Guided Image-to-Image Translation by Discriminator-Generator Communication

Yuanjiang Cao, Lina Yao, Le Pan et al.

The goal of Image-to-image (I2I) translation is to transfer an image from a source domain to a target domain, which has recently drawn increasing attention. One major branch of this research is to formulate I2I translation based on Generative Adversarial Network (GAN). As a zero-sum game, GAN can be reformulated as a Partially-observed Markov Decision Process (POMDP) for generators, where generators cannot access full state information of their environments. This formulation illustrates the information insufficiency in the GAN training. To mitigate this problem, we propose to add a communication channel between discriminators and generators. We explore multiple architecture designs to integrate the communication mechanism into the I2I translation framework. To validate the performance of the proposed approach, we have conducted extensive experiments on various benchmark datasets. The experimental results confirm the superiority of our proposed method.

CLJul 11, 2023
Separate-and-Aggregate: A Transformer-based Patch Refinement Model for Knowledge Graph Completion

Chen Chen, Yufei Wang, Yang Zhang et al.

Knowledge graph completion (KGC) is the task of inferencing missing facts from any given knowledge graphs (KG). Previous KGC methods typically represent knowledge graph entities and relations as trainable continuous embeddings and fuse the embeddings of the entity $h$ (or $t$) and relation $r$ into hidden representations of query $(h, r, ?)$ (or $(?, r, t$)) to approximate the missing entities. To achieve this, they either use shallow linear transformations or deep convolutional modules. However, the linear transformations suffer from the expressiveness issue while the deep convolutional modules introduce unnecessary inductive bias, which could potentially degrade the model performance. Thus, we propose a novel Transformer-based Patch Refinement Model (PatReFormer) for KGC. PatReFormer first segments the embedding into a sequence of patches and then employs cross-attention modules to allow bi-directional embedding feature interaction between the entities and relations, leading to a better understanding of the underlying KG. We conduct experiments on four popular KGC benchmarks, WN18RR, FB15k-237, YAGO37 and DB100K. The experimental results show significant performance improvement from existing KGC methods on standard KGC evaluation metrics, e.g., MRR and H@n. Our analysis first verifies the effectiveness of our model design choices in PatReFormer. We then find that PatReFormer can better capture KG information from a large relation embedding dimension. Finally, we demonstrate that the strength of PatReFormer is at complex relation types, compared to other KGC models

LGJan 27
Generalizable IoT Traffic Representations for Cross-Network Device Identification

Arunan Sivanathan, David Warren, Deepak Mishra et al.

Machine learning models have demonstrated strong performance in classifying network traffic and identifying Internet-of-Things (IoT) devices, enabling operators to discover and manage IoT assets at scale. However, many existing approaches rely on end-to-end supervised pipelines or task-specific fine-tuning, resulting in traffic representations that are tightly coupled to labeled datasets and deployment environments, which can limit generalizability. In this paper, we study the problem of learning generalizable traffic representations for IoT device identification. We design compact encoder architectures that learn per-flow embeddings from unlabeled IoT traffic and evaluate them using a frozen-encoder protocol with a simple supervised classifier. Our specific contributions are threefold. (1) We develop unsupervised encoder--decoder models that learn compact traffic representations from unlabeled IoT network flows and assess their quality through reconstruction-based analysis. (2) We show that these learned representations can be used effectively for IoT device-type classification using simple, lightweight classifiers trained on frozen embeddings. (3) We provide a systematic benchmarking study against the state-of-the-art pretrained traffic encoders, showing that larger models do not necessarily yield more robust representations for IoT traffic. Using more than 18 million real IoT traffic flows collected across multiple years and deployment environments, we learn traffic representations from unlabeled data and evaluate device-type classification on disjoint labeled subsets, achieving macro F1-scores exceeding 0.9 for device-type classification and demonstrating robustness under cross-environment deployment.

CVFeb 5
Unlocking Prototype Potential: An Efficient Tuning Framework for Few-Shot Class-Incremental Learning

Shengqin Jiang, Xiaoran Feng, Yuankai Qi et al.

Few-shot class-incremental learning (FSCIL) seeks to continuously learn new classes from very limited samples while preserving previously acquired knowledge. Traditional methods often utilize a frozen pre-trained feature extractor to generate static class prototypes, which suffer from the inherent representation bias of the backbone. While recent prompt-based tuning methods attempt to adapt the backbone via minimal parameter updates, given the constraint of extreme data scarcity, the model's capacity to assimilate novel information and substantively enhance its global discriminative power is inherently limited. In this paper, we propose a novel shift in perspective: freezing the feature extractor while fine-tuning the prototypes. We argue that the primary challenge in FSCIL is not feature acquisition, but rather the optimization of decision regions within a static, high-quality feature space. To this end, we introduce an efficient prototype fine-tuning framework that evolves static centroids into dynamic, learnable components. The framework employs a dual-calibration method consisting of class-specific and task-aware offsets. These components function synergistically to improve the discriminative capacity of prototypes for ongoing incremental classes. Extensive results demonstrate that our method attains superior performance across multiple benchmarks while requiring minimal learnable parameters.

LGJul 8, 2022
GCN-based Multi-task Representation Learning for Anomaly Detection in Attributed Networks

Venus Haghighi, Behnaz Soltani, Adnan Mahmood et al.

Anomaly detection in attributed networks has received a considerable attention in recent years due to its applications in a wide range of domains such as finance, network security, and medicine. Traditional approaches cannot be adopted on attributed networks' settings to solve the problem of anomaly detection. The main limitation of such approaches is that they inherently ignore the relational information between data features. With a rapid explosion in deep learning- and graph neural networks-based techniques, spotting rare objects on attributed networks has significantly stepped forward owing to the potentials of deep techniques in extracting complex relationships. In this paper, we propose a new architecture on anomaly detection. The main goal of designing such an architecture is to utilize multi-task learning which would enhance the detection performance. Multi-task learning-based anomaly detection is still in its infancy and only a few studies in the existing literature have catered to the same. We incorporate both community detection and multi-view representation learning techniques for extracting distinct and complementary information from attributed networks and subsequently fuse the captured information for achieving a better detection result. The mutual collaboration between two main components employed in this architecture, i.e., community-specific learning and multi-view representation learning, exhibits a promising solution to reach more effective results.

LGAug 16, 2024
The Power of Bias: Optimizing Client Selection in Federated Learning with Heterogeneous Differential Privacy

Jiating Ma, Yipeng Zhou, Qi Li et al.

To preserve the data privacy, the federated learning (FL) paradigm emerges in which clients only expose model gradients rather than original data for conducting model training. To enhance the protection of model gradients in FL, differentially private federated learning (DPFL) is proposed which incorporates differentially private (DP) noises to obfuscate gradients before they are exposed. Yet, an essential but largely overlooked problem in DPFL is the heterogeneity of clients' privacy requirement, which can vary significantly between clients and extremely complicates the client selection problem in DPFL. In other words, both the data quality and the influence of DP noises should be taken into account when selecting clients. To address this problem, we conduct convergence analysis of DPFL under heterogeneous privacy, a generic client selection strategy, popular DP mechanisms and convex loss. Based on convergence analysis, we formulate the client selection problem to minimize the value of loss function in DPFL with heterogeneous privacy, which is a convex optimization problem and can be solved efficiently. Accordingly, we propose the DPFL-BCS (biased client selection) algorithm. The extensive experiment results with real datasets under both convex and non-convex loss functions indicate that DPFL-BCS can remarkably improve model utility compared with the SOTA baselines.

CVNov 15, 2025
Teaching Prompts to Coordinate: Hierarchical Layer-Grouped Prompt Tuning for Continual Learning

Shengqin Jiang, Tianqi Kong, Yuankai Qi et al.

Prompt-based continual learning methods fine-tune only a small set of additional learnable parameters while keeping the pre-trained model's parameters frozen. It enables efficient adaptation to new tasks while mitigating the risk of catastrophic forgetting. These methods typically attach one independent task-specific prompt to each layer of pre-trained models to locally modulate its features, ensuring that the layer's representation aligns with the requirements of the new task. However, although introducing learnable prompts independently at each layer provides high flexibility for adapting to new tasks, this overly flexible tuning could make certain layers susceptible to unnecessary updates. As all prompts till the current task are added together as a final prompt for all seen tasks, the model may easily overwrite feature representations essential to previous tasks, which increases the risk of catastrophic forgetting. To address this issue, we propose a novel hierarchical layer-grouped prompt tuning method for continual learning. It improves model stability in two ways: (i) Layers in the same group share roughly the same prompts, which are adjusted by position encoding. This helps preserve the intrinsic feature relationships and propagation pathways of the pre-trained model within each group. (ii) It utilizes a single task-specific root prompt to learn to generate sub-prompts for each layer group. In this way, all sub-prompts are conditioned on the same root prompt, enhancing their synergy and reducing independence. Extensive experiments across four benchmarks demonstrate that our method achieves favorable performance compared with several state-of-the-art methods.

LGOct 12, 2023
LGL-BCI: A Motor-Imagery-Based Brain-Computer Interface with Geometric Learning

Jianchao Lu, Yuzhe Tian, Yang Zhang et al.

Brain--computer interfaces are groundbreaking technology whereby brain signals are used to control external devices. Despite some advances in recent years, electroencephalogram (EEG)-based motor-imagery tasks face challenges, such as amplitude and phase variability and complex spatial correlations, with a need for smaller models and faster inference. In this study, we develop a prototype, called the Lightweight Geometric Learning Brain--Computer Interface (LGL-BCI), which uses our customized geometric deep learning architecture for swift model inference without sacrificing accuracy. LGL-BCI contains an EEG channel selection module via a feature decomposition algorithm to reduce the dimensionality of a symmetric positive definite matrix, providing adaptiveness among the continuously changing EEG signal. Meanwhile, a built-in lossless transformation helps boost the inference speed. The performance of our solution was evaluated using two real-world EEG devices and two public EEG datasets. LGL-BCI demonstrated significant improvements, achieving an accuracy of 82.54% compared to 62.22% for the state-of-the-art approach. Furthermore, LGL-BCI uses fewer parameters (64.9K vs. 183.7K), highlighting its computational efficiency. These findings underscore both the superior accuracy and computational efficiency of LGL-BCI, demonstrating the feasibility and robustness of geometric deep learning in motor-imagery brain--computer interface applications.

66.9CLApr 19
Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang

Distractor generation (DG) remains a labor-intensive task that still significantly depends on domain experts. The task focuses on generating plausible yet incorrect options, known as distractors, for multiple-choice questions. A reliable distractor must be contextually relevant to the question and able to mislead examinees through implicit reasoning when identifying the correct answer. While a recent method integrates fine-tuning pre-trained encoder-decoder models with contrastive learning to generate semantically relevant distractors for a given question-answer, it often fails to capture the underlying reasoning process that experts utilize when selecting distractors in benchmarks. In this paper, we explore large language models (LLMs) reasoning for DG through in-context learning with unsupervised semantic retrieval for selecting few-shot examples. We design a rationale-augmented DG framework that jointly generates distractors and their rationales for a given question-answer. Extensive experiments on six benchmarks, with varying average distractor lengths and domains, demonstrate that prompting LLMs with few-shot examples substantially improves the performance compared to recent DG models. It outperforms recent approaches and achieves state-of-the-art results in generating reasoned distractors that align with human-labeled benchmarks.

CVJan 5
Parameter-Efficient Domain Adaption for CSI Crowd-Counting via Self-Supervised Learning with Adapter Modules

Oliver Custance, Saad Khan, Simon Parkinson et al.

Device-free crowd-counting using WiFi Channel State Information (CSI) is a key enabling technology for a new generation of privacy-preserving Internet of Things (IoT) applications. However, practical deployment is severely hampered by the domain shift problem, where models trained in one environment fail to generalise to another. To overcome this, we propose a novel two-stage framework centred on a CSI-ResNet-A architecture. This model is pre-trained via self-supervised contrastive learning to learn domain-invariant representations and leverages lightweight Adapter modules for highly efficient fine-tuning. The resulting event sequence is then processed by a stateful counting machine to produce a final, stable occupancy estimate. We validate our framework extensively. On our WiFlow dataset, our unsupervised approach excels in a 10-shot learning scenario, achieving a final Mean Absolute Error (MAE) of just 0.44--a task where supervised baselines fail. To formally quantify robustness, we introduce the Generalisation Index (GI), on which our model scores near-perfectly, confirming its ability to generalise. Furthermore, our framework sets a new state-of-the-art public WiAR benchmark with 98.8\% accuracy. Our ablation studies reveal the core strength of our design: adapter-based fine-tuning achieves performance within 1\% of a full fine-tune (98.84\% vs. 99.67\%) while training 97.2\% fewer parameters. Our work provides a practical and scalable solution for developing robust sensing systems ready for real-world IoT deployments.

LGDec 14, 2025Code
Plug-and-Play Parameter-Efficient Tuning of Embeddings for Federated Recommendation

Haochen Yuan, Yang Zhang, Xiang He et al.

With the rise of cloud-edge collaboration, recommendation services are increasingly trained in distributed environments. Federated Recommendation (FR) enables such multi-end collaborative training while preserving privacy by sharing model parameters instead of raw data. However, the large number of parameters, primarily due to the massive item embeddings, significantly hampers communication efficiency. While existing studies mainly focus on improving the efficiency of FR models, they largely overlook the issue of embedding parameter overhead. To address this gap, we propose a FR training framework with Parameter-Efficient Fine-Tuning (PEFT) based embedding designed to reduce the volume of embedding parameters that need to be transmitted. Our approach offers a lightweight, plugin-style solution that can be seamlessly integrated into existing FR methods. In addition to incorporating common PEFT techniques such as LoRA and Hash-based encoding, we explore the use of Residual Quantized Variational Autoencoders (RQ-VAE) as a novel PEFT strategy within our framework. Extensive experiments across various FR model backbones and datasets demonstrate that our framework significantly reduces communication overhead while improving accuracy. The source code is available at https://github.com/young1010/FedPEFT.

29.8CLApr 14
AgenticAI-DialogGen: Topic-Guided Conversation Generation for Fine-Tuning and Evaluating Short- and Long-Term Memories of LLMs

Manoj Madushanka Perera, Adnan Mahmood, Kasun Eranda Wijethilake et al.

Recent advancements in Large Language Models (LLMs) have improved their ability to process extended conversational contexts, yet fine-tuning and evaluating short- and long-term memories remain difficult due to the absence of datasets that encode both short- and long-term conversational history. Existing conversational datasets lack memory grounding, overlook topic continuity, or rely on costly human annotation. To address these gaps, we introduce AgenticAI-DialogGen, a modular agent-based framework that generates persona-grounded and topic-guided conversations without human supervision. The framework uses LLM agents to extract knowledge graphs, identify topics, build speaker personas, and simulate topic-guided conversations from unstructured conversations. A QA module generates memory-grounded Question Answer (QA) pairs drawn from short- and long-term conversational histories. We also generated a new dataset entitled, TopicGuidedChat (TGC), where long-term memory is encoded as speaker-specific knowledge graphs and short-term memory as newly generated topic-guided conversations. Evaluations depict that AgenticAI-DialogGen yields higher conversational quality and LLMs fine-tuned on TGC dataset achieve improved performance on memory-grounded QA tasks.

LGMay 11, 2025Code
MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning

Lishan Yang, Wei Emma Zhang, Quan Z. Sheng et al.

In the era of big data, data mining has become indispensable for uncovering hidden patterns and insights from vast and complex datasets. The integration of multimodal data sources further enhances its potential. Multimodal Federated Learning (MFL) is a distributed approach that enhances the efficiency and quality of multimodal learning, ensuring collaborative work and privacy protection. However, missing modalities pose a significant challenge in MFL, often due to data quality issues or privacy policies across the clients. In this work, we present MMiC, a framework for Mitigating Modality incompleteness in MFL within the Clusters. MMiC replaces partial parameters within client models inside clusters to mitigate the impact of missing modalities. Furthermore, it leverages the Banzhaf Power Index to optimize client selection under these conditions. Finally, MMiC employs an innovative approach to dynamically control global aggregation by utilizing Markovitz Portfolio Optimization. Extensive experiments demonstrate that MMiC consistently outperforms existing federated learning architectures in both global and personalized performance on multimodal datasets with missing modalities, confirming the effectiveness of our proposed solution. Our code is available at https://github.com/gotobcn8/MMiC.

LGJun 1, 2024Code
Graph Neural Networks for Brain Graph Learning: A Survey

Xuexiong Luo, Jia Wu, Jian Yang et al.

Exploring the complex structure of the human brain is crucial for understanding its functionality and diagnosing brain disorders. Thanks to advancements in neuroimaging technology, a novel approach has emerged that involves modeling the human brain as a graph-structured pattern, with different brain regions represented as nodes and the functional relationships among these regions as edges. Moreover, graph neural networks (GNNs) have demonstrated a significant advantage in mining graph-structured data. Developing GNNs to learn brain graph representations for brain disorder analysis has recently gained increasing attention. However, there is a lack of systematic survey work summarizing current research methods in this domain. In this paper, we aim to bridge this gap by reviewing brain graph learning works that utilize GNNs. We first introduce the process of brain graph modeling based on common neuroimaging data. Subsequently, we systematically categorize current works based on the type of brain graph generated and the targeted research problems. To make this research accessible to a broader range of interested researchers, we provide an overview of representative methods and commonly used datasets, along with their implementation sources. Finally, we present our insights on future research directions. The repository of this survey is available at \url{https://github.com/XuexiongLuoMQ/Awesome-Brain-Graph-Learning-with-GNNs}.

LGJun 14, 2021Code
A Comprehensive Survey on Graph Anomaly Detection with Deep Learning

Xiaoxiao Ma, Jia Wu, Shan Xue et al.

Anomalies represent rare observations (e.g., data records or events) that deviate significantly from others. Over several decades, research on anomaly mining has received increasing interests due to the implications of these occurrences in a wide range of disciplines. Anomaly detection, which aims to identify rare observations, is among the most vital tasks in the world, and has shown its power in preventing detrimental events, such as financial fraud, network intrusion, and social spam. The detection task is typically solved by identifying outlying data points in the feature space and inherently overlooks the relational information in real-world data. Graphs have been prevalently used to represent the structural information, which raises the graph anomaly detection problem - identifying anomalous graph objects (i.e., nodes, edges and sub-graphs) in a single graph, or anomalous graphs in a database/set of graphs. However, conventional anomaly detection techniques cannot tackle this problem well because of the complexity of graph data. For the advent of deep learning, graph anomaly detection with deep learning has received a growing attention recently. In this survey, we aim to provide a systematic and comprehensive review of the contemporary deep learning techniques for graph anomaly detection. We compile open-sourced implementations, public datasets, and commonly-used evaluation metrics to provide affluent resources for future studies. More importantly, we highlight twelve extensive future research directions according to our survey results covering unsolved and emerging research problems and real-world applications. With this survey, our goal is to create a "one-stop-shop" that provides a unified understanding of the problem categories and existing approaches, publicly available hands-on resources, and high-impact open challenges for graph anomaly detection using deep learning.

SIMay 26, 2021Code
A Comprehensive Survey on Community Detection with Deep Learning

Xing Su, Shan Xue, Fanzhen Liu et al.

A community reveals the features and connections of its members that are different from those in other communities in a network. Detecting communities is of great significance in network analysis. Despite the classical spectral clustering and statistical inference methods, we notice a significant development of deep learning techniques for community detection in recent years with their advantages in handling high dimensional network data. Hence, a comprehensive overview of community detection's latest progress through deep learning is timely to academics and practitioners. This survey devises and proposes a new taxonomy covering different state-of-the-art methods, including deep learning-based models upon deep neural networks, deep nonnegative matrix factorization and deep sparse filtering. The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders. The survey also summarizes the popular benchmark data sets, evaluation metrics, and open-source implementations to address experimentation settings. We then discuss the practical applications of community detection in various domains and point to implementation scenarios. Finally, we outline future directions by suggesting challenging topics in this fast-growing deep learning field.

LGAug 18, 2024
Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training

Huitong Jin, Yipeng Zhou, Quan Z. Sheng et al.

Differentially Private Federated Learning (DPFL) strengthens privacy protection by perturbing model gradients with noise, though at the cost of reduced accuracy. Although prior empirical studies indicate that initializing from pre-trained rather than random parameters can alleviate noise disturbance, the problem of optimally fine-tuning pre-trained models in DPFL remains unaddressed. In this paper, we propose Pretrain-DPFL, a framework that systematically evaluates three most representative fine-tuning strategies: full-tuning (FT), head-tuning (HT), and unified-tuning(UT) combining HT followed by FT. Through convergence analysis under smooth non-convex loss, we establish theoretical conditions for identifying the optimal fine-tuning strategy in Pretrain-DPFL, thereby maximizing the benefits of pre-trained models in mitigating noise disturbance. Extensive experiments across multiple datasets demonstrate Pretrain-DPFL's superiority, achieving $25.22\%$ higher accuracy than scratch training and outperforming the second-best baseline by $8.19\%$, significantly improving the privacy-utility trade-off in DPFL.

CLFeb 2, 2024
Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation

Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang et al.

The distractor generation task focuses on generating incorrect but plausible options for objective questions such as fill-in-the-blank and multiple-choice questions. This task is widely utilized in educational settings across various domains and subjects. The effectiveness of these questions in assessments relies on the quality of the distractors, as they challenge examinees to select the correct answer from a set of misleading options. The evolution of artificial intelligence (AI) has transitioned the task from traditional methods to the use of neural networks and pre-trained language models. This shift has established new benchmarks and expanded the use of advanced deep learning methods in generating distractors. This survey explores distractor generation tasks, datasets, methods, and current evaluation metrics for English objective questions, covering both text-based and multi-modal domains. It also evaluates existing AI models and benchmarks and discusses potential future research directions.

71.5IRApr 30
Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

Xihang Wang, Zihan Wang, Chengkai Huang et al.

Multimodal Retrieval-Augmented Generation (MRAG) is widely adopted for Multimodal Large Language Models (MLLMs) with external evidence to reduce hallucinations. Despite its success, most existing MRAG frameworks treat retrieved evidence as indivisible documents, implicitly assuming that all content within a document is equally informative. In practice, however, sometimes only a small fraction of a document is relevant to a given query, while the remaining content introduces substantial noise that may lead to performance degradation. We address this fundamental limitation by reframing MRAG as a fine-grained evidence selection problem. We propose Fragment-level Evidence Selection for RAG (FES-RAG), a framework that selects atomic multimodal fragments rather than entire documents as grounding evidence. FES-RAG decomposes retrieved multimodal documents into sentence-level textual fragments and region-level visual fragments, enabling precise identification of evidence that directly supports generation. To guide fragment selection, we introduce Fragment Information Gain (FIG), a principled metric that measures the marginal contribution of each fragment to the MLLM's generation confidence. Based on FIG, we distill fragment-level utility judgments from a high-capacity MLLM into a lightweight selector, achieving accurate evidence selection with low inference overhead. Experiments on the M2RAG benchmark show that FES-RAG consistently outperforms state-of-the-art document-level MRAG methods, achieving up to 27 percent relative improvement in CIDEr. By selecting fewer yet more informative fragments, our approach substantially reduces context length while improving factual accuracy and generation coherence.

30.7CLApr 27
MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

Xihang Wang, Zihan Wang, Chengkai Huang et al.

Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides superficial relevance. Existing metrics often rely on heuristic position-based confidence, which fails to capture the informational density of multimodal entities. To address this, we propose Multi-modal Evidence Grounding (MEG), a semantic-aware metric that quantifies the contribution of retrieved evidence. Unlike standard confidence measures, MEG utilizes Semantic Certainty Anchoring, focusing on high-IDF information-bearing tokens that better capture the semantic core of the answer. Building on MEG, we introduce MEG-RAG, a framework that trains a multimodal reranker to align retrieved evidence with the semantic anchors of the ground truth. By prioritizing high-value content based on semantic grounding rather than token probability distributions, MEG-RAG improves the accuracy and multimodal consistency of generated outputs. Extensive experiments on the M$^2$RAG benchmark show that MEG-RAG consistently outperforms strong baselines and demonstrates robust generalization across different teacher models.

LGAug 31, 2025
Fairness in Federated Learning: Trends, Challenges, and Opportunities

Noorain Mukhtiar, Adnan Mahmood, Quan Z. Sheng

At the intersection of the cutting-edge technologies and privacy concerns, Federated Learning (FL) with its distributed architecture, stands at the forefront in a bid to facilitate collaborative model training across multiple clients while preserving data privacy. However, the applicability of FL systems is hindered by fairness concerns arising from numerous sources of heterogeneity that can result in biases and undermine a system's effectiveness, with skewed predictions, reduced accuracy, and inefficient model convergence. This survey thus explores the diverse sources of bias, including but not limited to, data, client, and model biases, and thoroughly discusses the strengths and limitations inherited within the array of the state-of-the-art techniques utilized in the literature to mitigate such disparities in the FL training process. We delineate a comprehensive overview of the several notions, theoretical underpinnings, and technical aspects associated with fairness and their adoption in FL-based multidisciplinary environments. Furthermore, we examine salient evaluation metrics leveraged to measure fairness quantitatively. Finally, we envisage exciting open research directions that have the potential to drive future advancements in achieving fairer FL frameworks, in turn, offering a strong foundation for future research in this pivotal area.

CVMay 12, 2024
Modeling Pedestrian Intrinsic Uncertainty for Multimodal Stochastic Trajectory Prediction via Energy Plan Denoising

Yao Liu, Quan Z. Sheng, Lina Yao

Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoising (EPD) model for stochastic trajectory prediction. EPD initially provides a coarse estimation of the distribution of future trajectories, termed the Plan, utilizing the Langevin Energy Model. Subsequently, it refines this estimation through denoising via the Probabilistic Diffusion Model. By initiating denoising with the Plan, EPD effectively reduces the need for iterative steps, thereby enhancing efficiency. Furthermore, EPD differs from conventional approaches by modeling the distribution of trajectories instead of individual trajectories. This allows for the explicit modeling of pedestrian intrinsic uncertainties and eliminates the need for multiple denoising operations. A single denoising operation produces a distribution from which multiple samples can be drawn, significantly enhancing efficiency. Moreover, EPD's fine-tuning of the Plan contributes to improved model performance. We validate EPD on two publicly available datasets, where it achieves state-of-the-art results. Additionally, ablation experiments underscore the contributions of individual modules, affirming the efficacy of the proposed approach.

CVMay 11, 2024
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning

Yunchuan Ma, Laiyun Qing, Guorong Li et al.

Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose a novel zero-shot video captioning framework named Retrieval-Enhanced Test-Time Adaptation (RETTA), which takes advantage of existing pretrained large-scale vision and language models to directly generate captions with test-time adaptation. Specifically, we bridge video and text using four key models: a general video-text retrieval model XCLIP, a general image-text matching model CLIP, a text alignment model AnglE, and a text generation model GPT-2, due to their source-code availability. The main challenge is how to enable the text generation model to be sufficiently aware of the content in a given video so as to generate corresponding captions. To address this problem, we propose using learnable tokens as a communication medium among these four frozen models GPT-2, XCLIP, CLIP, and AnglE. Different from the conventional way that trains these tokens with training data, we propose to learn these tokens with soft targets of the inference data under several carefully crafted loss functions, which enable the tokens to absorb video information catered for GPT-2. This procedure can be efficiently done in just a few iterations (we use 16 iterations in the experiments) and does not require ground truth data. Extensive experimental results on three widely used datasets, MSR-VTT, MSVD, and VATEX, show absolute 5.1%-32.4% improvements in terms of the main metric CIDEr compared to several state-of-the-art zero-shot video captioning methods.

DCApr 27, 2025
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers

Shuang Wang, He Zhang, Tianxing Wu et al.

Worldwide, Geo-distributed Data Centers (GDCs) provide computing and storage services for massive workflow applications, resulting in high electricity costs that vary depending on geographical locations and time. How to reduce electricity costs while satisfying the deadline constraints of workflow applications is important in GDCs, which is determined by the execution time of servers, power, and electricity price. Determining the completion time of workflows with different server frequencies can be challenging, especially in scenarios with heterogeneous computing resources in GDCs. Moreover, the electricity price is also different in geographical locations and may change dynamically. To address these challenges, we develop a geo-distributed system architecture and propose an Electricity Cost aware Multiple Workflows Scheduling algorithm (ECMWS) for servers of GDCs with fixed frequency and power. ECMWS comprises four stages, namely workflow sequencing, deadline partitioning, task sequencing, and resource allocation where two graph embedding models and a policy network are constructed to solve the Markov Decision Process (MDP). After statistically calibrating parameters and algorithm components over a comprehensive set of workflow instances, the proposed algorithms are compared with the state-of-the-art methods over two types of workflow instances. The experimental results demonstrate that our proposed algorithm significantly outperforms other algorithms, achieving an improvement of over 15\% while maintaining an acceptable computational time. The source codes are available at https://gitee.com/public-artifacts/ecmws-experiments.

LGSep 25, 2025
FedCLF -- Towards Efficient Participant Selection for Federated Learning in Heterogeneous IoV Networks

Kasun Eranda Wijethilake, Adnan Mahmood, Quan Z. Sheng

Federated Learning (FL) is a distributed machine learning technique that preserves data privacy by sharing only the trained parameters instead of the client data. This makes FL ideal for highly dynamic, heterogeneous, and time-critical applications, in particular, the Internet of Vehicles (IoV) networks. However, FL encounters considerable challenges in such networks owing to the high data and device heterogeneity. To address these challenges, we propose FedCLF, i.e., FL with Calibrated Loss and Feedback control, which introduces calibrated loss as a utility in the participant selection process and a feedback control mechanism to dynamically adjust the sampling frequency of the clients. The envisaged approach (a) enhances the overall model accuracy in case of highly heterogeneous data and (b) optimizes the resource utilization for resource constrained IoV networks, thereby leading to increased efficiency in the FL process. We evaluated FedCLF vis-à-vis baseline models, i.e., FedAvg, Newt, and Oort, using CIFAR-10 dataset with varying data heterogeneity. Our results depict that FedCLF significantly outperforms the baseline models by up to a 16% improvement in high data heterogeneity-related scenarios with improved efficiency via reduced sampling frequency.

CLSep 22, 2025
Towards Adaptive Context Management for Intelligent Conversational Question Answering

Manoj Madushanka Perera, Adnan Mahmood, Kasun Eranda Wijethilake et al.

This particular paper introduces an Adaptive Context Management (ACM) framework for the Conversational Question Answering (ConvQA) systems. The key objective of the ACM framework is to optimize the use of the conversation history by dynamically managing context for maximizing the relevant information provided to a ConvQA model within its token limit. Our approach incorporates a Context Manager (CM) Module, a Summarization (SM) Module, and an Entity Extraction (EE) Module in a bid to handle the conversation history efficaciously. The CM Module dynamically adjusts the context size, thereby preserving the most relevant and recent information within a model's token limit. The SM Module summarizes the older parts of the conversation history via a sliding window. When the summarization window exceeds its limit, the EE Module identifies and retains key entities from the oldest conversation turns. Experimental results demonstrate the effectiveness of our envisaged framework in generating accurate and contextually appropriate responses, thereby highlighting the potential of the ACM framework to enhance the robustness and scalability of the ConvQA systems.

LGSep 24, 2025
FairEquityFL -- A Fair and Equitable Client Selection in Federated Learning for Heterogeneous IoV Networks

Fahmida Islam, Adnan Mahmood, Noorain Mukhtiar et al.

Federated Learning (FL) has been extensively employed for a number of applications in machine learning, i.e., primarily owing to its privacy preserving nature and efficiency in mitigating the communication overhead. Internet of Vehicles (IoV) is one of the promising applications, wherein FL can be utilized to train a model more efficiently. Since only a subset of the clients can participate in each FL training round, challenges arise pertinent to fairness in the client selection process. Over the years, a number of researchers from both academia and industry have proposed numerous FL frameworks. However, to the best of our knowledge, none of them have employed fairness for FL-based client selection in a dynamic and heterogeneous IoV environment. Accordingly, in this paper, we envisage a FairEquityFL framework to ensure an equitable opportunity for all the clients to participate in the FL training process. In particular, we have introduced a sampling equalizer module within the selector component for ensuring fairness in terms of fair collaboration opportunity for all the clients in the client selection process. The selector is additionally responsible for both monitoring and controlling the clients' participation in each FL training round. Moreover, an outlier detection mechanism is enforced for identifying malicious clients based on the model performance in terms of considerable fluctuation in either accuracy or loss minimization. The selector flags suspicious clients and temporarily suspend such clients from participating in the FL training process. We further evaluate the performance of FairEquityFL on a publicly available dataset, FEMNIST. Our simulation results depict that FairEquityFL outperforms baseline models to a considerable extent.

CVSep 19, 2025
Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration

Xingmei Wang, Xiaoyu Hu, Chengkai Huang et al.

Bridging 2D and 3D sensor modalities is critical for robust perception in autonomous systems. However, image-to-point cloud (I2P) registration remains challenging due to the semantic-geometric gap between texture-rich but depth-ambiguous images and sparse yet metrically precise point clouds, as well as the tendency of existing methods to converge to local optima. To overcome these limitations, we introduce CrossI2P, a self-supervised framework that unifies cross-modal learning and two-stage registration in a single end-to-end pipeline. First, we learn a geometric-semantic fused embedding space via dual-path contrastive learning, enabling annotation-free, bidirectional alignment of 2D textures and 3D structures. Second, we adopt a coarse-to-fine registration paradigm: a global stage establishes superpoint-superpixel correspondences through joint intra-modal context and cross-modal interaction modeling, followed by a geometry-constrained point-level refinement for precise registration. Third, we employ a dynamic training mechanism with gradient normalization to balance losses for feature alignment, correspondence refinement, and pose estimation. Extensive experiments demonstrate that CrossI2P outperforms state-of-the-art methods by 23.7% on the KITTI Odometry benchmark and by 37.9% on nuScenes, significantly improving both accuracy and robustness.

CLSep 10, 2025
Adversarial Attacks Against Automated Fact-Checking: A Survey

Fanzhen Liu, Alsharif Abuadbba, Kristen Moore et al.

In an era where misinformation spreads freely, fact-checking (FC) plays a crucial role in verifying claims and promoting reliable information. While automated fact-checking (AFC) has advanced significantly, existing systems remain vulnerable to adversarial attacks that manipulate or generate claims, evidence, or claim-evidence pairs. These attacks can distort the truth, mislead decision-makers, and ultimately undermine the reliability of FC models. Despite growing research interest in adversarial attacks against AFC systems, a comprehensive, holistic overview of key challenges remains lacking. These challenges include understanding attack strategies, assessing the resilience of current models, and identifying ways to enhance robustness. This survey provides the first in-depth review of adversarial attacks targeting FC, categorizing existing attack methodologies and evaluating their impact on AFC systems. Additionally, we examine recent advancements in adversary-aware defenses and highlight open research questions that require further exploration. Our findings underscore the urgent need for resilient FC frameworks capable of withstanding adversarial manipulations in pursuit of preserving high verification accuracy.

CLSep 6, 2025
A Survey of the State-of-the-Art in Conversational Question Answering Systems

Manoj Madushanka Perera, Adnan Mahmood, Kasun Eranda Wijethilake et al.

Conversational Question Answering (ConvQA) systems have emerged as a pivotal area within Natural Language Processing (NLP) by driving advancements that enable machines to engage in dynamic and context-aware conversations. These capabilities are increasingly being applied across various domains, i.e., customer support, education, legal, and healthcare where maintaining a coherent and relevant conversation is essential. Building on recent advancements, this survey provides a comprehensive analysis of the state-of-the-art in ConvQA. This survey begins by examining the core components of ConvQA systems, i.e., history selection, question understanding, and answer prediction, highlighting their interplay in ensuring coherence and relevance in multi-turn conversations. It further investigates the use of advanced machine learning techniques, including but not limited to, reinforcement learning, contrastive learning, and transfer learning to improve ConvQA accuracy and efficiency. The pivotal role of large language models, i.e., RoBERTa, GPT-4, Gemini 2.0 Flash, Mistral 7B, and LLaMA 3, is also explored, thereby showcasing their impact through data scalability and architectural advancements. Additionally, this survey presents a comprehensive analysis of key ConvQA datasets and concludes by outlining open research directions. Overall, this work offers a comprehensive overview of the ConvQA landscape and provides valuable insights to guide future advancements in the field.

LGAug 15, 2025
Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies

Fanzhen Liu, Xiaoxiao Ma, Jian Yang et al.

Enhancing the interpretability of graph neural networks (GNNs) is crucial to ensure their safe and fair deployment. Recent work has introduced self-explainable GNNs that generate explanations as part of training, improving both faithfulness and efficiency. Some of these models, such as ProtGNN and PGIB, learn class-specific prototypes, offering a potential pathway toward class-level explanations. However, their evaluations focus solely on instance-level explanations, leaving open the question of whether these prototypes meaningfully generalize across instances of the same class. In this paper, we introduce GraphOracle, a novel self-explainable GNN framework designed to generate and evaluate class-level explanations for GNNs. Our model jointly learns a GNN classifier and a set of structured, sparse subgraphs that are discriminative for each class. We propose a novel integrated training that captures graph$\unicode{x2013}$subgraph$\unicode{x2013}$prediction dependencies efficiently and faithfully, validated through a masking-based evaluation strategy. This strategy enables us to retroactively assess whether prior methods like ProtGNN and PGIB deliver effective class-level explanations. Our results show that they do not. In contrast, GraphOracle achieves superior fidelity, explainability, and scalability across a range of graph classification tasks. We further demonstrate that GraphOracle avoids the computational bottlenecks of previous methods$\unicode{x2014}$like Monte Carlo Tree Search$\unicode{x2014}$by using entropy-regularized subgraph selection and lightweight random walk extraction, enabling faster and more scalable training. These findings position GraphOracle as a practical and principled solution for faithful class-level self-explainability in GNNs.

LGJun 17, 2025
Convergence-Privacy-Fairness Trade-Off in Personalized Federated Learning

Xiyu Zhao, Qimei Cui, Weicai Li et al.

Personalized federated learning (PFL), e.g., the renowned Ditto, strikes a balance between personalization and generalization by conducting federated learning (FL) to guide personalized learning (PL). While FL is unaffected by personalized model training, in Ditto, PL depends on the outcome of the FL. However, the clients' concern about their privacy and consequent perturbation of their local models can affect the convergence and (performance) fairness of PL. This paper presents PFL, called DP-Ditto, which is a non-trivial extension of Ditto under the protection of differential privacy (DP), and analyzes the trade-off among its privacy guarantee, model convergence, and performance distribution fairness. We also analyze the convergence upper bound of the personalized models under DP-Ditto and derive the optimal number of global aggregations given a privacy budget. Further, we analyze the performance fairness of the personalized models, and reveal the feasibility of optimizing DP-Ditto jointly for convergence and fairness. Experiments validate our analysis and demonstrate that DP-Ditto can surpass the DP-perturbed versions of the state-of-the-art PFL models, such as FedAMP, pFedMe, APPLE, and FedALA, by over 32.71% in fairness and 9.66% in accuracy.

CVApr 18, 2025
ProgRoCC: A Progressive Approach to Rough Crowd Counting

Shengqin Jiang, Linfei Li, Haokui Zhang et al.

As the number of individuals in a crowd grows, enumeration-based techniques become increasingly infeasible and their estimates increasingly unreliable. We propose instead an estimation-based version of the problem: we label Rough Crowd Counting that delivers better accuracy on the basis of training data that is easier to acquire. Rough crowd counting requires only rough annotations of the number of targets in an image, instead of the more traditional, and far more expensive, per-target annotations. We propose an approach to the rough crowd counting problem based on CLIP, termed ProgRoCC. Specifically, we introduce a progressive estimation learning strategy that determines the object count through a coarse-to-fine approach. This approach delivers answers quickly, outperforms the state-of-the-art in semi- and weakly-supervised crowd counting. In addition, we design a vision-language matching adapter that optimizes key-value pairs by mining effective matches of two modalities to refine the visual features, thereby improving the final performance. Extensive experimental results on three widely adopted crowd counting datasets demonstrate the effectiveness of our method.

ROMay 11, 2024
Multi-agent Traffic Prediction via Denoised Endpoint Distribution

Yao Liu, Ruoyu Wang, Yuanjiang Cao et al.

The exploration of high-speed movement by robots or road traffic agents is crucial for autonomous driving and navigation. Trajectory prediction at high speeds requires considering historical features and interactions with surrounding entities, a complexity not as pronounced in lower-speed environments. Prior methods have assessed the spatio-temporal dynamics of agents but often neglected intrinsic intent and uncertainty, thereby limiting their effectiveness. We present the Denoised Endpoint Distribution model for trajectory prediction, which distinctively models agents' spatio-temporal features alongside their intrinsic intentions and uncertainties. By employing Diffusion and Transformer models to focus on agent endpoints rather than entire trajectories, our approach significantly reduces model complexity and enhances performance through endpoint information. Our experiments on open datasets, coupled with comparison and ablation studies, demonstrate our model's efficacy and the importance of its components. This approach advances trajectory prediction in high-speed scenarios and lays groundwork for future developments.

AIMay 29, 2023
ProcessGPT: Transforming Business Process Management with Generative Artificial Intelligence

Amin Beheshti, Jian Yang, Quan Z. Sheng et al.

Generative Pre-trained Transformer (GPT) is a state-of-the-art machine learning model capable of generating human-like text through natural language processing (NLP). GPT is trained on massive amounts of text data and uses deep learning techniques to learn patterns and relationships within the data, enabling it to generate coherent and contextually appropriate text. This position paper proposes using GPT technology to generate new process models when/if needed. We introduce ProcessGPT as a new technology that has the potential to enhance decision-making in data-centric and knowledge-intensive processes. ProcessGPT can be designed by training a generative pre-trained transformer model on a large dataset of business process data. This model can then be fine-tuned on specific process domains and trained to generate process flows and make decisions based on context and user input. The model can be integrated with NLP and machine learning techniques to provide insights and recommendations for process improvement. Furthermore, the model can automate repetitive tasks and improve process efficiency while enabling knowledge workers to communicate analysis findings, supporting evidence, and make decisions. ProcessGPT can revolutionize business process management (BPM) by offering a powerful tool for process augmentation, automation and improvement. Finally, we demonstrate how ProcessGPT can be a powerful tool for augmenting data engineers in maintaining data ecosystem processes within large bank organizations. Our scenario highlights the potential of this approach to improve efficiency, reduce costs, and enhance the quality of business operations through the automation of data-centric and knowledge-intensive processes. These results underscore the promise of ProcessGPT as a transformative technology for organizations looking to improve their process workflows.

LGMay 9, 2023
BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning

Yunchao Yang, Yipeng Zhou, Miao Hu et al.

Federated learning (FL) is a prospective distributed machine learning framework that can preserve data privacy. In particular, cross-silo FL can complete model training by making isolated data islands of different organizations collaborate with a parameter server (PS) via exchanging model parameters for multiple communication rounds. In cross-silo FL, an incentive mechanism is indispensable for motivating data owners to contribute their models to FL training. However, how to allocate the reward budget among different rounds is an essential but complicated problem largely overlooked by existing works. The challenge of this problem lies in the opaque feedback between reward budget allocation and model utility improvement of FL, making the optimal reward budget allocation complicated. To address this problem, we design an online reward budget allocation algorithm using Bayesian optimization named BARA (\underline{B}udget \underline{A}llocation for \underline{R}everse \underline{A}uction). Specifically, BARA can model the complicated relationship between reward budget allocation and final model accuracy in FL based on historical training records so that the reward budget allocated to each communication round is dynamically optimized so as to maximize the final model utility. We further incorporate the BARA algorithm into reverse auction-based incentive mechanisms to illustrate its effectiveness. Extensive experiments are conducted on real datasets to demonstrate that BARA significantly outperforms competitive baselines by improving model utility with the same amount of reward budget.