Wei Emma Zhang

LG
h-index23
49papers
1,642citations
Novelty36%
AI Score54

49 Papers

18.9LGJun 1
Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals

Jiyuan Liu, Liangwei Nathan Zheng, Wei Emma Zhang et al.

Multimodal systems often benefit from combining information across language, sound, and visual streams, but this benefit is not guaranteed. A modality that is useful for one input may become distracting for another, and local feature responses within the same modality can disagree with evidence from other sources. This work investigates how to adjust multimodal representations before they are merged by a downstream predictor. We develop a compact calibration module that compares each modality with the others at the summary level, extracts cues of cross-source support and conflict, and converts these cues into instance-wise and dimension-wise modulation signals. The calibration is applied to the original modality features rather than to already fused representations, enabling the model to suppress misleading components, preserve weak but useful evidence, and emphasize responses that are better supported by the current multimodal context. The module is designed as a plug-in component and can be attached to different fusion backbones without changing their prediction heads. Across five benchmarks covering sentiment understanding, action recognition, audio-visual event detection, and audio-visual emotion classification, the proposed pre-combination calibration strategy improves performance under both sequence-based and convolutional fusion settings. Additional analyses under modality removal, synthetic corruption, training dynamics, and feature-level visualization show that calibrating signals before fusion can reduce interference from unreliable modalities and produce more stable multimodal optimization.

DLSep 18, 2023
When Large Language Models Meet Citation: A Survey

Yang Zhang, Yufei Wang, Kai Wang et al.

Citations in scholarly work serve the essential purpose of acknowledging and crediting the original sources of knowledge that have been incorporated or referenced. Depending on their surrounding textual context, these citations are used for different motivations and purposes. Large Language Models (LLMs) could be helpful in capturing these fine-grained citation information via the corresponding textual context, thereby enabling a better understanding towards the literature. Furthermore, these citations also establish connections among scientific papers, providing high-quality inter-document relationships and human-constructed knowledge. Such information could be incorporated into LLMs pre-training and improve the text representation in LLMs. Therefore, in this paper, we offer a preliminary review of the mutually beneficial relationship between LLMs and citation analysis. Specifically, we review the application of LLMs for in-text citation analysis tasks, including citation classification, citation-based summarization, and citation recommendation. We then summarize the research pertinent to leveraging citation linkage knowledge to improve text representations of LLMs via citation prediction, network structure information, and inter-document relationship. We finally provide an overview of these contemporary methods and put forth potential promising avenues in combining LLMs and citation analysis for further investigation.

CLApr 29, 2022
Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations

Na Liu, Mark Dras, Wei Emma Zhang

Although deep neural networks have achieved state-of-the-art performance in various machine learning tasks, adversarial examples, constructed by adding small non-random perturbations to correctly classified inputs, successfully fool highly expressive deep classifiers into incorrect predictions. Approaches to adversarial attacks in natural language tasks have boomed in the last five years using character-level, word-level, phrase-level, or sentence-level textual perturbations. While there is some work in NLP on defending against such attacks through proactive methods, like adversarial training, there is to our knowledge no effective general reactive approaches to defence via detection of textual adversarial examples such as is found in the image processing literature. In this paper, we propose two new reactive methods for NLP to fill this gap, which unlike the few limited application baselines from NLP are based entirely on distribution characteristics of learned representations: we adapt one from the image processing literature (Local Intrinsic Dimensionality (LID)), and propose a novel one (MultiDistance Representation Ensemble Method (MDRE)). Adapted LID and MDRE obtain state-of-the-art results on character-level, word-level, and phrase-level attacks on the IMDB dataset as well as on the later two with respect to the MultiNLI dataset. For future research, we publish our code.

53.2LGMay 22
Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

Liangwei Nathan Zheng, Wei Emma Zhang, Olaf Maennel et al.

Mixture-of-Experts (MoE) presents a naturally compatible and scalable framework for multimodal learning, demonstrating strong adaptability across diverse modalities and tasks. Despite its growing success, a comprehensive and systematic review on the MoE metho addressing multimodal challenges remains lacking. Existing surveys tend to evaluate either multimodal learning or MoE independently from method taxonomy, overlooking the unique interplay between them. This survey fills that gap by answering a central question: \textit{How does MoE effectively resolve multimodal challenges?} We approach this from three key perspectives: (1) \textbf{MoE as an Efficient Multimodal Engine:} enabling scalable multimodal modeling by decoupling computational cost from parameter growth and mitigating modality redundancy through selective expert activation; (2) \textbf{MoE as a Multimodal Representation Learner:} integrating complementary multi-opinion expert knowledge to enrich alignment and interaction representations; and (3) \textbf{MoE as a Multimodal Adapter:} providing a modular and flexible mechanism to model imperfect data scenarios such as modality imbalance and missing modality. Through our extensive literature review, we identify critical research gaps, including interpretable routing, expert communication, modality integration, and lifelong multimodal learning. We position this survey as a foundation for future research toward interpretable and sustainable multimodal Mixture-of-Experts system.

CLAug 4, 2023
Learning to Select the Relevant History Turns in Conversational Question Answering

Munazza Zaib, Wei Emma Zhang, Quan Z. Sheng et al.

The increasing demand for the web-based digital assistants has given a rapid rise in the interest of the Information Retrieval (IR) community towards the field of conversational question answering (ConvQA). However, one of the critical aspects of ConvQA is the effective selection of conversational history turns to answer the question at hand. The dependency between relevant history selection and correct answer prediction is an intriguing but under-explored area. The selected relevant context can better guide the system so as to where exactly in the passage to look for an answer. Irrelevant context, on the other hand, brings noise to the system, thereby resulting in a decline in the model's performance. In this paper, we propose a framework, DHS-ConvQA (Dynamic History Selection in Conversational Question Answering), that first generates the context and question entities for all the history turns, which are then pruned on the basis of similarity they share in common with the question at hand. We also propose an attention-based mechanism to re-rank the pruned terms based on their calculated weights of how useful they are in answering the question. In the end, we further aid the model by highlighting the terms in the re-ranked conversational history using a binary classification task and keeping the useful terms (predicted as 1) and ignoring the irrelevant terms (predicted as 0). We demonstrate the efficacy of our proposed framework with extensive experimental results on CANARD and QuAC -- the two popularly utilized datasets in ConvQA. We demonstrate that selecting relevant turns works better than rewriting the original question. We also investigate how adding the irrelevant history turns negatively impacts the model's performance and discuss the research challenges that demand more attention from the IR community.

CLApr 14, 2023
Keeping the Questions Conversational: Using Structured Representations to Resolve Dependency in Conversational Question Answering

Munazza Zaib, Quan Z. Sheng, Wei Emma Zhang et al.

Having an intelligent dialogue agent that can engage in conversational question answering (ConvQA) is now no longer limited to Sci-Fi movies only and has, in fact, turned into a reality. These intelligent agents are required to understand and correctly interpret the sequential turns provided as the context of the given question. However, these sequential questions are sometimes left implicit and thus require the resolution of some natural language phenomena such as anaphora and ellipsis. The task of question rewriting has the potential to address the challenges of resolving dependencies amongst the contextual turns by transforming them into intent-explicit questions. Nonetheless, the solution of rewriting the implicit questions comes with some potential challenges such as resulting in verbose questions and taking conversational aspect out of the scenario by generating self-contained questions. In this paper, we propose a novel framework, CONVSR (CONVQA using Structured Representations) for capturing and generating intermediate representations as conversational cues to enhance the capability of the QA model to better interpret the incomplete questions. We also deliberate how the strengths of this task could be leveraged in a bid to design more engaging and eloquent conversational agents. We test our model on the QuAC and CANARD datasets and illustrate by experimental results that our proposed framework achieves a better F1 score than the standard question rewriting model.

CLJan 27, 2023
The Exploration of Knowledge-Preserving Prompts for Document Summarisation

Chen Chen, Wei Emma Zhang, Alireza Seyed Shakeri et al.

Despite the great development of document summarisation techniques nowadays, factual inconsistencies between the generated summaries and the original texts still occur from time to time. This study explores the possibility of adopting prompts to incorporate factual knowledge into generated summaries. We specifically study prefix-tuning that uses a set of trainable continuous prefix prompts together with discrete natural language prompts to aid summary generation. Experimental results demonstrate that the trainable prefixes can help the summarisation model extract information from discrete prompts precisely, thus generating knowledge-preserving summaries that are factually consistent with the discrete prompts. The ROUGE improvements of the generated summaries indicate that explicitly adding factual knowledge into the summarisation process could boost the overall performance, showing great potential for applying it to other natural language processing tasks.

CLApr 24, 2022
Knowledge-aware Document Summarization: A Survey of Knowledge, Embedding Methods and Architectures

Yutong Qu, Wei Emma Zhang, Jian Yang et al.

Knowledge-aware methods have boosted a range of natural language processing applications over the last decades. With the gathered momentum, knowledge recently has been pumped into enormous attention in document summarization, one of natural language processing applications. Previous works reported that knowledge-embedded document summarizers excel at generating superior digests, especially in terms of informativeness, coherence, and fact consistency. This paper pursues to present the first systematic survey for the state-of-the-art methodologies that embed knowledge into document summarizers. Particularly, we propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view. We further explore how embeddings are generated in embedding learning architectures of document summarization models, especially of deep learning models. At last, we discuss the challenges of this topic and future directions.

CLSep 13, 2022
Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization

Congbo Ma, Wei Emma Zhang, Pitawelayalage Dasun Dileepa Pitawela et al.

One key challenge in multi-document summarization is to capture the relations among input documents that distinguish between single document summarization (SDS) and multi-document summarization (MDS). Few existing MDS works address this issue. One effective way is to encode document positional information to assist models in capturing cross-document relations. However, existing MDS models, such as Transformer-based models, only consider token-level positional information. Moreover, these models fail to capture sentences' linguistic structure, which inevitably causes confusions in the generated summaries. Therefore, in this paper, we propose document-aware positional encoding and linguistic-guided encoding that can be fused with Transformer architecture for MDS. For document-aware positional encoding, we introduce a general protocol to guide the selection of document encoding functions. For linguistic-guided encoding, we propose to embed syntactic dependency relations into the dependency relation mask with a simple but effective non-linear encoding learner for feature learning. Extensive experiments show the proposed model can generate summaries with high quality.

LGSep 6, 2023
SWAP: Exploiting Second-Ranked Logits for Adversarial Attacks on Time Series

Chang George Dong, Liangwei Nathan Zheng, Weitong Chen et al.

Time series classification (TSC) has emerged as a critical task in various domains, and deep neural models have shown superior performance in TSC tasks. However, these models are vulnerable to adversarial attacks, where subtle perturbations can significantly impact the prediction results. Existing adversarial methods often suffer from over-parameterization or random logit perturbation, hindering their effectiveness. Additionally, increasing the attack success rate (ASR) typically involves generating more noise, making the attack more easily detectable. To address these limitations, we propose SWAP, a novel attacking method for TSC models. SWAP focuses on enhancing the confidence of the second-ranked logits while minimizing the manipulation of other logits. This is achieved by minimizing the Kullback-Leibler divergence between the target logit distribution and the predictive logit distribution. Experimental results demonstrate that SWAP achieves state-of-the-art performance, with an ASR exceeding 50% and an 18% increase compared to existing methods.

67.2CLApr 19
Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang

Distractor generation (DG) remains a labor-intensive task that still significantly depends on domain experts. The task focuses on generating plausible yet incorrect options, known as distractors, for multiple-choice questions. A reliable distractor must be contextually relevant to the question and able to mislead examinees through implicit reasoning when identifying the correct answer. While a recent method integrates fine-tuning pre-trained encoder-decoder models with contrastive learning to generate semantically relevant distractors for a given question-answer, it often fails to capture the underlying reasoning process that experts utilize when selecting distractors in benchmarks. In this paper, we explore large language models (LLMs) reasoning for DG through in-context learning with unsupervised semantic retrieval for selecting few-shot examples. We design a rationale-augmented DG framework that jointly generates distractors and their rationales for a given question-answer. Extensive experiments on six benchmarks, with varying average distractor lengths and domains, demonstrate that prompting LLMs with few-shot examples substantially improves the performance compared to recent DG models. It outperforms recent approaches and achieves state-of-the-art results in generating reasoned distractors that align with human-labeled benchmarks.

LGOct 16, 2024Code
Irregularity-Informed Time Series Analysis: Adaptive Modelling of Spatial and Temporal Dynamics

Liangwei Nathan Zheng, Zhengyang Li, Chang George Dong et al.

Irregular Time Series Data (IRTS) has shown increasing prevalence in real-world applications. We observed that IRTS can be divided into two specialized types: Natural Irregular Time Series (NIRTS) and Accidental Irregular Time Series (AIRTS). Various existing methods either ignore the impacts of irregular patterns or statically learn the irregular dynamics of NIRTS and AIRTS data and suffer from limited data availability due to the sparsity of IRTS. We proposed a novel transformer-based framework for general irregular time series data that treats IRTS from four views: Locality, Time, Spatio and Irregularity to motivate the data usage to the highest potential. Moreover, we design a sophisticated irregularity-gate mechanism to adaptively select task-relevant information from irregularity, which improves the generalization ability to various IRTS data. We implement extensive experiments to demonstrate the resistance of our work to three highly missing ratio datasets (88.4\%, 94.9\%, 60\% missing value) and investigate the significance of the irregularity information for both NIRTS and AIRTS by additional ablation study. We release our implementation in https://github.com/IcurasLW/MTSFormer-Irregular_Time_Series.git

CLJul 16, 2024
Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation

Congbo Ma, Wei Emma Zhang, Dileepa Pitawela et al.

The utilization of Transformer-based models prospers the growth of multi-document summarization (MDS). Given the huge impact and widespread adoption of Transformer-based models in various natural language processing tasks, investigating their performance and behaviors in the context of MDS becomes crucial for advancing the field and enhancing the quality of summary. To thoroughly examine the behaviours of Transformer-based MDS models, this paper presents five empirical studies on (1) measuring the impact of document boundary separators quantitatively; (2) exploring the effectiveness of different mainstream Transformer structures; (3) examining the sensitivity of the encoder and decoder; (4) discussing different training strategies; and (5) discovering the repetition in a summary generation. The experimental results on prevalent MDS datasets and eleven evaluation metrics show the influence of document boundary separators, the granularity of different level features and different model training strategies. The results also reveal that the decoder exhibits greater sensitivity to noises compared to the encoder. This underscores the important role played by the decoder, suggesting a potential direction for future research in MDS. Furthermore, the experimental results indicate that the repetition problem in the generated summaries has correlations with the high uncertainty scores.

LGSep 4, 2024
Boosting Certified Robustness for Time Series Classification with Efficient Self-Ensemble

Chang Dong, Zhengyang Li, Liangwei Zheng et al.

Recently, the issue of adversarial robustness in the time series domain has garnered significant attention. However, the available defense mechanisms remain limited, with adversarial training being the predominant approach, though it does not provide theoretical guarantees. Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on robustness radius under $\ell_p$-ball attacks. Recognizing its success, research in the time series domain has started focusing on these aspects. However, existing research predominantly focuses on time series forecasting, or under the non-$\ell_p$ robustness in statistic feature augmentation for time series classification~(TSC). Our review found that Randomized Smoothing performs modestly in TSC, struggling to provide effective assurances on datasets with poor robustness. Therefore, we propose a self-ensemble method to enhance the lower bound of the probability confidence of predicted labels by reducing the variance of classification margins, thereby certifying a larger radius. This approach also addresses the computational overhead issue of Deep Ensemble~(DE) while remaining competitive and, in some cases, outperforming it in terms of robustness. Both theoretical analysis and experimental results validate the effectiveness of our method, demonstrating superior performance in robustness testing compared to baseline approaches.

LGOct 16, 2024Code
Devil in the Tail: A Multi-Modal Framework for Drug-Drug Interaction Prediction in Long Tail Distinction

Liangwei Nathan Zheng, Chang George Dong, Wei Emma Zhang et al.

Drug-drug interaction (DDI) identification is a crucial aspect of pharmacology research. There are many DDI types (hundreds), and they are not evenly distributed with equal chance to occur. Some of the rarely occurred DDI types are often high risk and could be life-critical if overlooked, exemplifying the long-tailed distribution problem. Existing models falter against this distribution challenge and overlook the multi-faceted nature of drugs in DDI prediction. In this paper, a novel multi-modal deep learning-based framework, namely TFDM, is introduced to leverage multiple properties of a drug to achieve DDI classification. The proposed framework fuses multimodal features of drugs, including graph-based, molecular structure, Target and Enzyme, for DDI identification. To tackle the challenge posed by the distribution skewness across categories, a novel loss function called Tailed Focal Loss is introduced, aimed at further enhancing the model performance and address gradient vanishing problem of focal loss in extremely long-tailed dataset. Intensive experiments over 4 challenging long-tailed dataset demonstrate that the TFMD outperforms the most recent SOTA methods in long-tailed DDI classification tasks. The source code is released to reproduce our experiment results: https://github.com/IcurasLW/TFMD_Longtailed_DDI.git

IRMay 12, 2024Code
Disentangling Specificity for Abstractive Multi-document Summarization

Congbo Ma, Wei Emma Zhang, Hu Wang et al.

Multi-document summarization (MDS) generates a summary from a document set. Each document in a set describes topic-relevant concepts, while per document also has its unique contents. However, the document specificity receives little attention from existing MDS approaches. Neglecting specific information for each document limits the comprehensiveness of the generated summaries. To solve this problem, in this paper, we propose to disentangle the specific content from documents in one document set. The document-specific representations, which are encouraged to be distant from each other via a proposed orthogonal constraint, are learned by the specific representation learner. We provide extensive analysis and have interesting findings that specific information and document set representations contribute distinctive strengths and their combination yields a more comprehensive solution for the MDS. Also, we find that the common (i.e. shared) information could not contribute much to the overall performance under the MDS settings. Implemetation codes are available at https://github.com/congboma/DisentangleSum.

11.1CVMay 11
Probing Routing-Conditional Calibration in Attention-Residual Transformers

Wenhao Liang, Lin Yue, Wei Emma Zhang et al.

Post-hoc calibration is usually evaluated as a function of logits or softmax confidence alone, even as routing-augmented architectures increasingly accompany predictions with sample-specific internal routing traces and pair them with claims of calibration-relevant uncertainty. We ask a basic question: do these traces provide stable routing-specific evidence for post-hoc calibration beyond confidence? We study this in Attention-Residual transformers (Kimi Team, 2026) through a matched-confidence diagnostic suite that stratifies examples by routing-derived state, compares subgroup gaps against within-bin routing-permutation nulls, and evaluates matched post-hoc probes differing only in their auxiliary feature. Across our completed AR runs, scalar routing summaries do not provide stable evidence of routing-conditional miscalibration: weighted gaps remain small or seed-sensitive, and only $1$ of $30$ within-bin permutation tests rejects the conditional-null at $α=0.05$ (only on one seed; not stable across seeds in that cell). AR-CondCal, a minimal $2$-D Nadaraya--Watson probe on confidence and routing-depth variance, lies within the seed-variance band of matched confidence-only and predictive-entropy controls and does not reliably improve worst-routing-tertile ECE; bandwidth-sensitivity checks (Scott multiples, CV-NLL, global-ECE oracle) do not change this. A full-vector MLP over $(c, H_1, \ldots, H_L)$ can appear to improve over a linear confidence baseline, but the apparent gain disappears once a capacity-matched confidence-only MLP is included as a control, and shuffled routing profiles achieve comparable performance. Apparent routing-aware calibration gains in this AR setting should not be read as internal-state calibration until matched-confidence, bandwidth, capacity, and permutation controls rule out common confounds.

CVAug 12, 2025Code
Calibration Attention: Instance-wise Temperature Scaling for Vision Transformers

Wenhao Liang, Wei Emma Zhang, Lin Yue et al.

Probability calibration is critical when Vision Transformers are deployed in risk-sensitive applications. The standard fix, post-hoc temperature scaling, uses a single global scalar and requires a held-out validation set. We introduce Calibration Attention (CalAttn), a drop-in module that learns an adaptive, per-instance temperature directly from the ViT's CLS token. Across CIFAR-10/100, MNIST, Tiny-ImageNet, and ImageNet-1K, CalAttn reduces calibration error by up to 4x on ViT-224, DeiT, and Swin, while adding under 0.1 percent additional parameters. The learned temperatures cluster tightly around 1.0, in contrast to the large global values used by standard temperature scaling. CalAttn is simple, efficient, and architecture-agnostic, and yields more trustworthy probabilities without sacrificing accuracy. Code: [https://github.com/EagleAdelaide/CalibrationAttention-CalAttn-](https://github.com/EagleAdelaide/CalibrationAttention-CalAttn-)

LGMay 11, 2025Code
MMiC: Mitigating Modality Incompleteness in Clustered Federated Learning

Lishan Yang, Wei Emma Zhang, Quan Z. Sheng et al.

In the era of big data, data mining has become indispensable for uncovering hidden patterns and insights from vast and complex datasets. The integration of multimodal data sources further enhances its potential. Multimodal Federated Learning (MFL) is a distributed approach that enhances the efficiency and quality of multimodal learning, ensuring collaborative work and privacy protection. However, missing modalities pose a significant challenge in MFL, often due to data quality issues or privacy policies across the clients. In this work, we present MMiC, a framework for Mitigating Modality incompleteness in MFL within the Clusters. MMiC replaces partial parameters within client models inside clusters to mitigate the impact of missing modalities. Furthermore, it leverages the Banzhaf Power Index to optimize client selection under these conditions. Finally, MMiC employs an innovative approach to dynamically control global aggregation by utilizing Markovitz Portfolio Optimization. Extensive experiments demonstrate that MMiC consistently outperforms existing federated learning architectures in both global and personalized performance on multimodal datasets with missing modalities, confirming the effectiveness of our proposed solution. Our code is available at https://github.com/gotobcn8/MMiC.

CVJul 19, 2020Code
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Ruixue Tang, Chao Ma, Wei Emma Zhang et al.

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the major tricks for DNN, has been widely used in many computer vision tasks. However, there are few works studying the data augmentation problem for VQA and none of the existing image based augmentation schemes (such as rotation and flipping) can be directly applied to VQA due to its semantic structure -- an $\langle image, question, answer\rangle$ triplet needs to be maintained correctly. For example, a direction related Question-Answer (QA) pair may not be true if the associated image is rotated or flipped. In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data. The augmented examples do not change the visual properties presented in the image as well as the \textbf{semantic} meaning of the question, the correctness of the $\langle image, question, answer\rangle$ is thus still maintained. We then use adversarial learning to train a classic VQA model (BUTD) with our augmented data. We find that we not only improve the overall performance on VQAv2, but also can withstand adversarial attack effectively, compared to the baseline model. The source code is available at https://github.com/zaynmi/seada-vqa.

CLFeb 2, 2024
Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation

Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang et al.

The distractor generation task focuses on generating incorrect but plausible options for objective questions such as fill-in-the-blank and multiple-choice questions. This task is widely utilized in educational settings across various domains and subjects. The effectiveness of these questions in assessments relies on the quality of the distractors, as they challenge examinees to select the correct answer from a set of misleading options. The evolution of artificial intelligence (AI) has transitioned the task from traditional methods to the use of neural networks and pre-trained language models. This shift has established new benchmarks and expanded the use of advanced deep learning methods in generating distractors. This survey explores distractor generation tasks, datasets, methods, and current evaluation metrics for English objective questions, covering both text-based and multi-modal domains. It also evaluates existing AI models and benchmarks and discusses potential future research directions.

CVMay 21, 2024
A Survey of Deep Learning-based Radiology Report Generation Using Multimodal Data

Xinyi Wang, Grazziela Figueredo, Ruizhe Li et al.

Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources, therefore becoming an important topic in the medical image analysis field. It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data (i.e., medical images, clinical information, medical knowledge, etc.), and produce comprehensive and accurate reports. Recently, numerous works have emerged to address this issue using deep-learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components, including multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, we summarize the latest developments in large model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in this field. We have also conducted a quantitative comparison between different methods in the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially when using multimodal inputs, and to assist them in developing new algorithms to advance the field.

LGOct 16, 2024
Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment

Liangwei Nathan Zheng, Chang George Dong, Wei Emma Zhang et al.

Large Language Models (LLMs) have demonstrated impressive performance in time series analysis and seems to understand the time temporal relationship well than traditional transformer-based approaches. However, since LLMs are not designed for time series tasks, simpler models like linear regressions can often achieve comparable performance with far less complexity. In this study, we perform extensive experiments to assess the effectiveness of applying LLMs to key time series tasks, including forecasting, classification, imputation, and anomaly detection. We compare the performance of LLMs against simpler baseline models, such as single layer linear models and randomly initialized LLMs. Our results reveal that LLMs offer minimal advantages for these core time series tasks and may even distort the temporal structure of the data. In contrast, simpler models consistently outperform LLMs while requiring far fewer parameters. Furthermore, we analyze existing reprogramming techniques and show, through data manifold analysis, that these methods fail to effectively align time series data with language and display "pseudo-alignment" behavior in embedding space. Our findings suggest that the performance of LLM based methods in time series tasks arises from the intrinsic characteristics and structure of time series data, rather than any meaningful alignment with the language model architecture.

49.3CVMar 13
Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang, Bojun Yang, Shuo He et al.

Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where adversaries insert trigger-embedded samples into the training data to implant behaviors that can be maliciously activated at test time. Existing defenses typically rely on retraining backdoored parameters (e.g., adapters or LoRA modules) with clean data, which is computationally expensive and often degrades model performance. In this work, we provide a new mechanistic understanding of backdoor behaviors in LVLMs: the trigger does not influence prediction through low-level visual patterns, but through abnormal cross-modal attention redistribution, where trigger-bearing visual tokens steal attention away from the textual context - a phenomenon we term attention stealing. Motivated by this, we propose CleanSight, a training-free, plug-and-play defense that operates purely at test time. CleanSight (i) detects poisoned inputs based on the relative visual-text attention ratio in selected cross-modal fusion layers, and (ii) purifies the input by selectively pruning the suspicious high-attention visual tokens to neutralize the backdoor activation. Extensive experiments show that CleanSight significantly outperforms existing pixel-based purification defenses across diverse datasets and backdoor attack types, while preserving the model's utility on both clean and poisoned samples.

LGJan 16, 2025
Free-Knots Kolmogorov-Arnold Network: On the Analysis of Spline Knots and Advancing Stability

Liangwewi Nathan Zheng, Wei Emma Zhang, Lin Yue et al.

Kolmogorov-Arnold Neural Networks (KANs) have gained significant attention in the machine learning community. However, their implementation often suffers from poor training stability and heavy trainable parameter. Furthermore, there is limited understanding of the behavior of the learned activation functions derived from B-splines. In this work, we analyze the behavior of KANs through the lens of spline knots and derive the lower and upper bound for the number of knots in B-spline-based KANs. To address existing limitations, we propose a novel Free Knots KAN that enhances the performance of the original KAN while reducing the number of trainable parameters to match the trainable parameter scale of standard Multi-Layer Perceptrons (MLPs). Additionally, we introduce new a training strategy to ensure $C^2$ continuity of the learnable spline, resulting in smoother activation compared to the original KAN and improve the training stability by range expansion. The proposed method is comprehensively evaluated on 8 datasets spanning various domains, including image, text, time series, multimodal, and function approximation tasks. The promising results demonstrates the feasibility of KAN-based network and the effectiveness of proposed method.

DCApr 27, 2025
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers

Shuang Wang, He Zhang, Tianxing Wu et al.

Worldwide, Geo-distributed Data Centers (GDCs) provide computing and storage services for massive workflow applications, resulting in high electricity costs that vary depending on geographical locations and time. How to reduce electricity costs while satisfying the deadline constraints of workflow applications is important in GDCs, which is determined by the execution time of servers, power, and electricity price. Determining the completion time of workflows with different server frequencies can be challenging, especially in scenarios with heterogeneous computing resources in GDCs. Moreover, the electricity price is also different in geographical locations and may change dynamically. To address these challenges, we develop a geo-distributed system architecture and propose an Electricity Cost aware Multiple Workflows Scheduling algorithm (ECMWS) for servers of GDCs with fixed frequency and power. ECMWS comprises four stages, namely workflow sequencing, deadline partitioning, task sequencing, and resource allocation where two graph embedding models and a policy network are constructed to solve the Markov Decision Process (MDP). After statistically calibrating parameters and algorithm components over a comprehensive set of workflow instances, the proposed algorithms are compared with the state-of-the-art methods over two types of workflow instances. The experimental results demonstrate that our proposed algorithm significantly outperforms other algorithms, achieving an improvement of over 15\% while maintaining an acceptable computational time. The source codes are available at https://gitee.com/public-artifacts/ecmws-experiments.

CRFeb 2, 2025
TrojanTime: Backdoor Attacks on Time Series Classification

Chang Dong, Zechao Sun, Guangdong Bai et al.

Time Series Classification (TSC) is highly vulnerable to backdoor attacks, posing significant security threats. Existing methods primarily focus on data poisoning during the training phase, designing sophisticated triggers to improve stealthiness and attack success rate (ASR). However, in practical scenarios, attackers often face restrictions in accessing training data. Moreover, it is a challenge for the model to maintain generalization ability on clean test data while remaining vulnerable to poisoned inputs when data is inaccessible. To address these challenges, we propose TrojanTime, a novel two-step training algorithm. In the first stage, we generate a pseudo-dataset using an external arbitrary dataset through target adversarial attacks. The clean model is then continually trained on this pseudo-dataset and its poisoned version. To ensure generalization ability, the second stage employs a carefully designed training strategy, combining logits alignment and batch norm freezing. We evaluate TrojanTime using five types of triggers across four TSC architectures in UCR benchmark datasets from diverse domains. The results demonstrate the effectiveness of TrojanTime in executing backdoor attacks while maintaining clean accuracy. Finally, to mitigate this threat, we propose a defensive unlearning strategy that effectively reduces the ASR while preserving clean accuracy.

LGOct 14, 2025
Lifting Manifolds to Mitigate Pseudo-Alignment in LLM4TS

Liangwei Nathan Zheng, Wenhao Liang, Wei Emma Zhang et al.

Pseudo-Alignment is a pervasive challenge in many large language models for time series (LLM4TS) models, often causing them to underperform compared to linear models or randomly initialised backbones. However, there is limited discussion in the community for the reasons that pseudo-alignment occurs. In this work, we conduct a thorough investigation into the root causes of pseudo-alignment in LLM4TS and build a connection of pseudo-alignment to the cone effect in LLM. We demonstrate that pseudo-alignment arises from the interplay of cone effect within pretrained LLM components and the intrinsically low-dimensional manifold of time-series data. In addition, we also introduce \textit{\textbf{TimeSUP}}, a novel technique designed to mitigate this issue and improve forecast performance in existing LLM4TS approaches. TimeSUP addresses this by increasing the time series manifold to more closely match the intrinsic dimension of language embeddings, allowing the model to distinguish temporal signals clearly while still capturing shared structures across modalities. As a result, representations for time and language tokens remain distinct yet exhibit high cosine similarity, signifying that the model preserves each modality unique features while learning their commonalities in a unified embedding space. Empirically, TimeSUP consistently outperforms state-of-the-art LLM4TS methods and other lightweight baselines on long-term forecasting performance. Furthermore, it can be seamlessly integrated into four existing LLM4TS pipelines and delivers significant improvements in forecasting performance.

LGSep 1, 2025
FediLoRA: Heterogeneous LoRA for Federated Multimodal Fine-tuning under Missing Modalities

Lishan Yang, Wei Emma Zhang, Nam Kha Nguygen et al.

Foundation models have demonstrated remarkable performance across a wide range of tasks, yet their large parameter sizes pose challenges for practical deployment, especially in decentralized environments. Parameter-efficient fine-tuning (PEFT), such as Low-Rank Adaptation (LoRA), reduces local computing and memory overhead, making it attractive for federated learning. However, existing federated LoRA methods typically assume uniform rank configurations and unimodal inputs, overlooking two key real-world challenges: (1) heterogeneous client resources have different LoRA ranks, and (2) multimodal data settings with potentially missing modalities. In this work, we propose FediLoRA, a simple yet effective framework for federated multimodal fine-tuning under heterogeneous LoRA ranks and missing modalities. FediLoRA introduces a dimension-wise aggregation strategy that reweights LoRA updates without information dilution during aggregation. It also includes a lightweight layer-wise model editing method that selectively incorporates global parameters to repair local components which improves both client and global model performances. Experimental results on three multimodal benchmark datasets demonstrate that FediLoRA achieves superior performance over competitive baselines in both global and personalized settings, particularly in the presence of modality incompleteness.

LGJul 22, 2025
FedDPG: An Adaptive Yet Efficient Prompt-tuning Approach in Federated Learning Settings

Ali Shakeri, Wei Emma Zhang, Amin Beheshti et al.

Pre-trained Language Models (PLMs) have demonstrated impressive performance in various NLP tasks. However, traditional fine-tuning methods for leveraging PLMs for downstream tasks entail significant computational overhead. Prompt-tuning has emerged as an efficient alternative that involves prepending a limited number of parameters to the input sequence and only updating them while the PLM's parameters are frozen. However, this technique's prompts remain fixed for all inputs, reducing the model's flexibility. The Federated Learning (FL) technique has gained attention in recent years to address the growing concerns around data privacy. However, challenges such as communication and computation limitations of clients still need to be addressed. To mitigate these challenges, this paper introduces the Federated Dynamic Prompt Generator (FedDPG), which incorporates a dynamic prompt generator network to generate context-aware prompts based on the given input, ensuring flexibility and adaptability while prioritising data privacy in federated learning settings. Our experiments on three NLP benchmark datasets showcase that FedDPG outperforms the state-of-the-art parameter-efficient fine-tuning methods in terms of global model performance, and has significantly reduced the calculation time and the number of parameters to be sent through the FL network.

LGMay 26, 2025
Rethinking Gating Mechanism in Sparse MoE: Handling Arbitrary Modality Inputs with Confidence-Guided Gate

Liangwei Nathan Zheng, Wei Emma Zhang, Mingyu Guo et al.

Effectively managing missing modalities is a fundamental challenge in real-world multimodal learning scenarios, where data incompleteness often results from systematic collection errors or sensor failures. Sparse Mixture-of-Experts (SMoE) architectures have the potential to naturally handle multimodal data, with individual experts specializing in different modalities. However, existing SMoE approach often lacks proper ability to handle missing modality, leading to performance degradation and poor generalization in real-world applications. We propose ConfSMoE to introduce a two-stage imputation module to handle the missing modality problem for the SMoE architecture by taking the opinion of experts and reveal the insight of expert collapse from theoretical analysis with strong empirical evidence. Inspired by our theoretical analysis, ConfSMoE propose a novel expert gating mechanism by detaching the softmax routing score to task confidence score w.r.t ground truth signal. This naturally relieves expert collapse without introducing additional load balance loss function. We show that the insights of expert collapse aligns with other gating mechanism such as Gaussian and Laplacian gate. The proposed method is evaluated on four different real world dataset with three distinct experiment settings to conduct comprehensive analysis of ConfSMoE on resistance to missing modality and the impacts of proposed gating mechanism.

LGMar 3, 2025
PostHoc FREE Calibrating on Kolmogorov Arnold Networks

Wenhao Liang, Wei Emma Zhang, Lin Yue et al.

Kolmogorov Arnold Networks (KANs) are neural architectures inspired by the Kolmogorov Arnold representation theorem that leverage B Spline parameterizations for flexible, locally adaptive function approximation. Although KANs can capture complex nonlinearities beyond those modeled by standard MultiLayer Perceptrons (MLPs), they frequently exhibit miscalibrated confidence estimates manifesting as overconfidence in dense data regions and underconfidence in sparse areas. In this work, we systematically examine the impact of four critical hyperparameters including Layer Width, Grid Order, Shortcut Function, and Grid Range on the calibration of KANs. Furthermore, we introduce a novel TemperatureScaled Loss (TSL) that integrates a temperature parameter directly into the training objective, dynamically adjusting the predictive distribution during learning. Both theoretical analysis and extensive empirical evaluations on standard benchmarks demonstrate that TSL significantly reduces calibration errors, thereby improving the reliability of probabilistic predictions. Overall, our study provides actionable insights into the design of spline based neural networks and establishes TSL as a robust loss solution for enhancing calibration.

SIFeb 8, 2022
Understanding the Trustworthiness Management in the Social Internet of Things: A Survey

Subhash Sagar, Adnan Mahmood, Quan Z. Sheng et al.

The next generation of the Internet of Things (IoT) facilitates the integration of the notion of social networking into smart objects (i.e., things) in a bid to establish the social network of interconnected objects. This integration has led to the evolution of a promising and emerging paradigm of Social Internet of Things (SIoT), wherein the smart objects act as social objects and intelligently impersonate the social behaviour similar to that of humans. These social objects are capable of establishing social relationships with the other objects in the network and can utilize these relationships for service discovery. Trust plays a significant role to achieve the common goal of trustworthy collaboration and cooperation among the objects and provide systems' credibility and reliability. In SIoT, an untrustworthy object can disrupt the basic functionality of a service by delivering malicious messages and adversely affect the quality and reliability of the service. In this survey, we present a holistic view of trustworthiness management for SIoT. The essence of trust in various disciplines has been discussed along with the Trust in SIoT followed by a detailed study on trust management components in SIoT. Furthermore, we analyzed and compared the trust management schemes by primarily categorizing them into four groups in terms of their strengths, limitations, trust management components employed in each of the referred trust management schemes, and the performance of these studies vis-a-vis numerous trust evaluation dimensions. Finally, we have discussed the future research directions of the emerging paradigm of SIoT, particularly for trustworthiness management in SIoT.

CLSep 23, 2021
Dependency Structure for News Document Summarization

Congbo Ma, Wei Emma Zhang, Hu Wang et al.

In this work, we develop a neural network based model which leverages dependency parsing to capture cross-positional dependencies and grammatical structures. With the help of linguistic signals, sentence-level relations can be correctly captured, thus improving news documents summarization performance. Empirical studies demonstrate that this simple but effective method outperforms existing works on the benchmark dataset. Extensive analyses examine different settings and configurations of the proposed model which provide a good reference to the community.

CLJun 2, 2021
Conversational Question Answering: A Survey

Munazza Zaib, Wei Emma Zhang, Quan Z. Sheng et al.

Question answering (QA) systems provide a way of querying the information available in various formats including, but not limited to, unstructured and structured data in natural languages. It constitutes a considerable part of conversational artificial intelligence (AI) which has led to the introduction of a special research topic on Conversational Question Answering (CQA), wherein a system is required to understand the given context and then engages in multi-turn QA to satisfy the user's information needs. Whilst the focus of most of the existing research work is subjected to single-turn QA, the field of multi-turn QA has recently grasped attention and prominence owing to the availability of large-scale, multi-turn QA datasets and the development of pre-trained language models. With a good amount of models and research papers adding to the literature every year recently, there is a dire need of arranging and presenting the related work in a unified manner to streamline future research. This survey, therefore, is an effort to present a comprehensive review of the state-of-the-art research trends of CQA primarily based on reviewed papers from 2016-2021. Our findings show that there has been a trend shift from single-turn to multi-turn QA which empowers the field of Conversational AI from different perspectives. This survey is intended to provide an epitome for the research community with the hope of laying a strong foundation for the field of CQA.

HCApr 27, 2021
A Review of the Non-Invasive Techniques for Monitoring Different Aspects of Sleep

Zawar Hussain, Quan Z. Sheng, Wei Emma Zhang et al.

Quality sleep is very important for a healthy life. Nowadays, many people around the world are not getting enough sleep which is having negative impacts on their lifestyles. Studies are being conducted for sleep monitoring and have now become an important tool for understanding sleep behavior. The gold standard method for sleep analysis is polysomnography (PSG) conducted in a clinical environment but this method is both expensive and complex for long-term use. With the advancements in the field of sensors and the introduction of off-the-shelf technologies, unobtrusive solutions are becoming common as alternatives for in-home sleep monitoring. Various solutions have been proposed using both wearable and non-wearable methods which are cheap and easy to use for in-home sleep monitoring. In this paper, we present a comprehensive survey of the latest research works (2015 and after) conducted in various categories of sleep monitoring including sleep stage classification, sleep posture recognition, sleep disorders detection, and vital signs monitoring. We review the latest works done using the non-invasive approach and cover both wearable and non-wearable methods. We discuss the design approaches and key attributes of the work presented and provide an extensive analysis based on 10 key factors, to give a comprehensive overview of the recent developments and trends in all four categories of sleep monitoring. We also present some publicly available datasets for different categories of sleep monitoring. In the end, we discuss several open issues and provide future research directions in the area of sleep monitoring.

CLApr 22, 2021
A Short Survey of Pre-trained Language Models for Conversational AI-A NewAge in NLP

Munazza Zaib, Quan Z. Sheng, Wei Emma Zhang

Building a dialogue system that can communicate naturally with humans is a challenging yet interesting problem of agent-based computing. The rapid growth in this area is usually hindered by the long-standing problem of data scarcity as these systems are expected to learn syntax, grammar, decision making, and reasoning from insufficient amounts of task-specific dataset. The recently introduced pre-trained language models have the potential to address the issue of data scarcity and bring considerable advantages by generating contextualized word embeddings. These models are considered counterpart of ImageNet in NLP and have demonstrated to capture different facets of language such as hierarchical relations, long-term dependency, and sentiment. In this short survey paper, we discuss the recent progress made in the field of pre-trained language models. We also deliberate that how the strengths of these language models can be leveraged in designing more engaging and more eloquent conversational agents. This paper, therefore, intends to establish whether these pre-trained models can overcome the challenges pertinent to dialogue systems, and how their architecture could be exploited in order to overcome these challenges. Open challenges in the field of dialogue systems have also been deliberated.

CVApr 19, 2021
Kernel Adversarial Learning for Real-world Image Super-resolution

Hu Wang, Congbo Ma, Jianpeng Zhang et al.

Current deep image super-resolution (SR) approaches aim to restore high-resolution images from down-sampled images or by assuming degradation from simple Gaussian kernels and additive noises. However, these techniques only assume crude approximations of the real-world image degradation process, which should involve complex kernels and noise patterns that are difficult to model using simple assumptions. In this paper, we propose a more realistic process to synthesise low-resolution images for real-world image SR by introducing a new Kernel Adversarial Learning Super-resolution (KASR) framework. In the proposed framework, degradation kernels and noises are adaptively modelled rather than explicitly specified. Moreover, we also propose a high-frequency selective objective and an iterative supervision process to further boost the model SR reconstruction accuracy. Extensive experiments validate the effectiveness of the proposed framework on real-world datasets.

CRFeb 3, 2021
Trust Computational Heuristic for Social Internet of Things: A Machine Learning-based Approach

Subhash Sagar, Adnan Mahmood, Quan Z. Sheng et al.

The Internet of Things (IoT) is an evolving network of billions of interconnected physical objects, such as numerous sensors, smartphones, wearables, and embedded devices. These physical objects, generally referred to as the smart objects, when deployed in the real-world aggregates useful information from their surrounding environment. As-of-late, this notion of IoT has been extended to incorporate the social networking facets which have led to the promising paradigm of the `Social Internet of Things' (SIoT). In SIoT, the devices operate as an autonomous agent and provide an exchange of information and service discovery in an intelligent manner by establishing social relationships among them with respect to their owners. Trust plays an important role in establishing trustworthy relationships among the physical objects and reduces probable risks in the decision-making process. In this paper, a trust computational model is proposed to extract individual trust features in a SIoT environment. Furthermore, a machine learning-based heuristic is used to aggregate all the trust features in order to ascertain an aggregate trust score. Simulation results illustrate that the proposed trust-based model isolates the trustworthy and untrustworthy nodes within the network in an efficient manner.

CRFeb 3, 2021
Towards a Machine Learning-driven Trust Evaluation Model for Social Internet of Things: A Time-aware Approach

Subhash Sagar, Adnan Mahmood, Quan Z. Sheng et al.

The emerging paradigm of the Social Internet of Things (SIoT) has transformed the traditional notion of the Internet of Things (IoT) into a social network of billions of interconnected smart objects by integrating social networking facets into the same. In SIoT, objects can establish social relationships in an autonomous manner and interact with the other objects in the network based on their social behaviour. A fundamental problem that needs attention is establishing of these relationships in a reliable and trusted way, i.e., establishing trustworthy relationships and building trust amongst objects. In addition, it is also indispensable to ascertain and predict an object's behaviour in the SIoT network over a period of time. Accordingly, in this paper, we have proposed an efficient time-aware machine learning-driven trust evaluation model to address this particular issue. The envisaged model deliberates social relationships in terms of friendship and community-interest, and further takes into consideration the working relationships and cooperativeness (object-object interactions) as trust parameters to quantify the trustworthiness of an object. Subsequently, in contrast to the traditional weighted sum heuristics, a machine learning-driven aggregation scheme is delineated to synthesize these trust parameters to ascertain a single trust score. The experimental results demonstrate that the proposed model can efficiently segregates the trustworthy and untrustworthy objects within a network, and further provides the insight on how the trust of an object varies with time along with depicting the effect of each trust parameter on a trust score.

CLNov 10, 2020
Multi-document Summarization via Deep Learning Techniques: A Survey

Congbo Ma, Wei Emma Zhang, Mingyu Guo et al.

Multi-document summarization (MDS) is an effective tool for information aggregation that generates an informative and concise summary from a cluster of topic-related documents. Our survey, the first of its kind, systematically overviews the recent deep learning based MDS models. We propose a novel taxonomy to summarize the design strategies of neural networks and conduct a comprehensive summary of the state-of-the-art. We highlight the differences between various objective functions that are rarely discussed in the existing literature. Finally, we propose several future directions pertaining to this new and exciting field.

LGJun 14, 2020
Adversarial Attacks and Detection on Reinforcement Learning-Based Interactive Recommender Systems

Yuanjiang Cao, Xiaocong Chen, Lina Yao et al.

Adversarial attacks pose significant challenges for detecting adversarial attacks at an early stage. We propose attack-agnostic detection on reinforcement learning-based interactive recommendation systems. We first craft adversarial examples to show their diverse distributions and then augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data. Finally, we study the attack strength and frequency of adversarial examples and evaluate our model on standard datasets with multiple crafting methods. Our extensive experiments show that most adversarial attacks are effective, and both attack strength and attack frequency impact the attack performance. The strategically-timed attack achieves comparative attack performance with only 1/3 to 1/2 attack frequency. Besides, our black-box detector trained with one crafting method has the generalization ability over several crafting methods.

LGApr 28, 2020
Deep Conversational Recommender Systems: A New Frontier for Goal-Oriented Dialogue Systems

Dai Hoang Tran, Quan Z. Sheng, Wei Emma Zhang et al.

In recent years, the emerging topics of recommender systems that take advantage of natural language processing techniques have attracted much attention, and one of their applications is the Conversational Recommender System (CRS). Unlike traditional recommender systems with content-based and collaborative filtering approaches, CRS learns and models user's preferences through interactive dialogue conversations. In this work, we provide a summarization of the recent evolution of CRS, where deep learning approaches are applied to CRS and have produced fruitful results. We first analyze the research problems and present key challenges in the development of Deep Conversational Recommender Systems (DCRS), then present the current state of the field taken from the most recent researches, including the most common deep learning models that benefit DCRS. Finally, we discuss future directions for this vibrant area.

CVJun 11, 2019
Different Approaches for Human Activity Recognition: A Survey

Zawar Hussain, Michael Sheng, Wei Emma Zhang

Human activity recognition has gained importance in recent years due to its applications in various fields such as health, security and surveillance, entertainment, and intelligent environments. A significant amount of work has been done on human activity recognition and researchers have leveraged different approaches, such as wearable, object-tagged, and device-free, to recognize human activities. In this article, we present a comprehensive survey of the work conducted over the period 2010-2018 in various areas of human activity recognition with main focus on device-free solutions. The device-free approach is becoming very popular due to the fact that the subject is not required to carry anything, instead, the environment is tagged with devices to capture the required information. We propose a new taxonomy for categorizing the research work conducted in the field of activity recognition and divide the existing literature into three sub-areas: action-based, motion-based, and interaction-based. We further divide these areas into ten different sub-topics and present the latest research work in these sub-topics. Unlike previous surveys which focus only on one type of activities, to the best of our knowledge, we cover all the sub-areas in activity recognition and provide a comparison of the latest research work in these sub-areas. Specifically, we discuss the key attributes and design approaches for the work presented. Then we provide extensive analysis based on 10 important metrics, to give the reader, a complete overview of the state-of-the-art techniques and trends in different sub-areas of human activity recognition. In the end, we discuss open research issues and provide future research directions in the field of human activity recognition.

CLJan 21, 2019
Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey

Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi et al.

With the development of high computational devices, deep neural networks (DNNs), in recent years, have gained significant popularity in many Artificial Intelligence (AI) applications. However, previous efforts have shown that DNNs were vulnerable to strategically modified samples, named adversarial examples. These samples are generated with some imperceptible perturbations but can fool the DNNs to give false predictions. Inspired by the popularity of generating adversarial examples for image DNNs, research efforts on attacking DNNs for textual applications emerges in recent years. However, existing perturbation methods for images cannotbe directly applied to texts as text data is discrete. In this article, we review research works that address this difference and generatetextual adversarial examples on DNNs. We collect, select, summarize, discuss and analyze these works in a comprehensive way andcover all the related information to make the article self-contained. Finally, drawing on the reviewed literature, we provide further discussions and suggestions on this topic.

IRDec 25, 2018
Deep Autoencoder for Recommender Systems: Parameter Influence Analysis

Dai Hoang Tran, Zawar Hussain, Wei Emma Zhang et al.

Recommender systems have recently attracted many researchers in the deep learning community. The state-of-the-art deep neural network models used in recommender systems are typically multilayer perceptron and deep Autoencoder (DAE), among which DAE usually shows better performance due to its superior capability to reconstruct the inputs. However, we found existing DAE recommendation systems that have similar implementations on similar datasets result in vastly different parameter settings. In this work, we have built a flexible DAE model, named FlexEncoder that uses configurable parameters and unique features to analyse the parameter influences on the prediction accuracy of recommender systems. This will help us identify the best-performance parameters given a dataset. Extensive evaluation on the MovieLens datasets are conducted, which drives our conclusions on the influences of DAE parameters. Specifically, we find that DAE parameters strongly affect the prediction accuracy of the recommender systems, and the effect is transferable to similar datasets in a larger size. We open our code to public which could benefit both new users for DAE -- they can quickly understand how DAE works for recommendation systems, and experienced DAE users -- it easier for them to tune the parameters on different datasets.

IRDec 7, 2018
Internet of Things Search Engine: Concepts, Classification, and Open Issues

Nguyen Khoi Tran, Quan Z. Sheng, M. Ali Babar et al.

This article focuses on the complicated yet still relatively immature area of the Internet of Things Search Engines (IoTSE). It introduces related concepts of IoTSE and a model called meta-path to describe and classify IoTSE systems based on their functionality. Based on these concepts, we have organized the research and development efforts on IoTSE into eight groups and presented the representative works in each group. The concepts and ideas presented in this article are generated from an extensive structured study on over 200 works spanning over one decade of IoTSE research and development.

SENov 30, 2018
ContextServ: Towards Model-Driven Development of Context-AwareWeb Services

Quan Z. Sheng, Jian Yu, Hanchuan Xu et al.

In the era of Web of Things and Services, Context-aware Web Services (CASs) are emerging as an important technology for building innovative context-aware applications. CASs enable the information integration from both the physical and virtual world, which affects human living. However, it is challenging to build CASs, due to the lack of context provisioning management approach and limited generic approach for formalizing the development process. We therefore propose ContextServ, a platform that uses a model-driven approach to support the full life cycle of CASs development, hence offering significant design and management flexibility. ContextServ implements a proposed UML-based modelling language ContextUML to support multiple modelling languages. It also supports dynamic adaptation of WS-BPEL based context-aware composite services by weaving context-aware rules into the process. Extensive experimental evaluations on ContextServ and its components showcase that ContextServ can support effective development and efficient execution of context-aware Web services.

IRJul 23, 2016
Searching for the Internet of Things on the Web: Where It Is and What It Looks Like

Ali Shemshadi, Quan Z. Sheng, Wei Emma Zhang et al.

The Internet of Things (IoT), in general, is a compelling paradigm that aims to connect everyday objects to the Internet. Nowadays, IoT is considered as one of the main technologies which contribute towards reshaping our daily lives in the next decade. IoT unlocks many exciting new opportunities in a variety of applications in research and industry domains. However, many have complained about the absence of the real-world IoT data. Unsurprisingly, a common question that arises regularly nowadays is "Does the IoT already exist?". So far, little has been known about the real-world situation on IoT, its attributes, the presentation of data and user interests. To answer this question, in this work, we conduct an in-depth analytical investigation on real IoT data. More specifically, we identify IoT data sources over the Web and develop a crawler engine to collect large-scale real-world IoT data for the first time. We make the results of our work available to the public in order to assist the community in the future research. In particular, we collect the data of nearly two million Internet connected objects and study trends in IoT using a real-world query set from an IoT search engine. Based on the collected data and our analysis, we identify the typical characteristics of IoT data. The most intriguing finding of our study is that IoT data is mainly disseminated using Web Mapping while the emerging IoT solutions such as the Web of Things, are currently not well adopted. On top of our findings, we further discuss future challenges and open research problems in the IoT area.