Wei-Shinn Ku

LG
h-index: 8
19 papers
686 citations
Novelty: 48%
AI Score: 56

19 Papers

AI | Apr 2, 2022 | Code
RFID-Based Indoor Spatial Query Evaluation with Bayesian Filtering Techniques

Bo Hui, Wenlu Wang, Jiao Yu et al.

People spend a significant amount of time in indoor spaces (e.g., office buildings, subway systems, etc.) in their daily lives. Therefore, it is important to develop efficient indoor spatial query algorithms for supporting various location-based applications. However, indoor spaces differ from outdoor spaces because users have to follow the indoor floor plan for their movements. In addition, positioning in indoor environments is mainly based on sensing devices (e.g., RFID readers) rather than GPS devices. Consequently, we cannot apply existing spatial query evaluation techniques devised for outdoor environments to this new challenge. Because Bayesian filtering techniques can be employed to estimate the state of a system that changes over time using a sequence of noisy measurements made on the system, in this research, we propose Bayesian filtering-based location inference methods as the basis for evaluating indoor spatial queries with noisy RFID raw data. Furthermore, two novel models, the indoor walking graph model and the anchor point indexing model, are created for tracking object locations in indoor environments. Based on the inference method and tracking models, we develop innovative indoor range and k nearest neighbor (kNN) query algorithms. We validate our solution using both synthetic data and real-world data. Our experimental results show that the proposed algorithms can evaluate indoor spatial queries effectively and efficiently. We open-source the code, data, and floor plan at https://github.com/DataScienceLab18/IndoorToolKit.
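As a rough illustration of the Bayesian filtering idea described in this abstract, the sketch below runs one predict/update step of a discrete (histogram) Bayes filter over a handful of anchor points and then answers a probabilistic range query. It is not the paper's algorithm: the choice of a histogram filter, the function names, the transition table, and the reader likelihoods are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def bayes_filter_step(
    belief: Dict[str, float],
    transition: Dict[str, Dict[str, float]],
    likelihood: Dict[str, float],
) -> Dict[str, float]:
    """One predict/update step of a discrete Bayes filter over anchor points.

    belief[a]         : prior probability that the object is at anchor point a
    transition[a][b]  : assumed probability of walking from a to b in one step
    likelihood[a]     : assumed probability of the observed RFID reading at a
    """
    # Predict: move probability mass along the (hypothetical) walking graph.
    predicted = {a: 0.0 for a in belief}
    for src, p in belief.items():
        for dst, t in transition[src].items():
            predicted[dst] += p * t

    # Update: reweight each anchor point by the likelihood of the noisy reading.
    posterior = {a: predicted[a] * likelihood.get(a, 0.0) for a in predicted}
    total = sum(posterior.values())
    if total == 0.0:
        # The reading is inconsistent with every anchor; fall back to prediction.
        return predicted
    return {a: v / total for a, v in posterior.items()}


def range_query_probability(belief: Dict[str, float], anchors_in_range: Iterable[str]) -> float:
    """Probability that the object lies inside a query range of anchor points."""
    return sum(belief.get(a, 0.0) for a in anchors_in_range)
```

With three anchor points in a hallway, a uniform prior, and a reading that is most likely near the last anchor, the posterior concentrates on that anchor, and a range query simply sums the posterior over the anchors it covers.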

24.6 | LG | May 30
scBatchProx: Federated-Inspired Refinement for Stable Cell-Type Discriminability under Heterogeneous Batch Compositions

Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku

Single-cell integration workflows often construct low-dimensional cell embeddings and then refine them with post-hoc methods to reduce batch effects. This refinement process can become unstable when cell-type compositions vary across batches, with some populations underrepresented or absent in particular batches. The problem becomes more consequential in dynamic single-cell data systems, where newly acquired batches can change both technical conditions and cell-type composition. Such instability can reduce downstream cell-type classification performance and weaken stability under imbalance perturbations. We introduce scBatchProx, a lightweight post-hoc refinement method for stabilizing single-cell latent embeddings in these heterogeneous and evolving settings. scBatchProx operates directly on precomputed embeddings and treats each batch or study as a client in a federated-inspired optimization procedure. A batch-conditioned FiLM adapter learns local latent updates, while proximal and identity-preserving regularization keep these updates conservative. Experiments on multi-batch and cross-study single-cell datasets show that scBatchProx improves downstream cell-type classification across different upstream embeddings. In controlled imbalance perturbations, scBatchProx maintains more stable affected-cell-type F1 when selected populations are downsampled or ablated from one batch. In cumulative retraining and continual integration settings, scBatchProx remains effective as new datasets arrive over time. Together, these results suggest that conservative, federated-inspired refinement can help maintain stable single-cell embeddings as batch compositions change across datasets and over time.
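The combination of a FiLM adapter with proximal and identity-preserving regularization can be pictured with the small sketch below. FiLM here means a per-dimension scale and shift of the embedding; the penalty form, the weights `mu` and `lam`, and all function names are assumptions for illustration and are not taken from scBatchProx itself.

```python
from typing import List


def film(z: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """Feature-wise linear modulation: scale and shift each latent dimension."""
    return [g * x + b for x, g, b in zip(z, gamma, beta)]


def refinement_penalty(
    z: List[float],
    gamma: List[float],
    beta: List[float],
    global_gamma: List[float],
    global_beta: List[float],
    mu: float,
    lam: float,
) -> float:
    """Assumed regularizer that keeps a batch-local update conservative.

    proximal term : keeps local FiLM parameters close to the shared (global) ones
    identity term : keeps the refined embedding close to the input embedding
    """
    refined = film(z, gamma, beta)
    identity = sum((r - x) ** 2 for r, x in zip(refined, z))
    proximal = sum((g - gg) ** 2 for g, gg in zip(gamma, global_gamma))
    proximal += sum((b - gb) ** 2 for b, gb in zip(beta, global_beta))
    return 0.5 * mu * proximal + lam * identity
```

When the local adapter equals the global one and acts as the identity (scale 1, shift 0), the penalty is zero; any local deviation is charged by both terms, which is one way to read "conservative" updates.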

RO | Aug 15, 2022
Multi-modal Transformer Path Prediction for Autonomous Vehicle

Chia Hong Tseng, Jie Zhang, Min-Te Sun et al.

Reasoning about vehicle path prediction is an essential and challenging problem for the safe operation of autonomous driving systems. Many research works address path prediction. However, most of them do not use lane information and are not based on the Transformer architecture. By utilizing different types of data collected from sensors mounted on self-driving vehicles, we propose a path prediction system named Multi-modal Transformer Path Prediction (MTPP) that aims to predict the long-term future trajectory of target agents. To achieve more accurate path prediction, the Transformer architecture is adopted in our model. To better utilize the lane information, lanes that run in the opposite direction to the target agent are unlikely to be taken by it and are consequently filtered out. In addition, consecutive lane chunks are combined to ensure that the lane input is long enough for path prediction. An extensive evaluation is conducted to show the efficacy of the proposed system using nuScenes, a real-world trajectory forecasting dataset.
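The two lane preprocessing steps mentioned in this abstract (dropping opposite-direction lanes and joining consecutive lane chunks) could look roughly like the sketch below. The polyline representation, the 90-degree threshold, and the function names are assumptions for illustration, not details from MTPP.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def lane_heading(lane: List[Point]) -> float:
    """Heading (radians) from the first to the last point of a lane polyline."""
    (x0, y0), (x1, y1) = lane[0], lane[-1]
    return math.atan2(y1 - y0, x1 - x0)


def angle_diff(a: float, b: float) -> float:
    """Absolute difference between two angles, wrapped into [0, pi]."""
    d = (a - b + math.pi) % (2 * math.pi) - math.pi
    return abs(d)


def filter_lanes(
    lanes: List[List[Point]], agent_heading: float, max_diff: float = math.pi / 2
) -> List[List[Point]]:
    """Keep lanes whose heading is within max_diff of the agent heading."""
    return [ln for ln in lanes if angle_diff(lane_heading(ln), agent_heading) <= max_diff]


def merge_chunks(chunks: List[List[Point]]) -> List[Point]:
    """Concatenate consecutive, non-empty lane chunks into one longer polyline."""
    merged = list(chunks[0])
    for chunk in chunks[1:]:
        # Skip the shared joint point if the chunk starts where the last one ended.
        start = 1 if chunk[0] == merged[-1] else 0
        merged.extend(chunk[start:])
    return merged
```

For an agent heading east, an eastbound lane is kept and a westbound lane is removed; two chunks that share an endpoint merge into a single polyline without duplicating the joint.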

LG | Mar 7, 2024 | Code
A Survey of Lottery Ticket Hypothesis

Bohan Liu, Zijie Zhang, Peixiong He et al.

The Lottery Ticket Hypothesis (LTH) states that a dense neural network model contains a highly sparse subnetwork (i.e., a winning ticket) that can achieve even better performance than the original model when trained in isolation. While LTH has been proved both empirically and theoretically in many works, some open issues, such as efficiency and scalability, remain to be addressed. Also, the lack of open-source frameworks and of a consensus experimental setting poses a challenge to future research on LTH. We, for the first time, examine previous research and studies on LTH from different perspectives. We also discuss issues in existing works and list potential directions for further exploration. This survey aims to provide an in-depth look at the state of LTH and develop a duly maintained platform to conduct experiments and compare with the most up-to-date baselines.
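A common way to search for winning tickets is iterative magnitude pruning with weight rewinding: train, prune the smallest-magnitude surviving weights, reset the survivors to their initial values, and repeat. The sketch below shows one pruning round on a flat weight list; the flat-list representation and the function names are simplifications assumed for illustration, and training itself is left out.

```python
from typing import List


def magnitude_mask(weights: List[float], mask: List[int], prune_fraction: float) -> List[int]:
    """Prune the given fraction of still-active weights with the smallest magnitude."""
    active = [i for i, m in enumerate(mask) if m == 1]
    n_prune = int(len(active) * prune_fraction)
    by_magnitude = sorted(active, key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in by_magnitude[:n_prune]:
        new_mask[i] = 0
    return new_mask


def rewind(initial_weights: List[float], mask: List[int]) -> List[float]:
    """Reset surviving weights to their original initialization; zero the pruned ones."""
    return [w * m for w, m in zip(initial_weights, mask)]
```

The masked, rewound weights are the candidate ticket: retraining them in isolation and comparing against the dense model is how the hypothesis is usually tested.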

76.4 | LG | May 15
On the Fragility of Data Attribution When Learning Is Distributed

Xian Gao, Bo Hui, Min-Te Sun et al.

Data attribution has become an important component of pricing, auditing, and governance in machine learning pipelines, yet most attribution methods implicitly assume that attribution values faithfully reflect participants' contributions. We show that this assumption can fail: a single participant in a standard distributed training workflow can substantially inflate its measured attribution value while preserving global utility. Our attribution-first attack uses latent optimization to inject small synthetic batches that preserve utility while exploiting non-IID label coverage and evaluator sensitivities. Across datasets, models, and multiple marginal-utility evaluators, the attack consistently increases the adversary's attribution value and reshapes the relative attribution structure among benign clients without degrading accuracy or triggering geometry-based defenses. These results show that attribution itself forms a new attack surface and motivate the development of attribution-robust and incentive-compatible scoring mechanisms.

LG | May 31, 2025 | Code
Optimized Local Updates in Federated Learning via Reinforcement Learning

Ali Murad, Bo Hui, Wei-Shinn Ku

Federated Learning (FL) is a distributed framework for collaborative model training over large-scale distributed data, enabling higher performance while maintaining client data privacy. However, the nature of model aggregation at the centralized server can result in a performance drop in the presence of non-IID data across different clients. We remark that training a client locally on more data than necessary does not benefit the overall performance of all clients. In this paper, we devise a novel framework that leverages a Deep Reinforcement Learning (DRL) agent to select an optimized amount of data necessary to train a client model without oversharing information with the server. Starting without awareness of the client's performance, the DRL agent utilizes the change in training loss as a reward signal and learns to optimize the amount of training data necessary for improving the client's performance. Specifically, after each aggregation round, the DRL algorithm considers the local performance as the current state and outputs the optimized weights for each class, in the training data, to be used during the next round of local training. In doing so, the agent learns a policy that creates an optimized partition of the local training dataset during the FL rounds. After FL, the client utilizes the entire local training dataset to further enhance its performance on its own data distribution, mitigating the non-IID effects of aggregation. Through extensive experiments, we demonstrate that training FL clients through our algorithm results in superior performance on multiple benchmark datasets and FL frameworks. Our code is available at https://github.com/amuraddd/optimized_client_training.git.

30.8 | LG | May 7
FedeKD: Energy-Based Gating for Robust Federated Knowledge Distillation under Heterogeneous Settings

Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku

Federated learning (FL) operates in heterogeneous environments, where variations in data distributions and asymmetric model design often result in negative transfer. While federated knowledge distillation (FKD) avoids direct model parameter sharing, existing methods typically rely on public datasets or assume that transferred knowledge is uniformly reliable, which limits their robustness in practice. This paper presents FedeKD, a reliability-aware FKD framework that makes sample-wise trust estimation an explicit component of knowledge transfer, without relying on additional public data. Each client maintains a high-capacity private model for local learning and a lightweight shared proxy model for cross-client knowledge exchange. During training, proxy models are aggregated on the server to form a global proxy, which is then used to guide updates of the private models. At the core of FedeKD is an energy-based gating mechanism that converts task-specific private-proxy disagreement into sample-wise trust weights for backward distillation. This mechanism enables sample-wise weighting of knowledge transfer, where the proxy model contributes more to reliable samples while down-weighting unreliable ones. Extensive experiments on six real-world datasets demonstrate that FedeKD significantly reduces negative transfer under heterogeneous settings while maintaining strong predictive performance.

LG | Feb 26
Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity

Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku

Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk deploying overconfident models at under-resourced agents, leading to silent local failures despite seemingly satisfactory global performance. Existing federated UQ approaches often address data heterogeneity or model heterogeneity in isolation, overlooking their joint effect on coverage reliability across agents. Conformal prediction is a widely used distribution-free UQ framework, yet its application in heterogeneous FL settings remains underexplored. We provide FedWQ-CP, a simple yet effective approach that balances empirical coverage performance with efficiency at both global and agent levels under dual heterogeneity. FedWQ-CP performs agent-server calibration in a single communication round. On each agent, conformity scores are computed on calibration data and a local quantile threshold is derived. Each agent then transmits only its quantile threshold and calibration sample size to the server. The server simply aggregates these thresholds through a weighted average to produce a global threshold. Experimental results on seven public datasets for both classification and regression demonstrate that FedWQ-CP empirically maintains agent-wise and global coverage while producing the smallest prediction sets or intervals.
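The single-round calibration described in this abstract can be sketched as follows. Each agent computes a conformal quantile of its calibration scores and reports it with its sample size; the server averages the thresholds. Weighting by calibration sample size, the finite-sample rank formula, the `1 - p` score for classification, and all function names are assumptions for illustration rather than confirmed details of FedWQ-CP.

```python
import math
from typing import List, Sequence, Tuple


def local_threshold(scores: Sequence[float], alpha: float) -> Tuple[float, int]:
    """Agent side: conformal quantile of calibration scores plus the sample size."""
    n = len(scores)
    # Standard finite-sample conformal rank ceil((n + 1)(1 - alpha)), clipped to n.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(scores)[rank - 1], n


def aggregate(reports: List[Tuple[float, int]]) -> float:
    """Server side: sample-size-weighted average of the reported thresholds."""
    total = sum(n for _, n in reports)
    return sum(q * n for q, n in reports) / total


def prediction_set(class_probs: Sequence[float], threshold: float) -> List[int]:
    """Keep every class whose score 1 - p does not exceed the global threshold."""
    return [k for k, p in enumerate(class_probs) if 1 - p <= threshold]
```

Only two numbers per agent cross the network (a threshold and a count), which matches the communication pattern the abstract describes.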

AI | Sep 14, 2025
Knowledge-Guided Adaptive Mixture of Experts for Precipitation Prediction

Chen Jiang, Kofi Osei, Sai Deepthi Yeddula et al.

Accurate precipitation forecasting is indispensable in agriculture, disaster management, and sustainable strategies. However, predicting rainfall has been challenging due to the complexity of climate systems and the heterogeneous nature of multi-source observational data, including radar, satellite imagery, and surface-level measurements. The multi-source data vary in spatial and temporal resolution, and they carry domain-specific features, making it challenging for effective integration in conventional deep learning models. Previous research has explored various machine learning techniques for weather prediction; however, most struggle with the integration of data with heterogeneous modalities. To address these limitations, we propose an Adaptive Mixture of Experts (MoE) model tailored for precipitation rate prediction. Each expert within the model specializes in a specific modality or spatio-temporal pattern. We also incorporated a dynamic router that learns to assign inputs to the most relevant experts. Our results show that this modular design enhances predictive accuracy and interpretability. In addition to the modeling framework, we introduced an interactive web-based visualization tool that enables users to intuitively explore historical weather patterns over time and space. The tool was designed to support decision-making for stakeholders in climate-sensitive sectors. We evaluated our approach using a curated multimodal climate dataset capturing real-world conditions during Hurricane Ian in 2022. The benchmark results show that the Adaptive MoE significantly outperformed all the baselines.

CV | Mar 19, 2024
Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset

Qi Li, Tzu-Chen Chiu, Hsiang-Wei Huang et al.

In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there's an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton sports. The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.

LG | May 10, 2023
Dynamic Graph Representation Learning for Depression Screening with Transformer

Ai-Te Kuo, Haiquan Chen, Yu-Hsuan Kuo et al.

Early detection of mental disorders is crucial as it enables prompt intervention and treatment, which can greatly improve outcomes for individuals suffering from debilitating mental affliction. The recent proliferation of mental health discussions on social media platforms presents research opportunities to investigate mental health and potentially detect instances of mental illness. However, existing depression detection methods are constrained by two major limitations: (1) the reliance on feature engineering and (2) the lack of consideration for time-varying factors. Specifically, these methods require extensive feature engineering and domain knowledge, which heavily rely on the amount, quality, and type of user-generated content. Moreover, these methods ignore the important impact of time-varying factors on depression detection, such as the dynamics of linguistic patterns and interpersonal interactive behaviors over time on social media (e.g., replies, mentions, and quote-tweets). To tackle these limitations, we propose an early depression detection framework, ContrastEgo, which treats each user as a dynamic time-evolving attributed graph (ego-network) and leverages supervised contrastive learning to maximize the agreement of users' representations at different scales while minimizing the agreement of users' representations to differentiate between depressed and control groups. ContrastEgo comprises four modules: (1) constructing users' heterogeneous interactive graphs, (2) extracting the representations of users' interaction snapshots using graph neural networks, (3) modeling the sequences of snapshots using an attention mechanism, and (4) depression detection using contrastive learning. Extensive experiments on Twitter data demonstrate that ContrastEgo significantly outperforms the state-of-the-art methods in terms of all the effectiveness metrics in various experimental settings.

LG | May 3, 2023
Rethinking Graph Lottery Tickets: Graph Sparsity Matters

Bo Hui, Da Yan, Xiaolong Ma et al.

The Lottery Ticket Hypothesis (LTH) claims the existence of a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance to the original dense network. A recent work, called UGS, extended LTH to prune graph neural networks (GNNs) for effectively accelerating GNN inference. UGS simultaneously prunes the graph adjacency matrix and the model weights using the same masking mechanism, but since the roles of the graph adjacency matrix and the weight matrices are very different, we find that their sparsifications lead to different performance characteristics. Specifically, we find that the performance of a sparsified GNN degrades significantly when the graph sparsity goes beyond a certain extent. Therefore, we propose two techniques to improve GNN performance when the graph sparsity is high. First, UGS prunes the adjacency matrix using a loss formulation which, however, does not properly involve all elements of the adjacency matrix; in contrast, we add a new auxiliary loss head to better guide the edge pruning by involving the entire adjacency matrix. Second, by regarding unfavorable graph sparsification as adversarial data perturbations, we formulate the pruning process as a min-max optimization problem to gain the robustness of lottery tickets when the graph sparsity is high. We further investigate the question: Can the "retrainable" winning ticket of a GNN also be effective for graph transfer learning? We call it the transferable graph lottery ticket (GLT) hypothesis. Extensive experiments demonstrate the superiority of our proposed sparsification method over UGS and empirically verify our transferable GLT hypothesis.

LG | Oct 14, 2021
MGC: A Complex-Valued Graph Convolutional Network for Directed Graphs

Jie Zhang, Bo Hui, Po-Wei Harn et al.

Recent advancements in Graph Neural Networks have led to state-of-the-art performance on graph representation learning. However, the majority of existing works process directed graphs by symmetrization, which causes loss of directional information. To address this issue, we introduce the magnetic Laplacian, a discrete Schrödinger operator with magnetic field, which preserves edge directionality by encoding it into a complex phase with an electric charge parameter. By adopting a truncated variant of PageRank named Linear-Rank, we design and build a low-pass filter for homogeneous graphs and a high-pass filter for heterogeneous graphs. In this work, we propose a complex-valued graph convolutional network named Magnetic Graph Convolutional network (MGC). With the corresponding complex-valued techniques, we ensure that our model degenerates to a real-valued one when the charge parameter takes specific values. We test our model on several graph datasets including directed homogeneous and heterogeneous graphs. The experimental results demonstrate that MGC is fast, powerful, and widely applicable.
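To make the magnetic Laplacian concrete, the sketch below builds it for a small directed graph using the commonly used unnormalized form: symmetrize the adjacency matrix, attach a phase of 2πq times the antisymmetric part, and subtract the result from the symmetrized degree matrix. The exact normalization used in MGC may differ; the function name and the dense list-of-lists representation are assumptions for illustration.

```python
import cmath
import math
from typing import List


def magnetic_laplacian(adj: List[List[float]], q: float) -> List[List[complex]]:
    """Unnormalized magnetic Laplacian L = D_s - A_s * exp(i * 2*pi*q * (A - A^T)).

    adj : dense adjacency matrix of a directed graph (adj[u][v] is the edge u -> v)
    q   : charge parameter; q = 0 gives the ordinary symmetrized Laplacian
    """
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            sym = 0.5 * (adj[u][v] + adj[v][u])               # symmetrized weight
            degree += sym
            theta = 2 * math.pi * q * (adj[u][v] - adj[v][u])  # direction as phase
            lap[u][v] = -sym * cmath.exp(1j * theta)
        lap[u][u] += degree                                    # add D_s on the diagonal
    return lap
```

For a single edge 0 → 1 with q = 0.25, the two off-diagonal entries are complex conjugates of each other (the matrix is Hermitian), so the direction survives in the sign of the imaginary part. Setting q = 0 removes every imaginary part, which is the real-valued special case the abstract refers to.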

LG | Dec 22, 2020
Deep Multi-attribute Graph Representation Learning on Protein Structures

Tian Xia, Wei-Shinn Ku

Graphs as a type of data structure have recently attracted significant attention. Representation learning of geometric graphs has achieved great success in many fields including molecular, social, and financial networks. It is natural to present proteins as graphs in which nodes represent the residues and edges represent the pairwise interactions between residues. However, 3D protein structures have rarely been studied as graphs directly. The challenges include: 1) Proteins are complex macromolecules composed of thousands of atoms making them much harder to model than micro-molecules. 2) Capturing the long-range pairwise relations for protein structure modeling remains under-explored. 3) Few studies have focused on learning the different attributes of proteins together. To address the above challenges, we propose a new graph neural network architecture to represent the proteins as 3D graphs and predict both distance geometric graph representation and dihedral geometric graph representation together. This gives a significant advantage because this network opens a new path from the sequence to structure. We conducted extensive experiments on four different datasets and demonstrated the effectiveness of the proposed method.

CL | Aug 28, 2019
SpatialNLI: A Spatial Domain Natural Language Interface to Databases Using Spatial Comprehension

Jingjing Li, Wenlu Wang, Wei-Shinn Ku et al.

A natural language interface (NLI) to databases is an interface that translates a natural language question to a structured query that is executable by database management systems (DBMS). However, an NLI that is trained in the general domain is hard to apply in the spatial domain due to the idiosyncrasy and expressiveness of spatial questions. Inspired by the machine comprehension model, we propose a spatial comprehension model that is able to recognize the meaning of spatial entities based on the semantics of the context. The spatial semantics learned from the spatial comprehension model are then injected into the natural language question to ease the burden of capturing the spatial-specific semantics. With our spatial comprehension model and information injection, our NLI for the spatial domain, named SpatialNLI, is able to capture the semantic structure of the question and translate it to the corresponding syntax of an executable query accurately. We also experimentally ascertain that SpatialNLI outperforms state-of-the-art methods.

CV | Nov 28, 2018
Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects

Michael A. Alcorn, Qi Li, Zhitao Gong et al.

Despite excellent performance on stationary test sets, deep neural networks (DNNs) can fail to generalize to out-of-distribution (OoD) inputs, including natural, non-adversarial ones, which are common in real-world settings. In this paper, we present a framework for discovering DNN failures that harnesses 3D renderers and 3D models. That is, we estimate the parameters of a 3D renderer that cause a target DNN to misbehave in response to the rendered image. Using our framework and a self-assembled dataset of 3D objects, we investigate the vulnerability of DNNs to OoD poses of well-known objects in ImageNet. For objects that are readily recognized by DNNs in their canonical poses, DNNs incorrectly classify 97% of their pose space. In addition, DNNs are highly sensitive to slight pose perturbations. Importantly, adversarial poses transfer across models and datasets. We find that 99.9% and 99.4% of the poses misclassified by Inception-v3 also transfer to the AlexNet and ResNet-50 image classifiers trained on the same ImageNet dataset, respectively, and 75.5% transfer to the YOLOv3 object detector trained on MS COCO.

AI | Sep 7, 2018
A Transfer-Learnable Natural Language Interface for Databases

Wenlu Wang, Yingtao Tian, Hongyu Xiong et al.

Relational database management systems (RDBMSs) are powerful because they are able to optimize and answer queries against any relational database. A natural language interface (NLI) for a database, on the other hand, is tailored to support that specific database. In this work, we introduce a general purpose transfer-learnable NLI with the goal of learning one model that can be used as NLI for any relational database. We adopt the data management principle of separating data and its schema, but with the additional support for the idiosyncrasy and complexity of natural languages. Specifically, we introduce an automatic annotation mechanism that separates the schema and the data, where the schema also covers knowledge about natural language. Furthermore, we propose a customized sequence model that translates annotated natural language queries to SQL statements. We show in experiments that our approach outperforms previous NLI methods on the WikiSQL dataset and the model we learned can be applied to another benchmark dataset OVERNIGHT without retraining.

CL | Jan 22, 2018
Adversarial Texts with Gradient Methods

Zhitao Gong, Wenlu Wang, Bo Li et al.

Adversarial samples for images have been extensively studied in the literature. Among the many attack methods, gradient-based methods are both effective and easy to compute. In this work, we propose a framework to adapt the gradient attacking methods on images to the text domain. The main difficulties for generating adversarial texts with gradient methods are i) the input space is discrete, which makes it difficult to accumulate small noise directly in the inputs, and ii) the measurement of the quality of the adversarial texts is difficult. We tackle the first problem by searching for adversarial examples in the embedding space and then reconstructing the adversarial texts via nearest neighbor search. For the latter problem, we employ the Word Mover's Distance (WMD) to quantify the quality of adversarial texts. Through extensive experiments on three datasets, IMDB movie reviews, Reuters-2 and Reuters-5 newswires, we show that our framework can leverage gradient attacking methods to generate very high-quality adversarial texts that are only a few words different from the original texts. There are many cases where we can change one word to alter the label of the whole piece of text. We successfully incorporate FGM and DeepFool into our framework. In addition, we empirically show that WMD is closely related to the quality of adversarial texts.
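The two-stage idea in this abstract (perturb in embedding space, then map back to a real word by nearest neighbor search) is sketched below. The L2-normalized gradient step is one common form of the fast gradient method and is assumed here; the toy vocabulary, the hand-supplied gradient, and the function names are hypothetical, and a real attack would obtain the gradient from the target model's loss.

```python
import math
from typing import Dict, List


def fgm_step(embedding: List[float], gradient: List[float], epsilon: float) -> List[float]:
    """Move an embedding by epsilon along the L2-normalized loss gradient."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        return list(embedding)  # no gradient signal, leave the embedding unchanged
    return [e + epsilon * g / norm for e, g in zip(embedding, gradient)]


def nearest_word(vector: List[float], vocab: Dict[str, List[float]]) -> str:
    """Map a perturbed vector back to the closest vocabulary word (squared L2)."""
    return min(vocab, key=lambda w: sum((a - b) ** 2 for a, b in zip(vocab[w], vector)))
```

A small step usually maps back to the original word, so the text is unchanged; only when the perturbation crosses into another word's neighborhood does the reconstructed text differ, which is why the resulting adversarial texts tend to change only a few words.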

LG | Apr 17, 2017
Adversarial and Clean Data Are Not Twins

Zhitao Gong, Wenlu Wang, Wei-Shinn Ku

Adversarial attacks have cast a shadow on the massive success of deep neural networks. Despite being almost visually identical to the clean data, adversarial images can fool deep neural networks into wrong predictions with very high confidence. In this paper, however, we show that we can build a simple binary classifier that separates adversarial data from clean data with accuracy over 99%. We also empirically show that the binary classifier is robust to a second-round adversarial attack. In other words, it is difficult to disguise adversarial samples to bypass the binary classifier. Furthermore, we empirically investigate the generalization limitation which lingers on all current defensive methods, including the binary classifier approach. We hypothesize that this is the result of an intrinsic property of adversarial crafting algorithms.