LG Jan 1, 2023
An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects
Thanh Vinh Vo, Arnab Bhattacharyya, Young Lee et al.
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients, and hence requires no prior information about the similarity measures. The heterogeneous causal effects can be estimated with no sharing of the raw training data among the sources, thus minimizing the risk of privacy leakage. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
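As a rough illustration of the Random Fourier Features idea mentioned in the abstract above, the following sketch approximates an RBF kernel with an explicit finite-dimensional feature map. It is not the paper's algorithm: the function names, the choice of an RBF kernel, and the lengthscale are assumptions made for illustration. The relevant property is that once the kernel is replaced by an inner product of explicit features, sums over data points become additive, which is plausibly what lets a loss be split into one component per data source.

```python
import math
import random


def make_rff(dim, num_features, lengthscale, rng):
    """Draw random frequencies and phases for an RBF kernel (illustrative helper)."""
    # Frequencies follow the kernel's spectral density: N(0, 1 / lengthscale^2).
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_map(x, freqs, phases):
    """Map x to z(x) so that dot(z(x), z(y)) approximates the RBF kernel k(x, y)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


rng = random.Random(0)
freqs, phases = make_rff(dim=3, num_features=4000, lengthscale=1.0, rng=rng)
x, y = [0.2, -0.1, 0.5], [0.0, 0.3, 0.4]
approx = sum(a * b for a, b in zip(rff_map(x, freqs, phases),
                                   rff_map(y, freqs, phases)))
exact = rbf_kernel(x, y, lengthscale=1.0)
```

With a few thousand features, `approx` should land close to `exact`; the approximation error shrinks roughly with the square root of the number of features.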
LG Aug 24, 2023
Federated Causal Inference from Observational Data
Thanh Vinh Vo, Young Lee, Tze-Yun Leong
Decentralized data sources are prevalent in real-world applications, posing a formidable challenge for causal inference. These sources cannot be consolidated into a single entity owing to privacy constraints. The presence of dissimilar data distributions and missing values within them can potentially introduce bias to the causal estimands. In this article, we propose a framework to estimate causal effects from decentralized data sources. The proposed framework avoids exchanging raw data among the sources, thus contributing towards privacy-preserving causal learning. Three instances of the proposed framework are introduced to estimate causal effects across a wide range of diverse scenarios within a federated setting. (1) FedCI: a Bayesian framework based on Gaussian processes for estimating causal effects from federated observational data sources. It estimates the posterior distributions of the causal effects to compute the higher-order statistics that capture the uncertainty. (2) CausalRFF: an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. It estimates the similarities among the sources through transfer coefficients, and hence requires no prior information about the similarity measures. (3) CausalFI: a new approach for federated causal inference from incomplete data, enabling the estimation of causal effects from multiple decentralized and incomplete data sources. It accounts for the missing data under the missing at random assumption, while also estimating higher-order statistics of the causal estimands. The proposed federated framework and its instances are an important step towards a privacy-preserving causal learning model.
LG Aug 6, 2024
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
Haozhe Ma, Zhengding Luo, Thanh Vinh Vo et al.
Reward shaping is a technique in reinforcement learning that addresses the sparse-reward problem by providing more frequent and informative rewards. We introduce a self-adaptive and highly efficient reward shaping mechanism that incorporates success rates derived from historical experiences as shaped rewards. The success rates are sampled from Beta distributions, which dynamically evolve from uncertain to reliable values as data accumulates. Initially, the shaped rewards exhibit more randomness to encourage exploration, while over time, the increasing certainty enhances exploitation, naturally balancing exploration and exploitation. Our approach employs Kernel Density Estimation (KDE) combined with Random Fourier Features (RFF) to derive the Beta distributions, providing a computationally efficient, non-parametric, and learning-free solution for high-dimensional continuous state spaces. Our method is validated on various tasks with extremely sparse rewards, demonstrating notable improvements in sample efficiency and convergence stability over relevant baselines.
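The Beta-sampled success rate described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: it keeps exact success and failure counts per discrete state, whereas the abstract says the method derives the Beta distributions with KDE and Random Fourier Features for continuous state spaces. The class name `SuccessRateShaper` and the uniform Beta(1, 1) prior are hypothetical choices.

```python
import random
from collections import defaultdict


class SuccessRateShaper:
    """Tabular stand-in for a success-rate reward shaper (illustrative only)."""

    def __init__(self, seed=0):
        self.successes = defaultdict(int)
        self.failures = defaultdict(int)
        self.rng = random.Random(seed)

    def update(self, state, succeeded):
        """Record whether a trajectory passing through `state` ended in success."""
        if succeeded:
            self.successes[state] += 1
        else:
            self.failures[state] += 1

    def shaped_reward(self, state):
        """Sample a success rate from Beta(successes + 1, failures + 1)."""
        alpha = self.successes[state] + 1  # assumed uniform prior
        beta = self.failures[state] + 1
        return self.rng.betavariate(alpha, beta)


shaper = SuccessRateShaper(seed=0)
fresh = shaper.shaped_reward("unseen")       # Beta(1, 1): maximally uncertain
for _ in range(1000):
    shaper.update("near_goal", succeeded=True)
confident = shaper.shaped_reward("near_goal")  # Beta(1001, 1): concentrated near 1
```

For an unvisited state the sample is spread over the whole unit interval, which injects randomness and encourages exploration; as counts accumulate the distribution concentrates and the shaped reward becomes a reliable signal.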
LG Aug 20, 2024
Centralized Reward Agent for Knowledge Sharing and Transfer in Multi-Task Reinforcement Learning
Haozhe Ma, Zhengding Luo, Thanh Vinh Vo et al.
Reward shaping is effective in addressing the sparse-reward challenge in reinforcement learning (RL) by providing immediate feedback through auxiliary, informative rewards. Based on the reward shaping strategy, we propose a novel multi-task reinforcement learning framework that integrates a centralized reward agent (CRA) and multiple distributed policy agents. The CRA functions as a knowledge pool, aimed at distilling knowledge from various tasks and distributing it to individual policy agents to improve learning efficiency. Specifically, the shaped rewards serve as a straightforward metric for encoding knowledge. This framework not only enhances knowledge sharing across established tasks but also adapts to new tasks by transferring meaningful reward signals. We validate the proposed method on both discrete and continuous domains, including the representative Meta-World benchmark, demonstrating its robustness in multi-task sparse-reward settings and its effective transferability to unseen tasks.
CV Jul 20, 2024
Decoupled Prompt-Adapter Tuning for Continual Activity Recognition
Di Fu, Thanh Vinh Vo, Haozhe Ma et al.
Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, providing in-depth performance analysis in sports, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies. The dynamic nature of data in these areas underscores the need for models that can continuously adapt to new video data without losing previously acquired knowledge, highlighting the critical role of advanced continual action recognition. To address these challenges, we propose Decoupled Prompt-Adapter Tuning (DPAT), a novel framework that integrates adapters for capturing spatial-temporal information and learnable prompts for mitigating catastrophic forgetting through a decoupled training strategy. DPAT uniquely balances the generalization benefits of prompt tuning with the plasticity provided by adapters in pretrained vision models, effectively addressing the challenge of maintaining model performance amidst continuous data evolution without necessitating extensive finetuning. DPAT consistently achieves state-of-the-art performance across several challenging action recognition benchmarks, thus demonstrating the effectiveness of our model in the domain of continual action recognition.
LG May 13
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
Jing Yu Lim, Rushi Shah, Zarif Ikram et al.
Diffusion world models have recently become competitive for online model-based reinforcement learning, but current approaches expose a tension: pixel diffusion is effective but computationally expensive, while the latest latent diffusion approach improves efficiency yet underperforms. The latter also relies on separately trained latents rather than the end-to-end world-model objectives that have driven much of modern MBRL progress. In particular, JEPA-style predictive representation learning has emerged as an especially promising direction for world modeling and MBRL. Concurrently, diffusion-style objectives have gained traction across multiple domains, with iterative refinement as a promising approach for multimodal and stochastic targets. Taken together, these trends motivate Joint Embedding DIffusion (JEDI), the first online end-to-end latent diffusion world model. JEDI learns its latent space directly from the diffusion denoising loss with a JEPA framework, using denoising to learn and predict future latents rather than relying on reconstruction and pretrained models. We provide a theoretical motivation showing that conventional JEPA objectives induce a predictive information bottleneck, and that conditional diffusion denoising admits a closely related predictive-compression decomposition. Empirically, JEDI is competitive on Atari100k and outperforms the baseline with separately trained latents where directly comparable. Relative to the pixel diffusion baseline, JEDI uses 43% less VRAM and achieves over 3$\times$ faster world-model sampling and 2.5$\times$ faster training. JEDI also exhibits a markedly different task-level performance profile from the pixel baseline, suggesting that end-to-end predictive latents change more than compute alone.
LG Jun 10, 2025
Exploration by Random Reward Perturbation
Haozhe Ma, Guoji Fu, Zhengding Luo et al.
We introduce Random Reward Perturbation (RRP), a novel exploration strategy for reinforcement learning (RL). Our theoretical analyses demonstrate that adding zero-mean noise to environmental rewards effectively enhances policy diversity during training, thereby expanding the range of exploration. RRP is fully compatible with the action-perturbation-based exploration strategies, such as $ε$-greedy, stochastic policies, and entropy regularization, providing additive improvements to exploration effects. It is general, lightweight, and can be integrated into existing RL algorithms with minimal implementation effort and negligible computational overhead. RRP establishes a theoretical connection between reward shaping and noise-driven exploration, highlighting their complementary potential. Experiments show that RRP significantly boosts the performance of Proximal Policy Optimization and Soft Actor-Critic, achieving higher sample efficiency and escaping local optima across various tasks, under both sparse and dense reward scenarios.
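The core operation named in the abstract above, adding zero-mean noise to environmental rewards, takes only a few lines. The sketch below assumes Gaussian noise with a fixed standard deviation; the abstract specifies only that the noise is zero-mean, so the distribution, the `sigma` value, and the function name `perturb_reward` are illustrative assumptions.

```python
import random


def perturb_reward(reward, sigma, rng):
    """Return the environment reward plus zero-mean Gaussian noise with std `sigma`."""
    return reward + rng.gauss(0.0, sigma)


rng = random.Random(0)
# Perturb a constant reward many times; the noise averages out because it is zero-mean.
noisy = [perturb_reward(1.0, 0.5, rng) for _ in range(10000)]
mean_noisy = sum(noisy) / len(noisy)
```

In a training loop this would wrap the reward returned by the environment before it is stored or used in an update, leaving the action-selection side (for example an $ε$-greedy or stochastic policy) unchanged.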
LG Jun 5, 2025
Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic
Thanh Vinh Vo, Young Lee, Haozhe Ma et al.
Hidden confounders that influence both states and actions can bias policy learning in reinforcement learning (RL), leading to suboptimal or non-generalizable behavior. Most RL algorithms ignore this issue, learning policies from observational trajectories based solely on statistical associations rather than causal effects. We propose DoSAC (Do-Calculus Soft Actor-Critic with Backdoor Adjustment), a principled extension of the SAC algorithm that corrects for hidden confounding via causal intervention estimation. DoSAC estimates the interventional policy $π(a | \mathrm{do}(s))$ using the backdoor criterion, without requiring access to true confounders or causal labels. To achieve this, we introduce a learnable Backdoor Reconstructor that infers pseudo-past variables (previous state and action) from the current state to enable backdoor adjustment from observational data. This module is integrated into a soft actor-critic framework to compute both the interventional policy and its entropy. Empirical results on continuous control benchmarks show that DoSAC outperforms baselines under confounded settings, with improved robustness, generalization, and policy reliability.
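For orientation, the standard backdoor adjustment formula is shown on the first line below. The second line is one possible reading of how it would apply to the interventional policy, taking $z$ to be the pseudo-past variables (previous state and action) inferred by the Backdoor Reconstructor; the abstract does not state the exact expression, so this second line is an assumption rather than the paper's definition.

```latex
% Standard backdoor adjustment for an adjustment set z
P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)

% Assumed analogue for the policy, with z = (previous state, previous action)
\pi(a \mid \mathrm{do}(s)) = \mathbb{E}_{z \sim p(z)}\big[\, \pi(a \mid s, z) \,\big]
```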
LG May 31, 2021
Adaptive Multi-Source Causal Inference
Thanh Vinh Vo, Pengfei Wei, Trong Nghia Hoang et al.
Data scarcity is a tremendous challenge in causal effect estimation. In this paper, we propose to exploit additional data sources to facilitate estimating causal effects in the target population. Specifically, we leverage additional source datasets which share similar causal mechanisms with the target observations to help infer causal effects of the target population. We propose three levels of knowledge transfer, through modelling the outcomes, treatments, and confounders. To achieve consistent positive transfer, we introduce learnable parametric transfer factors to adaptively control the transfer strength, thereby achieving a fair and balanced knowledge transfer between the sources and the target. The proposed method can infer causal effects in the target population without prior knowledge of data discrepancy between the additional data sources and the target. Experiments on both synthetic and real-world datasets show the effectiveness of the proposed method as compared with recent baselines.
ME May 31, 2021
Federated Estimation of Causal Effects from Observational Data
Thanh Vinh Vo, Trong Nghia Hoang, Young Lee et al.
Many modern applications collect data in a federated manner, with data kept locally and undisclosed. To date, most insight into causal inference has required data to be stored in a central repository. We present a novel framework for causal inference with federated data sources. We assess and integrate local causal effects from different private data sources without centralizing them. We then estimate the treatment effects on subjects from observational data using a non-parametric reformulation of the classical potential outcomes framework. We model the potential outcomes as random functions distributed according to Gaussian processes, whose defining parameters can be efficiently learned from multiple data sources, respecting privacy constraints. We demonstrate the promise and efficiency of the proposed approach through a set of simulated and real-world benchmark examples.
AI Aug 8, 2020
Hierarchical Reinforcement Learning in StarCraft II with Human Expertise in Subgoals Selection
Xinyi Xu, Tiancheng Huang, Pengfei Wei et al.
This work is inspired by recent advances in hierarchical reinforcement learning (HRL) (Barto and Mahadevan 2003; Hengst 2010), and improvements in learning efficiency from heuristic-based subgoal selection, experience replay (Lin 1993; Andrychowicz et al. 2017), and task-based curriculum learning (Bengio et al. 2009; Zaremba and Sutskever 2014). We propose a new method to integrate HRL, experience replay and effective subgoal selection through an implicit curriculum design based on human expertise to support sample-efficient learning and enhance interpretability of the agent's behavior. Human expertise remains indispensable in many areas such as medicine (Buch, Ahmed, and Maruthappu 2018) and law (Cath 2018), where interpretability, explainability and transparency are crucial in the decision making process, for ethical and legal reasons. Our method simplifies the complex task sets for achieving the overall objectives by decomposing them into subgoals at different levels of abstraction. Incorporating relevant subjective knowledge also significantly reduces the computational resources spent in exploration for RL, especially in high speed, changing, and complex environments where the transition dynamics cannot be effectively learned and modelled in a short time. Experimental results in two StarCraft II (SC2) (Vinyals et al. 2017) minigames demonstrate that our method can achieve better sample efficiency than flat and end-to-end RL methods, and provides an effective method for explaining the agent's performance.
LG May 6, 2020
Subdomain Adaptation with Manifolds Discrepancy Alignment
Pengfei Wei, Yiping Ke, Xinghua Qu et al.
Reducing domain divergence is a key step in transfer learning problems. Existing works focus on the minimization of global domain divergence. However, two domains may consist of several shared subdomains, and differ from each other in each subdomain. In this paper, we take the local divergence of subdomains into account in transfer. Specifically, we propose to use a low-dimensional manifold to represent each subdomain, and align the local data distribution discrepancy in each manifold across domains. A Manifold Maximum Mean Discrepancy (M3D) is developed to measure the local distribution discrepancy in each manifold. We then propose a general framework, called Transfer with Manifolds Discrepancy Alignment (TMDA), to couple the discovery of data manifolds with the minimization of M3D. We instantiate TMDA in the subspace learning case considering both the linear and nonlinear mappings. We also instantiate TMDA in the deep learning framework. Extensive experimental studies demonstrate that TMDA is a promising method for various transfer learning tasks.
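M3D builds on the Maximum Mean Discrepancy. As background, the sketch below computes the ordinary (global) biased estimate of squared MMD between two samples with an RBF kernel. It is not M3D itself: the per-manifold restriction described in the abstract is omitted, and the kernel choice, lengthscale, and toy data are assumptions for illustration.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def mean_kernel(xs, ys, lengthscale):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    return sum(rbf(x, y, lengthscale) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(xs, ys, lengthscale=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    return (mean_kernel(xs, xs, lengthscale)
            + mean_kernel(ys, ys, lengthscale)
            - 2.0 * mean_kernel(xs, ys, lengthscale))


source = [[0.0, 0.0], [0.1, -0.2], [-0.3, 0.2]]
shifted = [[x0 + 2.0, x1 + 2.0] for x0, x1 in source]  # toy "target domain"
same = mmd_squared(source, source)        # identical samples: zero discrepancy
different = mmd_squared(source, shifted)  # shifted samples: large discrepancy
```

A local variant in the spirit of the abstract would presumably apply such an estimate separately to the points assigned to each manifold, rather than once over the whole domain.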
ML Apr 24, 2020
Causal Modeling with Stochastic Confounders
Thanh Vinh Vo, Pengfei Wei, Wicher Bergsma et al.
This work extends causal inference with stochastic confounders. We propose a new approach to variational estimation for causal inference based on a representer theorem with a random input space. We estimate causal effects involving latent confounders that may be interdependent and time-varying from sequential, repeated measurements in an observational study. Our approach extends current work that assumes independent, non-temporal latent confounders, with potentially biased estimators. We introduce a simple yet elegant algorithm without parametric specification on model components. Our method avoids the need for expensive and careful parameterization in deploying complex models, such as deep neural networks, for causal inference in existing approaches. We demonstrate the effectiveness of our approach on various benchmark temporal datasets.
LG Dec 3, 2018
Knowledge-driven generative subspaces for modeling multi-view dependencies in medical data
Parvathy Sudhir Pillai, Tze-Yun Leong
Early detection of Alzheimer's disease (AD) and identification of potential risk/beneficial factors are important for planning and administering timely interventions or preventive measures. In this paper, we learn a disease model for AD that combines genotypic and phenotypic profiles, and cognitive health metrics of patients. We propose a probabilistic generative subspace that describes the correlative, complementary and domain-specific semantics of the dependencies in multi-view, multi-modality medical data. Guided by domain knowledge and using the latent consensus between abstractions of multi-view data, we model the fusion as a data generating process. We show that our approach can potentially lead to i) explainable clinical predictions and ii) improved AD diagnoses.
AI Mar 20, 2013
Representation Requirements for Supporting Decision Model Formulation
Tze-Yun Leong
This paper outlines a methodology for analyzing the representational support for knowledge-based decision-modeling in a broad domain. A relevant set of inference patterns and knowledge types is identified. By comparing the analysis results to existing representations, some insights are gained into a design approach for integrating categorical and uncertain knowledge in a context-sensitive manner.
AI Mar 13, 2013
Representing Context-Sensitive Knowledge in a Network Formalism: A Preliminary Report
Tze-Yun Leong
Automated decision making is often complicated by the complexity of the knowledge involved. Much of this complexity arises from the context sensitive variations of the underlying phenomena. We propose a framework for representing descriptive, context-sensitive knowledge. Our approach attempts to integrate categorical and uncertain knowledge in a network formalism. This paper outlines the basic representation constructs, examines their expressiveness and efficiency, and discusses the potential applications of the framework.
AI Jan 16, 2013
Causal Mechanism-based Model Construction
Tsai-Ching Lu, Marek J. Druzdzel, Tze-Yun Leong
We propose a framework for building graphical causal models that is based on the concept of causal mechanisms. Causal models are intuitive for human users and, more importantly, support the prediction of the effect of manipulation. We describe an implementation of the proposed framework as an interactive model construction module, ImaGeNIe, in SMILE (Structural Modeling, Inference, and Learning Engine) and in GeNIe (SMILE's Windows user interface).
AI Jun 26, 2012
Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic
Truong-Huy Dinh Nguyen, Wee-Sun Lee, Tze-Yun Leong
We consider the problem of using a heuristic policy to improve the value approximation by the Upper Confidence Bound applied in Trees (UCT) algorithm in non-adversarial settings such as planning with large-state space Markov Decision Processes. Current improvements to UCT focus on either changing the action selection formula at the internal nodes or the rollout policy at the leaf nodes of the search tree. In this work, we propose to add an auxiliary arm to each of the internal nodes, and always use the heuristic policy to roll out simulations at the auxiliary arms. The method aims to get fast convergence to optimal values at states where the heuristic policy is optimal, while retaining similar approximation as the original UCT in other states. We show that bootstrapping with the proposed method in the new algorithm, UCT-Aux, performs better compared to the original UCT algorithm and its variants in two benchmark experiment settings. We also examine conditions under which UCT-Aux works well.
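The action selection formula at internal nodes that the abstract above refers to is, in standard UCT, the UCB1 rule. The sketch below shows that rule only; it is not UCT-Aux. Treating the auxiliary arm as simply one more entry in the arm list is an assumption made for illustration, as are the function name and the exploration constant.

```python
import math


def ucb1_select(counts, values, exploration=math.sqrt(2.0)):
    """Return the index of the arm maximizing mean value plus the UCB1 bonus.

    counts[i] is the number of visits to arm i; values[i] is its mean return.
    """
    # Try every arm once before applying the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [v + exploration * math.sqrt(math.log(total) / n)
              for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical node with two ordinary arms plus one auxiliary arm (last entry)
# whose simulations would always be rolled out with the heuristic policy.
counts = [10, 10, 10]
values = [0.2, 0.5, 0.4]
chosen = ucb1_select(counts, values)
```

With equal visit counts the bonus is identical across arms, so the arm with the highest mean value is chosen; a rarely visited arm can still win through its larger exploration bonus.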
AI Jun 26, 2012
CAPIR: Collaborative Action Planning with Intention Recognition
Truong-Huy Dinh Nguyen, David Hsu, Wee-Sun Lee et al.
We apply decision theoretic techniques to construct non-player characters that are able to assist a human player in collaborative games. The method is based on solving Markov decision processes, which can be difficult when the game state is described by many variables. To scale to more complex games, the method allows decomposition of a game task into subtasks, each of which can be modelled by a Markov decision process. Intention recognition is used to infer the subtask that the human is currently performing, allowing the helper to assist the human in performing the correct task. Experiments show that the method can be effective, giving near-human level performance in helping a human in a collaborative game.