CVDec 6, 2022
Causal Inference via Style Transfer for Out-of-distribution GeneralisationToan Nguyen, Kien Do, Duc Thanh Nguyen et al.
Out-of-distribution (OOD) generalisation aims to build a model that can generalise well on an unseen target domain using knowledge from multiple source domains. To this end, the model should seek the causal dependence between inputs and labels, which may be determined by the semantics of inputs and remain invariant across domains. However, statistical or non-causal methods often cannot capture this dependence and perform poorly due to not considering spurious correlations learnt from model training via unobserved confounders. A well-known existing causal inference method like back-door adjustment cannot be applied to remove spurious correlations as it requires the observation of confounders. In this paper, we propose a novel method that effectively deals with hidden confounders by successfully implementing front-door adjustment (FA). FA requires the choice of a mediator, which we regard as the semantic information of images that helps access the causal mechanism without the need for observing confounders. Further, we propose to estimate the combination of the mediator with other observed images in the front-door formula via style transfer algorithms. Our use of style transfer to estimate FA is novel and sensible for OOD generalisation, which we justify by extensive experimental results on widely used benchmark datasets.
LGJul 25, 2022Code
Efficient Classification with Counterfactual Reasoning and Active LearningAzhar Mohammed, Dang Nguyen, Bao Duong et al.
Data augmentation is one of the most successful techniques to improve the classification accuracy of machine learning models in computer vision. However, applying data augmentation to tabular data is a challenging problem since it is hard to generate synthetic samples with labels. In this paper, we propose an efficient classifier with a novel data augmentation technique for tabular data. Our method called CCRAL combines causal reasoning to learn counterfactual samples for the original training samples and active learning to select useful counterfactual samples based on a region of uncertainty. By doing this, our method can maximize our model's generalization on the unseen testing data. We validate our method analytically, and compare with the standard baselines. Our experimental results highlight that CCRAL achieves significantly better performance than those of the baselines across several real-world tabular datasets in terms of accuracy and AUC. Data and source code are available at: https://github.com/nphdang/CCRAL.
LGAug 24, 2024
Reinforcement Learning for Causal Discovery without Acyclicity ConstraintsBao Duong, Hung Le, Biwei Huang et al.
Recently, reinforcement learning (RL) has proved a promising alternative for conventional local heuristics in score-based approaches to learning directed acyclic causal graphs (DAGs) from observational data. However, the intricate acyclicity constraint still challenges the efficient exploration of the vast space of DAGs in existing methods. In this study, we introduce ALIAS (reinforced dAg Learning wIthout Acyclicity conStraints), a novel approach to causal discovery powered by the RL machinery. Our method features an efficient policy for generating DAGs in just a single step with an optimal quadratic complexity, fueled by a novel parametrization of DAGs that directly translates a continuous space to the space of all DAGs, bypassing the need for explicitly enforcing acyclicity constraints. This approach enables us to navigate the search space more effectively by utilizing policy gradient methods and established scoring functions. In addition, we provide compelling empirical evidence for the strong performance of ALIAS in comparison with state-of-the-arts in causal discovery over increasingly difficult experiment conditions on both synthetic and real datasets.
LGOct 28, 2023
Domain Generalisation via Risk Distribution MatchingToan Nguyen, Kien Do, Bao Duong et al.
We propose a novel approach for domain generalisation (DG) leveraging risk distributions to characterise domains, thereby achieving domain invariance. In our findings, risk distributions effectively highlight differences between training domains and reveal their inherent complexities. In testing, we may observe similar, or potentially intensifying in magnitude, divergences between risk distributions. Hence, we propose a compelling proposition: Minimising the divergences between risk distributions across training domains leads to robust invariance for DG. The key rationale behind this concept is that a model, trained on domain-invariant or stable features, may consistently produce similar risk distributions across various domains. Building upon this idea, we propose Risk Distribution Matching (RDM). Using the maximum mean discrepancy (MMD) distance, RDM aims to minimise the variance of risk distributions across training domains. However, when the number of domains increases, the direct optimisation of variance leads to linear growth in MMD computations, resulting in inefficiency. Instead, we propose an approximation that requires only one MMD computation, by aligning just two distributions: that of the worst-case domain and the aggregated distribution from all domains. Notably, this method empirically outperforms optimising distributional variance while being computationally more efficient. Unlike conventional DG matching algorithms, RDM stands out for its enhanced efficacy by concentrating on scalar risk distributions, sidestepping the pitfalls of high-dimensional challenges seen in feature or gradient matching. Our extensive experiments on standard benchmark datasets demonstrate that RDM shows superior generalisation capability over state-of-the-art DG methods.
LGNov 20, 2022
Diffeomorphic Information Neural EstimationBao Duong, Thin Nguyen
Mutual Information (MI) and Conditional Mutual Information (CMI) are multi-purpose tools from information theory that are able to naturally measure the statistical dependencies between random variables, thus they are usually of central interest in several statistical and machine learning tasks, such as conditional independence testing and representation learning. However, estimating CMI, or even MI, is infamously challenging due the intractable formulation. In this study, we introduce DINE (Diffeomorphic Information Neural Estimator)-a novel approach for estimating CMI of continuous random variables, inspired by the invariance of CMI over diffeomorphic maps. We show that the variables of interest can be replaced with appropriate surrogates that follow simpler distributions, allowing the CMI to be efficiently evaluated via analytical solutions. Additionally, we demonstrate the quality of the proposed estimator in comparison with state-of-the-arts in three important tasks, including estimating MI, CMI, as well as its application in conditional independence testing. The empirical evaluations show that DINE consistently outperforms competitors in all tasks and is able to adapt very well to complex and high-dimensional relationships.
LGSep 4, 2022
Conditional Independence Testing via Latent Representation LearningBao Duong, Thin Nguyen
Detecting conditional independencies plays a key role in several statistical and machine learning tasks, especially in causal discovery algorithms. In this study, we introduce LCIT (Latent representation based Conditional Independence Test)-a novel non-parametric method for conditional independence testing based on representation learning. Our main contribution involves proposing a generative framework in which to test for the independence between X and Y given Z, we first learn to infer the latent representations of target variables X and Y that contain no information about the conditioning variable Z. The latent variables are then investigated for any significant remaining dependencies, which can be performed using the conventional partial correlation test. The empirical evaluations show that LCIT outperforms several state-of-the-art baselines consistently under different evaluation metrics, and is able to adapt really well to both non-linear and high-dimensional settings on a diverse collection of synthetic and real data sets.
LGJul 6, 2024
Scalable Variational Causal Discovery Unconstrained by AcyclicityNu Hoang, Bao Duong, Thin Nguyen
Bayesian causal discovery offers the power to quantify epistemic uncertainties among a broad range of structurally diverse causal theories potentially explaining the data, represented in forms of directed acyclic graphs (DAGs). However, existing methods struggle with efficient DAG sampling due to the complex acyclicity constraint. In this study, we propose a scalable Bayesian approach to effectively learn the posterior distribution over causal graphs given observational data thanks to the ability to generate DAGs without explicitly enforcing acyclicity. Specifically, we introduce a novel differentiable DAG sampling method that can generate a valid acyclic causal graph by mapping an unconstrained distribution of implicit topological orders to a distribution over DAGs. Given this efficient DAG sampling scheme, we are able to model the posterior distribution over causal graphs using a simple variational distribution over a continuous domain, which can be learned via the variational inference framework. Extensive empirical experiments on both simulated and real datasets demonstrate the superior performance of the proposed model compared to several state-of-the-art baselines.
LGJul 16, 2023
Heteroscedastic Causal Structure LearningBao Duong, Thin Nguyen
Heretofore, learning the directed acyclic graphs (DAGs) that encode the cause-effect relationships embedded in observational data is a computationally challenging problem. A recent trend of studies has shown that it is possible to recover the DAGs with polynomial time complexity under the equal variances assumption. However, this prohibits the heteroscedasticity of the noise, which allows for more flexible modeling capabilities, but at the same time is substantially more challenging to handle. In this study, we tackle the heteroscedastic causal structure learning problem under Gaussian noises. By exploiting the normality of the causal mechanisms, we can recover a valid causal ordering, which can uniquely identify the causal DAG using a series of conditional independence tests. The result is HOST (Heteroscedastic causal STructure learning), a simple yet effective causal structure learning algorithm that scales polynomially in both sample size and dimensionality. In addition, via extensive empirical evaluations on a wide range of both controlled and real datasets, we show that the proposed HOST method is competitive with state-of-the-art approaches in both the causal order learning and structure learning problems.
CVMar 4, 2025Code
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-TransformToan Nguyen, Kien Do, Duc Kieu et al.
We introduce a theoretical framework for diffusion-based image editing by formulating it as a reverse-time bridge modeling problem. This approach modifies the backward process of a pretrained diffusion model to construct a bridge that converges to an implicit distribution associated with the editing target at time 0. Building on this framework, we propose h-Edit, a novel editing method that utilizes Doob's h-transform and Langevin Monte Carlo to decompose the update of an intermediate edited sample into two components: a "reconstruction" term and an "editing" term. This decomposition provides flexibility, allowing the reconstruction term to be computed via existing inversion techniques and enabling the combination of multiple editing terms to handle complex editing tasks. To our knowledge, h-Edit is the first training-free method capable of performing simultaneous text-guided and reward-model-based editing. Extensive experiments, both quantitative and qualitative, show that h-Edit outperforms state-of-the-art baselines in terms of editing effectiveness and faithfulness. Our source code is available at https://github.com/nktoan/h-edit.
LGMar 21
Neural Autoregressive Flows for Markov Boundary LearningKhoa Nguyen, Bao Duong, Viet Huynh et al.
Recovering Markov boundary -- the minimal set of variables that maximizes predictive performance for a response variable -- is crucial in many applications. While recent advances improve upon traditional constraint-based techniques by scoring local causal structures, they still rely on nonparametric estimators and heuristic searches, lacking theoretical guarantees for reliability. This paper investigates a framework for efficient Markov boundary discovery by integrating conditional entropy from information theory as a scoring criterion. We design a novel masked autoregressive network to capture complex dependencies. A parallelizable greedy search strategy in polynomial time is proposed, supported by analytical evidence. We also discuss how initializing a graph with learned Markov boundaries accelerates the convergence of causal discovery. Comprehensive evaluations on real-world and synthetic datasets demonstrate the scalability and superior performance of our method in both Markov boundary discovery and causal discovery tasks.
CVFeb 12, 2025Code
Bidirectional Diffusion Bridge ModelsDuc Kieu, Kien Do, Toan Nguyen et al.
Diffusion bridges have shown potential in paired image-to-image (I2I) translation tasks. However, existing methods are limited by their unidirectional nature, requiring separate models for forward and reverse translations. This not only doubles the computational cost but also restricts their practicality. In this work, we introduce the Bidirectional Diffusion Bridge Model (BDBM), a scalable approach that facilitates bidirectional translation between two coupled distributions using a single network. BDBM leverages the Chapman-Kolmogorov Equation for bridges, enabling it to model data distribution shifts across timesteps in both forward and backward directions by exploiting the interchangeability of the initial and target timesteps within this framework. Notably, when the marginal distribution given endpoints is Gaussian, BDBM's transition kernels in both directions possess analytical forms, allowing for efficient learning with a single network. We demonstrate the connection between BDBM and existing bridge methods, such as Doob's h-transform and variational approaches, and highlight its advantages. Extensive experiments on high-resolution I2I translation tasks demonstrate that BDBM not only enables bidirectional translation with minimal additional cost but also outperforms state-of-the-art bridge models. Our source code is available at [https://github.com/kvmduc/BDBM||https://github.com/kvmduc/BDBM].
LGFeb 5, 2024Code
Variational Flow Models: Flowing in Your StyleKien Do, Duc Kieu, Toan Nguyen et al.
We propose a systematic training-free method to transform the probability flow of a "linear" stochastic process characterized by the equation X_{t}=a_{t}X_{0}+σ_{t}X_{1} into a straight constant-speed (SC) flow, reminiscent of Rectified Flow. This transformation facilitates fast sampling along the original probability flow via the Euler method without training a new model of the SC flow. The flexibility of our approach allows us to extend our transformation to inter-convert two posterior flows of two distinct linear stochastic processes. Moreover, we can easily integrate high-order numerical solvers into the transformed SC flow, further enhancing the sampling accuracy and efficiency. Rigorous theoretical analysis and extensive experimental results substantiate the advantages of our framework. Our code is available at this [https://github.com/clarken92/VFM||link].
AIAug 19, 2025Code
CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven PlanningMinh Hoang Nguyen, Van Dai Do, Dung Nguyen et al.
Large language model (LLM) agents-especially smaller, open-source models-often produce causally invalid or incoherent actions in collaborative tasks due to their reliance on surface-level correlations rather than grounded causal reasoning. This limitation undermines their performance in terms of coordination and planning in dynamic environments. We address this challenge with CausalPlan, a two-phase framework that integrates explicit structural causal reasoning into the LLM planning process. At the core of CausalPlan is the Structural Causal Action (SCA) model, which learns a causal graph from agent trajectories to capture how prior actions and current environment states influence future decisions. This structure is then used to guide action selection by assigning causal scores to LLM-generated proposals, reweighting them accordingly, or falling back to causally grounded alternatives when needed. By embedding this causal knowledge directly into the decision loop, CausalPlan constrains planning to intervention-consistent behaviours without requiring fine-tuning of the LLM itself. We evaluate CausalPlan on the Overcooked-AI benchmark across five multi-agent coordination tasks and four LLMs of varying sizes: Gemma-7B, Llama-8B, Qwen-14B, and Llama-70B. Experimental results show that CausalPlan consistently reduces invalid actions and improves collaboration in both AI-AI and human-AI settings, outperforming strong reinforcement learning baselines. Our findings highlight the value of causality-driven planning for deploying efficient, interpretable, and generalisable multi-agent LLM systems.
LGMay 12
BSO: Safety Alignment Is Density Ratio MatchingTien-Phat Nguyen, Truong Nguyen, Thin Nguyen et al.
Aligning language models for both helpfulness and safety typically requires complex pipelines-separate reward and cost models, online reinforcement learning, and primal-dual updates. Recent direct preference optimization approaches simplify training but incorporate safety through ad-hoc modifications such as multi-stage procedures or heuristic margin terms, lacking a principled derivation. We show that the likelihood ratio of the optimal safe policy admits a closed-form decomposition that reduces safety alignment to a density ratio matching problem. Minimizing Bregman divergences between the data and model ratios yields Bregman Safety Optimization (BSO), a family of single-stage loss functions, each induced by a convex generator, that provably recover the optimal safe policy. BSO is both general and simple: it requires no auxiliary models, introduces only one hyperparameter beyond standard preference optimization, and recovers existing safety-aware methods as special cases. Experiments across safety alignment benchmarks show that BSO consistently improves the safety-helpfulness trade-off.
LGJul 6, 2024
Enabling Causal Discovery in Post-Nonlinear Models with Normalizing FlowsNu Hoang, Bao Duong, Thin Nguyen
Post-nonlinear (PNL) causal models stand out as a versatile and adaptable framework for modeling intricate causal relationships. However, accurately capturing the invertibility constraint required in PNL models remains challenging in existing studies. To address this problem, we introduce CAF-PoNo (Causal discovery via Normalizing Flows for Post-Nonlinear models), harnessing the power of the normalizing flows architecture to enforce the crucial invertibility constraint in PNL models. Through normalizing flows, our method precisely reconstructs the hidden noise, which plays a vital role in cause-effect identification through statistical independence testing. Furthermore, the proposed approach exhibits remarkable extensibility, as it can be seamlessly expanded to facilitate multivariate causal discovery via causal order identification, empowering us to efficiently unravel complex causal relationships. Extensive experimental evaluations on both simulated and real datasets consistently demonstrate that the proposed method outperforms several state-of-the-art approaches in both bivariate and multivariate causal discovery tasks.
LGJan 25, 2025Code
Causal Discovery via Bayesian OptimizationBao Duong, Sunil Gupta, Thin Nguyen
Existing score-based methods for directed acyclic graph (DAG) learning from observational data struggle to recover the causal graph accurately and sample-efficiently. To overcome this, in this study, we propose DrBO (DAG recovery via Bayesian Optimization)-a novel DAG learning framework leveraging Bayesian optimization (BO) to find high-scoring DAGs. We show that, by sophisticatedly choosing the promising DAGs to explore, we can find higher-scoring ones much more efficiently. To address the scalability issues of conventional BO in DAG learning, we replace Gaussian Processes commonly employed in BO with dropout neural networks, trained in a continual manner, which allows for (i) flexibly modeling the DAG scores without overfitting, (ii) incorporation of uncertainty into the estimated scores, and (iii) scaling with the number of evaluations. As a result, DrBO is computationally efficient and can find the accurate DAG in fewer trials and less time than existing state-of-the-art methods. This is demonstrated through an extensive set of empirical evaluations on many challenging settings with both synthetic and real data. Our implementation is available at https://github.com/baosws/DrBO.
LGOct 29, 2024
Generating Realistic Tabular Data with Large Language ModelsDang Nguyen, Sunil Gupta, Kien Do et al.
While most generative models show achievements in image data generation, few are developed for tabular data generation. Recently, due to success of large language models (LLM) in diverse tasks, they have also been used for tabular data generation. However, these methods do not capture the correct correlation between the features and the target variable, hindering their applications in downstream predictive tasks. To address this problem, we propose a LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data. First, we propose a novel permutation strategy for the input data in the fine-tuning phase. Second, we propose a feature-conditional sampling approach to generate synthetic samples. Finally, we generate the labels by constructing prompts based on the generated samples to query our fine-tuned LLM. Our extensive experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks. It also produces highly realistic synthetic samples in terms of quality and diversity. More importantly, classifiers trained with our synthetic data can even compete with classifiers trained with the original data on half of the benchmark datasets, which is a significant achievement in tabular data generation.
MLDec 15, 2023
Robust Estimation of Causal Heteroscedastic Noise ModelsQuang-Duy Tran, Bao Duong, Phuoc Nguyen et al.
Distinguishing the cause and effect from bivariate observational data is the foundational problem that finds applications in many scientific disciplines. One solution to this problem is assuming that cause and effect are generated from a structural causal model, enabling identification of the causal direction after estimating the model in each direction. The heteroscedastic noise model is a type of structural causal model where the cause can contribute to both the mean and variance of the noise. Current methods for estimating heteroscedastic noise models choose the Gaussian likelihood as the optimization objective which can be suboptimal and unstable when the data has a non-Gaussian distribution. To address this limitation, we propose a novel approach to estimating this model with Student's $t$-distribution, which is known for its robustness in accounting for sampling variability with smaller sample sizes and extreme values without significantly altering the overall distribution shape. This adaptability is beneficial for capturing the parameters of the noise distribution in heteroscedastic noise models. Our empirical evaluations demonstrate that our estimators are more robust and achieve better overall performance across synthetic and real benchmarks.
LGOct 10, 2025
Large Language Models for Imbalanced Classification: Diversity makes the differenceDang Nguyen, Sunil Gupta, Kien Do et al.
Oversampling is one of the most widely used approaches for addressing imbalanced classification. The core idea is to generate additional minority samples to rebalance the dataset. Most existing methods, such as SMOTE, require converting categorical variables into numerical vectors, which often leads to information loss. Recently, large language model (LLM)-based methods have been introduced to overcome this limitation. However, current LLM-based approaches typically generate minority samples with limited diversity, reducing robustness and generalizability in downstream classification tasks. To address this gap, we propose a novel LLM-based oversampling method designed to enhance diversity. First, we introduce a sampling strategy that conditions synthetic sample generation on both minority labels and features. Second, we develop a new permutation strategy for fine-tuning pre-trained LLMs. Third, we fine-tune the LLM not only on minority samples but also on interpolated samples to further enrich variability. Extensive experiments on 10 tabular datasets demonstrate that our method significantly outperforms eight SOTA baselines. The generated synthetic samples are both realistic and diverse. Moreover, we provide theoretical analysis through an entropy-based perspective, proving that our method encourages diversity in the generated samples.
AIDec 19, 2023
Root Cause Explanation of Outliers under Noisy MechanismsPhuoc Nguyen, Truyen Tran, Sunil Gupta et al.
Identifying root causes of anomalies in causal processes is vital across disciplines. Once identified, one can isolate the root causes and implement necessary measures to restore the normal operation. Causal processes are often modelled as graphs with entities being nodes and their paths/interconnections as edge. Existing work only consider the contribution of nodes in the generative process, thus can not attribute the outlier score to the edges of the mechanism if the anomaly occurs in the connections. In this paper, we consider both individual edge and node of each mechanism when identifying the root causes. We introduce a noisy functional causal model to account for this purpose. Then, we employ Bayesian learning and inference methods to infer the noises of the nodes and edges. We then represent the functional form of a target outlier leaf as a function of the node and edge noises. Finally, we propose an efficient gradient-based attribution method to compute the anomaly attribution scores which scales linearly with the number of nodes and edges. Experiments on simulated datasets and two real-world scenario datasets show better anomaly attribution performance of the proposed method compared to the baselines. Our method scales to larger graphs with more nodes and edges.
LGSep 26, 2025
Universal Multi-Domain Translation via Diffusion RoutersDuc Kieu, Kien Do, Tuan Hoang et al.
Multi-domain translation (MDT) aims to learn translations between multiple domains, yet existing approaches either require fully aligned tuples or can only handle domain pairs seen in training, limiting their practicality and excluding many cross-domain mappings. We introduce universal MDT (UMDT), a generalization of MDT that seeks to translate between any pair of $K$ domains using only $K-1$ paired datasets with a central domain. To tackle this problem, we propose Diffusion Router (DR), a unified diffusion-based framework that models all central$\leftrightarrow$non-central translations with a single noise predictor conditioned on the source and target domain labels. DR enables indirect non-central translations by routing through the central domain. We further introduce a novel scalable learning strategy with a variational-bound objective and an efficient Tweedie refinement procedure to support direct non-central mappings. Through evaluation on three large-scale UMDT benchmarks, DR achieves state-of-the-art results for both indirect and direct translations, while lowering sampling cost and unlocking novel tasks such as sketch$\leftrightarrow$segmentation. These results establish DR as a scalable and versatile framework for universal translation across multiple domains.
SPMay 26, 2025
Leveraging Novel Ensemble Learning Techniques and Landsat Multispectral Data for Estimating Olive Yields in TunisiaMohamed Kefi, Tien Dat Pham, Thin Nguyen et al.
Olive production is an important tree crop in Mediterranean climates. However, olive yield varies significantly due to climate change. Accurately estimating yield using remote sensing and machine learning remains a complex challenge. In this study, we developed a streamlined pipeline for olive yield estimation in the Kairouan and Sousse governorates of Tunisia. We extracted features from multispectral reflectance bands, vegetation indices derived from Landsat-8 OLI and Landsat-9 OLI-2 satellite imagery, along with digital elevation model data. These spatial features were combined with ground-based field survey data to form a structured tabular dataset. We then developed an automated ensemble learning framework, implemented using AutoGluon to train and evaluate multiple machine learning models, select optimal combinations through stacking, and generate robust yield predictions using five-fold cross-validation. The results demonstrate strong predictive performance from both sensors, with Landsat-8 OLI achieving R2 = 0.8635 and RMSE = 1.17 tons per ha, and Landsat-9 OLI-2 achieving R2 = 0.8378 and RMSE = 1.32 tons per ha. This study highlights a scalable, cost-effective, and accurate method for olive yield estimation, with potential applicability across diverse agricultural regions globally.
LGMay 12, 2025
Identifying Causal Direction via Variational Bayesian CompressionQuang-Duy Tran, Bao Duong, Phuoc Nguyen et al.
Telling apart the cause and effect between two random variables with purely observational data is a challenging problem that finds applications in various scientific disciplines. A key principle utilized in this task is the algorithmic Markov condition, which postulates that the joint distribution, when factorized according to the causal direction, yields a more succinct codelength compared to the anti-causal direction. Previous approaches approximate these codelengths by relying on simple functions or Gaussian processes (GPs) with easily evaluable complexity, compromising between model fitness and computational complexity. To address these limitations, we propose leveraging the variational Bayesian learning of neural networks as an interpretation of the codelengths. This allows the improvement of model fitness, while maintaining the succinctness of the codelengths, and the avoidance of the significant computational complexity of the GP-based approaches. Extensive experiments on both synthetic and real-world benchmarks in cause-effect identification demonstrate the effectiveness of our proposed method, showing promising performance enhancements on several datasets in comparison to most related methods.
LGMar 8, 2025
Clustering-based Meta Bayesian Optimization with Theoretical GuaranteeKhoa Nguyen, Viet Huynh, Binh Tran et al.
Bayesian Optimization (BO) is a well-established method for addressing black-box optimization problems. In many real-world scenarios, optimization often involves multiple functions, emphasizing the importance of leveraging data and learned functions from prior tasks to enhance efficiency in the current task. To expedite convergence to the global optimum, recent studies have introduced meta-learning strategies, collectively referred to as meta-BO, to incorporate knowledge from historical tasks. However, in practical settings, the underlying functions are often heterogeneous, which can adversely affect optimization performance for the current task. Additionally, when the number of historical tasks is large, meta-BO methods face significant scalability challenges. In this work, we propose a scalable and robust meta-BO method designed to address key challenges in heterogeneous and large-scale meta-tasks. Our approach (1) effectively partitions transferred meta-functions into highly homogeneous clusters, (2) learns the geometry-based surrogate prototype that capture the structural patterns within each cluster, and (3) adaptively synthesizes meta-priors during the online phase using statistical distance-based weighting policies. Experimental results on real-world hyperparameter optimization (HPO) tasks, combined with theoretical guarantees, demonstrate the robustness and effectiveness of our method in overcoming these challenges.
MLFeb 28, 2025
Amortized Conditional Independence TestingBao Duong, Nu Hoang, Thin Nguyen
Testing for the conditional independence structure in data is a fundamental and critical task in statistics and machine learning, which finds natural applications in causal discovery - a highly relevant problem to many scientific disciplines. Existing methods seek to design explicit test statistics that quantify the degree of conditional dependence, which is highly challenging yet cannot capture nor utilize prior knowledge in a data-driven manner. In this study, an entirely new approach is introduced, where we instead propose to amortize conditional independence testing and devise ACID - a novel transformer-based neural network architecture that learns to test for conditional independence. ACID can be trained on synthetic data in a supervised learning fashion, and the learned model can then be applied to any dataset of similar natures or adapted to new domains by fine-tuning with a negligible computational cost. Our extensive empirical evaluations on both synthetic and real data reveal that ACID consistently achieves state-of-the-art performance against existing baselines under multiple metrics, and is able to generalize robustly to unseen sample sizes, dimensionalities, as well as non-linearities with a remarkably low inference time.
LGSep 4, 2023
Differentiable Bayesian Structure Learning with Acyclicity AssuranceQuang-Duy Tran, Phuoc Nguyen, Bao Duong et al.
Score-based approaches in the structure learning task are thriving because of their scalability. Continuous relaxation has been the key reason for this advancement. Despite achieving promising outcomes, most of these methods are still struggling to ensure that the graphs generated from the latent space are acyclic by minimizing a defined score. There has also been another trend of permutation-based approaches, which concern the search for the topological ordering of the variables in the directed acyclic graph in order to limit the search space of the graph. In this study, we propose an alternative approach for strictly constraining the acyclicty of the graphs with an integration of the knowledge from the topological orderings. Our approach can reduce inference complexity while ensuring the structures of the generated graphs to be acyclic. Our empirical experiments with simulated and real-world data show that our approach can outperform related Bayesian score-based approaches.
AIFeb 14, 2022
Learning to Discover MedicinesTri Minh Nguyen, Thin Nguyen, Truyen Tran
Discovering new medicines is the hallmark of human endeavor to live a better and longer life. Yet the pace of discovery has slowed down as we need to venture into more wildly unexplored biomedical space to find one that matches today's high standard. Modern AI-enabled by powerful computing, large biomedical databases, and breakthroughs in deep learning-offers a new hope to break this loop as AI is rapidly maturing, ready to make a huge impact in the area. In this paper we review recent advances in AI methodologies that aim to crack this challenge. We organize the vast and rapidly growing literature of AI for drug discovery into three relatively stable sub-areas: (a) representation learning over molecular sequences and geometric graphs; (b) data-driven reasoning where we predict molecular properties and their binding, optimize existing compounds, generate de novo molecules, and plan the synthesis of target molecules; and (c) knowledge-based reasoning where we discuss the construction and reasoning over biomedical knowledge graphs. We will also identify open challenges and chart possible research directions for the years to come.
BMJan 16, 2022
Mitigating cold start problems in drug-target affinity prediction with interaction knowledge transferringTri Minh Nguyen, Thin Nguyen, Truyen Tran
Motivation: Predicting the drug-target interaction is crucial for drug discovery as well as drug repurposing. Machine learning is commonly used in drug-target affinity (DTA) problem. However, machine learning model faces the cold-start problem where the model performance drops when predicting the interaction of a novel drug or target. Previous works try to solve the cold start problem by learning the drug or target representation using unsupervised learning. While the drug or target representation can be learned in an unsupervised manner, it still lacks the interaction information, which is critical in drug-target interaction. Results: To incorporate the interaction information into the drug and protein interaction, we proposed using transfer learning from chemical-chemical interaction (CCI) and protein-protein interaction (PPI) task to drug-target interaction task. The representation learned by CCI and PPI tasks can be transferred smoothly to the DTA task due to the similar nature of the tasks. The result on the drug-target affinity datasets shows that our proposed method has advantages compared to other pretraining methods in the DTA task.
AIMar 24, 2021
Counterfactual Explanation with Multi-Agent Reinforcement Learning for Drug Target PredictionTri Minh Nguyen, Thomas P Quinn, Thin Nguyen et al.
Motivation: Many high-performance DTA models have been proposed, but they are mostly black-box and thus lack human interpretability. Explainable AI (XAI) can make DTA models more trustworthy, and can also enable scientists to distill biological knowledge from the models. Counterfactual explanation is one popular approach to explaining the behaviour of a deep neural network, which works by systematically answering the question "How would the model output change if the inputs were changed in this way?". Most counterfactual explanation methods only operate on single input data. It remains an open problem how to extend counterfactual-based XAI methods to DTA models, which have two inputs, one for drug and one for target, that also happen to be discrete in nature. Methods: We propose a multi-agent reinforcement learning framework, Multi-Agent Counterfactual Drug target binding Affinity (MACDA), to generate counterfactual explanations for the drug-protein complex. Our proposed framework provides human-interpretable counterfactual instances while optimizing both the input drug and target for counterfactual generation at the same time. Results: We benchmark the proposed MACDA framework using the Davis dataset and find that our framework produces more parsimonious explanations with no loss in explanation validity, as measured by encoding similarity and QED. We then present a case study involving ABL1 and Nilotinib to demonstrate how MACDA can explain the behaviour of a DTA model in the underlying substructure interaction between inputs in its prediction, revealing mechanisms that align with prior domain knowledge.
LGSep 25, 2020
GEFA: Early Fusion Approach in Drug-Target Affinity PredictionTri Minh Nguyen, Thin Nguyen, Thao Minh Le et al.
Predicting the interaction between a compound and a target is crucial for rapid drug repurposing. Deep learning has been successfully applied in drug-target affinity (DTA) problem. However, previous deep learning-based methods ignore modeling the direct interactions between drug and protein residues. This would lead to inaccurate learning of target representation which may change due to the drug binding effects. In addition, previous DTA methods learn protein representation solely based on a small number of protein sequences in DTA datasets while neglecting the use of proteins outside of the DTA datasets. We propose GEFA (Graph Early Fusion Affinity), a novel graph-in-graph neural network with attention mechanism to address the changes in target representation because of the binding effects. Specifically, a drug is modeled as a graph of atoms, which then serves as a node in a larger graph of residues-drug complex. The resulting model is an expressive deep nested graph neural network. We also use pre-trained protein representation powered by the recent effort of learning contextualized protein representation. The experiments are conducted under different settings to evaluate scenarios such as novel drugs or targets. The results demonstrate the effectiveness of the pre-trained protein embedding and the advantages our GEFA in modeling the nested graph for drug-target interaction.
CVApr 28, 2020
Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional Emotion RecognitionDung Nguyen, Duc Thanh Nguyen, Rui Zeng et al.
Multimodal dimensional emotion recognition has drawn a great attention from the affective computing community and numerous schemes have been extensively investigated, making a significant progress in this area. However, several questions still remain unanswered for most of existing approaches including: (i) how to simultaneously learn compact yet representative features from multimodal data, (ii) how to effectively capture complementary features from multimodal streams, and (iii) how to perform all the tasks in an end-to-end manner. To address these challenges, in this paper, we propose a novel deep neural network architecture consisting of a two-stream auto-encoder and a long short term memory for effectively integrating visual and audio signal streams for emotion recognition. To validate the robustness of our proposed architecture, we carry out extensive experiments on the multimodal emotion in the wild dataset: RECOLA. Experimental results show that the proposed method achieves state-of-the-art recognition performance and surpasses existing schemes by a significant margin.
CLJul 26, 2018
Variational Memory Encoder-DecoderHung Le, Truyen Tran, Thin Nguyen et al.
Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. Standard neural encoder-decoder models and their extensions using conditional variational autoencoder often result in either trivial or digressive responses. To overcome this, we explore a novel approach that injects variability into neural encoder-decoder via the use of external memory as a mixture model, namely Variational Memory Encoder-Decoder (VMED). By associating each memory read with a mode in the latent mixture distribution at each timestep, our model can capture the variability observed in sequential data such as natural conversations. We empirically compare the proposed model against other recent approaches on various conversational datasets. The results show that VMED consistently achieves significant improvement over others in both metric-based and qualitative evaluations.
LGApr 1, 2018
Attentional Multilabel Learning over Graphs: A Message Passing ApproachKien Do, Truyen Tran, Thin Nguyen et al.
We address a largely open problem of multilabel classification over graphs. Unlike traditional vector input, a graph has rich variable-size substructures which are related to the labels in some ways. We believe that uncovering these relations might hold the key to classification performance and explainability. We introduce GAML (Graph Attentional Multi-Label learning), a novel graph neural network that can handle this problem effectively. GAML regards labels as auxiliary nodes and models them in conjunction with the input graph. By applying message passing and attention mechanisms to both the label nodes and the input nodes iteratively, GAML can capture the relations between the labels and the input subgraphs at various resolution scales. Moreover, our model can take advantage of explicit label dependencies. It also scales linearly with the number of labels and graph size thanks to our proposed hierarchical attention. We evaluate GAML on an extensive set of experiments with both graph-structured inputs and classical unstructured inputs. The results show that GAML significantly outperforms other competing methods. Importantly, GAML enables intuitive visualizations for better understanding of the label-substructure relations and explanation of the model behaviors.