LGApr 17, 2022
Learning Theory of Mind via Dynamic Traits AttributionDung Nguyen, Phuoc Nguyen, Hung Le et al.
Machine learning of Theory of Mind (ToM) is essential to build social agents that co-live with humans and other agents. This capacity, once acquired, will help machines infer the mental states of others from observed contextual action trajectories, enabling future prediction of goals, intention, actions and successor representations. The underlying mechanism for such a prediction remains unclear, however. Inspired by the observation that humans often infer the character traits of others, then use it to explain behaviour, we propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories. This trait vector then multiplicatively modulates the prediction mechanism via a `fast weights' scheme in the prediction neural network, which reads the current context and predicts the behaviour. We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improves mindreading ability. On the indirect assessment of false-belief understanding, the new ToM model enables more efficient helping behaviours.
LGApr 17, 2022
Learning to Transfer Role Assignment Across Team SizesDung Nguyen, Phuoc Nguyen, Svetha Venkatesh et al.
Multi-agent reinforcement learning holds the key for solving complex tasks that demand the coordination of learning agents. However, strong coordination often leads to expensive exploration over the exponentially large state-action space. A powerful approach is to decompose team works into roles, which are ideally assigned to agents with the relevant skills. Training agents to adaptively choose and play emerging roles in a team thus allows the team to scale to complex tasks and quickly adapt to changing environments. These promises, however, have not been fully realised by current role-based multi-agent reinforcement learning methods as they assume either a pre-defined role structure or a fixed team size. We propose a framework to learn role assignment and transfer across team sizes. In particular, we train a role assignment network for small teams by demonstration and transfer the network to larger teams, which continue to learn through interaction with the environment. We demonstrate that re-using the role-based credit assignment structure can foster the learning process of larger reinforcement learning teams to achieve tasks requiring different roles. Our proposal outperforms competing techniques in enriched role-enforcing Prey-Predator games and in new scenarios in the StarCraft II Micro-Management benchmark.
LGMay 13, 2022
Fast Conditional Network Compression Using Bayesian HyperNetworksPhuoc Nguyen, Truyen Tran, Ky Le et al.
We introduce a conditional compression problem and propose a fast framework for tackling it. The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts, e.g. a context involving only a subset of classes or a context where only limited compute resource is available. To solve this, we propose an efficient Bayesian framework to compress a given large network into much smaller size tailored to meet each contextual requirement. We employ a hypernetwork to parameterize the posterior distribution of weights given conditional inputs and minimize a variational objective of this Bayesian neural network. To further reduce the network sizes, we propose a new input-output group sparsity factorization of weights to encourage more sparseness in the generated weights. Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
AIJan 17, 2023
Memory-Augmented Theory of Mind NetworkDung Nguyen, Phuoc Nguyen, Hung Le et al.
Social reasoning necessitates the capacity of theory of mind (ToM), the ability to contextualise and attribute mental states to others without having access to their internal cognitive structure. Recent machine learning approaches to ToM have demonstrated that we can train the observer to read the past and present behaviours of other agents and infer their beliefs (including false beliefs about things that no longer exist), goals, intentions and future actions. The challenges arise when the behavioural space is complex, demanding skilful space navigation for rapidly changing contexts for an extended period. We tackle the challenges by equipping the observer with novel neural memory mechanisms to encode, and hierarchical attention to selectively retrieve information about others. The memories allow rapid, selective querying of distal related past behaviours of others to deliberatively reason about their current mental state, beliefs and future behaviours. This results in ToMMY, a theory of mind model that learns to reason while making little assumptions about the underlying mental processes. We also construct a new suite of experiments to demonstrate that memories facilitate the learning process and achieve better theory of mind performance, especially for high-demand false-belief tasks that require inferring through multiple steps of changes.
LGJan 29
Score-based Integrated Gradient for Root Cause Explanations of OutliersPhuoc Nguyen, Truyen Tran, Sunil Gupta et al.
Identifying the root causes of outliers is a fundamental problem in causal inference and anomaly detection. Traditional approaches based on heuristics or counterfactual reasoning often struggle under uncertainty and high-dimensional dependencies. We introduce SIREN, a novel and scalable method that attributes the root causes of outliers by estimating the score functions of the data likelihood. Attribution is computed via integrated gradients that accumulate score contributions along paths from the outlier toward the normal data distribution. Our method satisfies three of the four classic Shapley value axioms - dummy, efficiency, and linearity - as well as an asymmetry axiom derived from the underlying causal structure. Unlike prior work, SIREN operates directly on the score function, enabling tractable and uncertainty-aware root cause attribution in nonlinear, high-dimensional, and heteroscedastic causal models. Extensive experiments on synthetic random graphs and real-world cloud service and supply chain datasets show that SIREN outperforms state-of-the-art baselines in both attribution accuracy and computational efficiency.
LGJan 28
Robust SDE Parameter Estimation Under Missing Time Information SettingLong Van Tran, Truyen Tran, Phuoc Nguyen
Recent advances in stochastic differential equations (SDEs) have enabled robust modeling of real-world dynamical processes across diverse domains, such as finance, health, and systems biology. However, parameter estimation for SDEs typically relies on accurately timestamped observational sequences. When temporal ordering information is corrupted, missing, or deliberately hidden (e.g., for privacy), existing estimation methods often fail. In this paper, we investigate the conditions under which temporal order can be recovered and introduce a novel framework that simultaneously reconstructs temporal information and estimates SDE parameters. Our approach exploits asymmetries between forward and backward processes, deriving a score-matching criterion to infer the correct temporal order between pairs of observations. We then recover the total order via a sorting procedure and estimate SDE parameters from the reconstructed sequence using maximum likelihood. Finally, we conduct extensive experiments on synthetic and real-world datasets to demonstrate the effectiveness of our method, extending parameter estimation to settings with missing temporal order and broadening applicability in sensitive domains.
CLMay 7, 2024
Evaluating Text Summaries Generated by Large Language Models Using OpenAI's GPTHassan Shakil, Atqiya Munawara Mahi, Phuoc Nguyen et al.
This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries based on essential properties of high-quality summary - conciseness, relevance, coherence, and readability - using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to independently assess summary quality without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis for comparative analysis of transformer-based models in natural language processing tasks.
MLDec 15, 2023
Robust Estimation of Causal Heteroscedastic Noise ModelsQuang-Duy Tran, Bao Duong, Phuoc Nguyen et al.
Distinguishing the cause and effect from bivariate observational data is the foundational problem that finds applications in many scientific disciplines. One solution to this problem is assuming that cause and effect are generated from a structural causal model, enabling identification of the causal direction after estimating the model in each direction. The heteroscedastic noise model is a type of structural causal model where the cause can contribute to both the mean and variance of the noise. Current methods for estimating heteroscedastic noise models choose the Gaussian likelihood as the optimization objective which can be suboptimal and unstable when the data has a non-Gaussian distribution. To address this limitation, we propose a novel approach to estimating this model with Student's $t$-distribution, which is known for its robustness in accounting for sampling variability with smaller sample sizes and extreme values without significantly altering the overall distribution shape. This adaptability is beneficial for capturing the parameters of the noise distribution in heteroscedastic noise models. Our empirical evaluations demonstrate that our estimators are more robust and achieve better overall performance across synthetic and real benchmarks.
AIDec 19, 2023
Root Cause Explanation of Outliers under Noisy MechanismsPhuoc Nguyen, Truyen Tran, Sunil Gupta et al.
Identifying root causes of anomalies in causal processes is vital across disciplines. Once identified, one can isolate the root causes and implement necessary measures to restore the normal operation. Causal processes are often modelled as graphs with entities being nodes and their paths/interconnections as edge. Existing work only consider the contribution of nodes in the generative process, thus can not attribute the outlier score to the edges of the mechanism if the anomaly occurs in the connections. In this paper, we consider both individual edge and node of each mechanism when identifying the root causes. We introduce a noisy functional causal model to account for this purpose. Then, we employ Bayesian learning and inference methods to infer the noises of the nodes and edges. We then represent the functional form of a target outlier leaf as a function of the node and edge noises. Finally, we propose an efficient gradient-based attribution method to compute the anomaly attribution scores which scales linearly with the number of nodes and edges. Experiments on simulated datasets and two real-world scenario datasets show better anomaly attribution performance of the proposed method compared to the baselines. Our method scales to larger graphs with more nodes and edges.
LGMay 12, 2025
Identifying Causal Direction via Variational Bayesian CompressionQuang-Duy Tran, Bao Duong, Phuoc Nguyen et al.
Telling apart the cause and effect between two random variables with purely observational data is a challenging problem that finds applications in various scientific disciplines. A key principle utilized in this task is the algorithmic Markov condition, which postulates that the joint distribution, when factorized according to the causal direction, yields a more succinct codelength compared to the anti-causal direction. Previous approaches approximate these codelengths by relying on simple functions or Gaussian processes (GPs) with easily evaluable complexity, compromising between model fitness and computational complexity. To address these limitations, we propose leveraging the variational Bayesian learning of neural networks as an interpretation of the codelengths. This allows the improvement of model fitness, while maintaining the succinctness of the codelengths, and the avoidance of the significant computational complexity of the GP-based approaches. Extensive experiments on both synthetic and real-world benchmarks in cause-effect identification demonstrate the effectiveness of our proposed method, showing promising performance enhancements on several datasets in comparison to most related methods.
LGSep 4, 2023
Differentiable Bayesian Structure Learning with Acyclicity AssuranceQuang-Duy Tran, Phuoc Nguyen, Bao Duong et al.
Score-based approaches in the structure learning task are thriving because of their scalability. Continuous relaxation has been the key reason for this advancement. Despite achieving promising outcomes, most of these methods are still struggling to ensure that the graphs generated from the latent space are acyclic by minimizing a defined score. There has also been another trend of permutation-based approaches, which concern the search for the topological ordering of the variables in the directed acyclic graph in order to limit the search space of the graph. In this study, we propose an alternative approach for strictly constraining the acyclicty of the graphs with an integration of the knowledge from the topological orderings. Our approach can reduce inference complexity while ensuring the structures of the generated graphs to be acyclic. Our empirical experiments with simulated and real-world data show that our approach can outperform related Bayesian score-based approaches.
LGSep 20, 2020
Unsupervised Anomaly Detection on Temporal Multiway DataDuc Nguyen, Phuoc Nguyen, Kien Do et al.
Temporal anomaly detection looks for irregularities over space-time. Unsupervised temporal models employed thus far typically work on sequences of feature vectors, and much less on temporal multiway data. We focus our investigation on two-way data, in which a data matrix is observed at each time step. Leveraging recent advances in matrix-native recurrent neural networks, we investigated strategies for data arrangement and unsupervised training for temporal multiway anomaly detection. These include compressing-decompressing, encoding-predicting, and temporal data differencing. We conducted a comprehensive suite of experiments to evaluate model behaviors under various settings on synthetic data, moving digits, and ECG recordings. We found interesting phenomena not previously reported. These include the capacity of the compact matrix LSTM to compress noisy data near perfectly, making the strategy of compressing-decompressing data ill-suited for anomaly detection under the noise. Also, long sequence of vectors can be addressed directly by matrix models that allow very long context and multiple step prediction. Overall, the encoding-predicting strategy works very well for the matrix LSTMs in the conducted experiments, thanks to its compactness and better fit to the data dynamics.
AISep 16, 2020
Theory of Mind with Guilt Aversion Facilitates Cooperative Reinforcement LearningDung Nguyen, Svetha Venkatesh, Phuoc Nguyen et al.
Guilt aversion induces experience of a utility loss in people if they believe they have disappointed others, and this promotes cooperative behaviour in human. In psychological game theory, guilt aversion necessitates modelling of agents that have theory about what other agents think, also known as Theory of Mind (ToM). We aim to build a new kind of affective reinforcement learning agents, called Theory of Mind Agents with Guilt Aversion (ToMAGA), which are equipped with an ability to think about the wellbeing of others instead of just self-interest. To validate the agent design, we use a general-sum game known as Stag Hunt as a test bed. As standard reinforcement learning agents could learn suboptimal policies in social dilemmas like Stag Hunt, we propose to use belief-based guilt aversion as a reward shaping mechanism. We show that our belief-based guilt averse agents can efficiently learn cooperative behaviours in Stag Hunt Games.
MLMay 18, 2020
Variational Hyper-Encoding NetworksPhuoc Nguyen, Truyen Tran, Sunil Gupta et al.
We propose a framework called HyperVAE for encoding distributions of distributions. When a target distribution is modeled by a VAE, its neural network parameters θis drawn from a distribution p(θ) which is modeled by a hyper-level VAE. We propose a variational inference using Gaussian mixture models to implicitly encode the parameters θinto a low dimensional Gaussian distribution. Given a target distribution, we predict the posterior distribution of the latent code, then use a matrix-network decoder to generate a posterior distribution q(θ). HyperVAE can encode the parameters θin full in contrast to common hyper-networks practices, which generate only the scale and bias vectors as target-network parameters. Thus HyperVAE preserves much more information about the model for each task in the latent space. We discuss HyperVAE using the minimum description length (MDL) principle and show that it helps HyperVAE to generalize. We evaluate HyperVAE in density estimation tasks, outlier detection and discovery of novel design classes, demonstrating its efficacy.
MLOct 31, 2018
Hybrid Generative-Discriminative Models for Inverse Materials DesignPhuoc Nguyen, Truyen Tran, Sunil Gupta et al.
Discovering new physical products and processes often demands enormous experimentation and expensive simulation. To design a new product with certain target characteristics, an extensive search is performed in the design space by trying out a large number of design combinations before reaching to the target characteristics. However, forward searching for the target design becomes prohibitive when the target is itself moving or only partially understood. To address this bottleneck, we propose to use backward prediction by leveraging the rich data generated during earlier exploration and construct a machine learning framework to predict the design parameters for any target in a single step. This poses two technical challenges: the first caused due to one-to-many mapping when learning the inverse problem and the second caused due to an user specifying the target specifications only partially. To overcome the challenges, we formulate this problem as conditional density estimation under high-dimensional setting with incomplete input and multimodal output. We solve the problem through a deep hybrid generative-discriminative model, which is trained end-to-end to predict the optimum design.
NEFeb 3, 2018
Resset: A Recurrent Model for Sequence of Sets with Applications to Electronic Medical RecordsPhuoc Nguyen, Truyen Tran, Svetha Venkatesh
Modern healthcare is ripe for disruption by AI. A game changer would be automatic understanding the latent processes from electronic medical records, which are being collected for billions of people worldwide. However, these healthcare processes are complicated by the interaction between at least three dynamic components: the illness which involves multiple diseases, the care which involves multiple treatments, and the recording practice which is biased and erroneous. Existing methods are inadequate in capturing the dynamic structure of care. We propose Resset, an end-to-end recurrent model that reads medical record and predicts future risk. The model adopts the algebraic view in that discrete medical objects are embedded into continuous vectors lying in the same space. We formulate the problem as modeling sequences of sets, a novel setting that have rarely, if not, been addressed. Within Resset, the bag of diseases recorded at each clinic visit is modeled as function of sets. The same hold for the bag of treatments. The interaction between the disease bag and the treatment bag at a visit is modeled in several, one of which as residual of diseases minus the treatments. Finally, the health trajectory, which is a sequence of visits, is modeled using a recurrent neural network. We report results on over a hundred thousand hospital visits by patients suffered from two costly chronic diseases -- diabetes and mental health. Resset shows promises in multiple predictive tasks such as readmission prediction, treatments recommendation and diseases progression.
AIDec 22, 2017
Intelligent Device Discovery in the Internet of Things - Enabling the Robot SocietyJames Sunthonlap, Phuoc Nguyen, Zilong Ye
The Internet of Things (IoT) is continuously growing to connect billions of smart devices anywhere and anytime in an Internet-like structure, which enables a variety of applications, services and interactions between human and objects. In the future, the smart devices are supposed to be able to autonomously discover a target device with desired features and generate a set of entirely new services and applications that are not supervised or even imagined by human beings. The pervasiveness of smart devices, as well as the heterogeneity of their design and functionalities, raise a major concern: How can a smart device efficiently discover a desired target device? In this paper, we propose a Social-Aware and Distributed (SAND) scheme that achieves a fast, scalable and efficient device discovery in the IoT. The proposed SAND scheme adopts a novel device ranking criteria that measures the device's degree, social relationship diversity, clustering coefficient and betweenness. Based on the device ranking criteria, the discovery request can be guided to travel through critical devices that stand at the major intersections of the network, and thus quickly reach the desired target device by contacting only a limited number of intermediate devices. With the help of such an intelligent device discovery as SAND, the IoT devices, as well as other computing facilities, software and data on the Internet, can autonomously establish new social connections with each other as human being do. They can formulate self-organized computing groups to perform required computing tasks, facilitate a fusion of a variety of computing service, network service and data to generate novel applications and services, evolve from the individual aritificial intelligence to the collaborative intelligence, and eventually enable the birth of a robot society.
LGNov 21, 2017
Finding Algebraic Structure of Care in Time: A Deep Learning ApproachPhuoc Nguyen, Truyen Tran, Svetha Venkatesh
Understanding the latent processes from Electronic Medical Records could be a game changer in modern healthcare. However, the processes are complex due to the interaction between at least three dynamic components: the illness, the care and the recording practice. Existing methods are inadequate in capturing the dynamic structure of care. We propose an end-to-end model that reads medical record and predicts future risk. The model adopts the algebraic view in that discrete medical objects are embedded into continuous vectors lying in the same space. The bag of disease and comorbidities recorded at each hospital visit are modeled as function of sets. The same holds for the bag of treatments. The interaction between diseases and treatments at a visit is modeled as the residual of the diseases minus the treatments. Finally, the health trajectory, which is a sequence of visits, is modeled using a recurrent neural network. We report preliminary results on chronic diseases - diabetes and mental health - for predicting unplanned readmission.
LGJul 17, 2017
Deep Learning to Attend to Risk in ICUPhuoc Nguyen, Truyen Tran, Svetha Venkatesh
Modeling physiological time-series in ICU is of high clinical importance. However, data collected within ICU are irregular in time and often contain missing measurements. Since absence of a measure would signify its lack of importance, the missingness is indeed informative and might reflect the decision making by the clinician. Here we propose a deep learning architecture that can effectively handle these challenges for predicting ICU mortality outcomes. The model is based on Long Short-Term Memory, and has layered attention mechanisms. At the sensing layer, the model decides whether to observe and incorporate parts of the current measurements. At the reasoning layer, evidences across time steps are weighted and combined. The model is evaluated on the PhysioNet 2012 dataset showing competitive and interpretable results.
MLJul 26, 2016
Deepr: A Convolutional Net for Medical RecordsPhuoc Nguyen, Truyen Tran, Nilmini Wickramasinghe et al.
Feature engineering remains a major bottleneck when creating predictive systems from electronic medical records. At present, an important missing element is detecting predictive regular clinical motifs from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep learning system that learns to extract features from medical records and predicts future risk automatically. Deepr transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers. On top of the sequence is a convolutional neural net that detects and combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and visualization of its inner working. We validate Deepr on hospital data to predict unplanned readmission after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention space.