AIMay 22Code
VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action AdaptationChanghua Xu, En Yu, Junyu Xuan et al.
Vision--Language--Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. While fine-tuned VLA policies often produce semantically plausible trajectories, failures often arise from unresolved geometric ambiguities, where near-miss actions lead to divergent execution outcomes under limited supervision. We study few-shot VLA adaptation from a \emph{generation--selection} perspective and propose a novel framework \textbf{VGAS} (\textbf{V}alue-\textbf{G}uided \textbf{A}ction-chunk \textbf{S}election). It performs inference-time best-of-$N$ selection to identify action chunks that are both semantically faithful and geometrically precise. Specifically, \textbf{VGAS} employs a finetuned VLA as a high-recall proposal generator and introduces the \textrm{Q-Chunk-Former}, a geometrically grounded Transformer critic to resolve fine-grained geometric ambiguities. In addition, we propose \textit{Explicit Geometric Regularization} (\texttt{EGR}), which shapes a discriminative value landscape to preserve action ranking resolution among near-miss candidates while mitigating value instability under scarce supervision. Experiments and theoretical analysis demonstrate that \textbf{VGAS} consistently improves success rates and robustness under limited demonstrations and distribution shifts. Our code is available at https://github.com/Jyugo-15/VGAS.
LGOct 3, 2022
Learning from the Dark: Boosting Graph Convolutional Neural Networks with Diverse Negative SamplesWei Duan, Junyu Xuan, Maoying Qiao et al.
Graph Convolutional Neural Networks (GCNs) has been generally accepted to be an effective tool for node representations learning. An interesting way to understand GCNs is to think of them as a message passing mechanism where each node updates its representation by accepting information from its neighbours (also known as positive samples). However, beyond these neighbouring nodes, graphs have a large, dark, all-but forgotten world in which we find the non-neighbouring nodes (negative samples). In this paper, we show that this great dark world holds a substantial amount of information that might be useful for representation learning. Most specifically, it can provide negative information about the node representations. Our overall idea is to select appropriate negative samples for each node and incorporate the negative information contained in these samples into the representation updates. Moreover, we show that the process of selecting the negative samples is not trivial. Our theme therefore begins by describing the criteria for a good negative sample, followed by a determinantal point process algorithm for efficiently obtaining such samples. A GCN, boosted by diverse negative samples, then jointly considers the positive and negative information when passing messages. Experimental evaluations show that this idea not only improves the overall performance of standard representation learning but also significantly alleviates over-smoothing problems.
LGDec 5, 2022
Graph Convolutional Neural Networks with Diverse Negative Samples via Decomposed Determinant Point ProcessesWei Duan, Junyu Xuan, Maoying Qiao et al.
Graph convolutional networks (GCNs) have achieved great success in graph representation learning by extracting high-level features from nodes and their topology. Since GCNs generally follow a message-passing mechanism, each node aggregates information from its first-order neighbour to update its representation. As a result, the representations of nodes with edges between them should be positively correlated and thus can be considered positive samples. However, there are more non-neighbour nodes in the whole graph, which provide diverse and useful information for the representation update. Two non-adjacent nodes usually have different representations, which can be seen as negative samples. Besides the node representations, the structural information of the graph is also crucial for learning. In this paper, we used quality-diversity decomposition in determinant point processes (DPP) to obtain diverse negative samples. When defining a distribution on diverse subsets of all non-neighbouring nodes, we incorporate both graph structure information and node representations. Since the DPP sampling process requires matrix eigenvalue decomposition, we propose a new shortest-path-base method to improve computational efficiency. Finally, we incorporate the obtained negative samples into the graph convolution operation. The ideas are evaluated empirically in experiments on node classification tasks. These experiments show that the newly proposed methods not only improve the overall performance of standard representation learning but also significantly alleviate over-smoothing problems.
GNJul 21, 2024
LSTM Autoencoder-based Deep Neural Networks for Barley Genotype-to-Phenotype PredictionGuanjin Wang, Junyu Xuan, Penghao Wang et al.
Artificial Intelligence (AI) has emerged as a key driver of precision agriculture, facilitating enhanced crop productivity, optimized resource use, farm sustainability, and informed decision-making. Also, the expansion of genome sequencing technology has greatly increased crop genomic resources, deepening our understanding of genetic variation and enhancing desirable crop traits to optimize performance in various environments. There is increasing interest in using machine learning (ML) and deep learning (DL) algorithms for genotype-to-phenotype prediction due to their excellence in capturing complex interactions within large, high-dimensional datasets. In this work, we propose a new LSTM autoencoder-based model for barley genotype-to-phenotype prediction, specifically for flowering time and grain yield estimation, which could potentially help optimize yields and management practices. Our model outperformed the other baseline methods, demonstrating its potential in handling complex high-dimensional agricultural datasets and enhancing crop phenotype prediction performance.
LGSep 25, 2024
Functional Stochastic Gradient MCMC for Bayesian Neural NetworksMengjing Wu, Junyu Xuan, Jie Lu
Classical parameter-space Bayesian inference for Bayesian neural networks (BNNs) suffers from several unresolved prior issues, such as knowledge encoding intractability and pathological behaviours in deep networks, which can lead to improper posterior inference. To address these issues, functional Bayesian inference has recently been proposed leveraging functional priors, such as the emerging functional variational inference. In addition to variational methods, stochastic gradient Markov Chain Monte Carlo (MCMC) is another scalable and effective inference method for BNNs to asymptotically generate samples from the true posterior by simulating continuous dynamics. However, existing MCMC methods perform solely in parameter space and inherit the unresolved prior issues, while extending these dynamics to function space is a non-trivial undertaking. In this paper, we introduce novel functional MCMC schemes, including stochastic gradient versions, based on newly designed diffusion dynamics that can incorporate more informative functional priors. Moreover, we prove that the stationary measure of these functional dynamics is the target posterior over functions. Our functional MCMC schemes demonstrate improved performance in both predictive accuracy and uncertainty quantification on several tasks compared to naive parameter-space MCMC and functional variational inference.
AIMay 17
Heterogeneous Information-Bottleneck Coordination Graphs for Multi-Agent Reinforcement LearningWei Duan, Junyu Xuan, En Yu et al.
Coordination graphs are a central abstraction in cooperative multi-agent reinforcement learning (MARL), yet existing sparse-graph learners lack a theoretically grounded mechanism to decide which edges should exist and how much information each edge should carry. Current methods rely on heuristic criteria that offer no formal guarantee on the learned topology, and no principled way to allocate different communication capacities to structurally different agent relationships. To address this, we propose Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG), which learns a group-aware sparse graph in which both edge existence and message capacity are theoretically justified. With the graph information bottleneck (GIB) serving as the underlying tool, HIBCG first constructs a group-aligned block-diagonal prior that provides a closed-form criterion for edge retention -- determining which edges should exist and at what density per group block -- and then controls per-agent feature bandwidth on the resulting topology, compressing messages to retain only task-relevant content. We prove that the group-aligned prior strictly tightens the variational bound on topology learning, that the objective decomposes per group block, enabling differential edge control, and that capacity allocation follows a water-filling principle.
LGApr 17, 2024
Group-Aware Coordination Graph for Multi-Agent Reinforcement LearningWei Duan, Jie Lu, Junyu Xuan
Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.
LGMar 18, 2024
Layer-diverse Negative Sampling for Graph Neural NetworksWei Duan, Jie Lu, Yu Guang Wang et al.
Graph neural networks (GNNs) are a powerful solution for various structure learning applications due to their strong representation capabilities for graph data. However, traditional GNNs, relying on message-passing mechanisms that gather information exclusively from first-order neighbours (known as positive samples), can lead to issues such as over-smoothing and over-squashing. To mitigate these issues, we propose a layer-diverse negative sampling method for message-passing propagation. This method employs a sampling matrix within a determinantal point process, which transforms the candidate set into a space and selectively samples from this space to generate negative samples. To further enhance the diversity of the negative samples during each forward pass, we develop a space-squeezing method to achieve layer-wise diversity in multi-layer GNNs. Experiments on various real-world graph datasets demonstrate the effectiveness of our approach in improving the diversity of negative samples and overall learning performance. Moreover, adding negative samples dynamically changes the graph's topology, thus with the strong potential to improve the expressiveness of GNNs and reduce the risk of over-squashing.
LGMar 28, 2024
Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement LearningWei Duan, Jie Lu, Junyu Xuan
Effective agent coordination is crucial in cooperative Multi-Agent Reinforcement Learning (MARL). While agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL are limited. They rely solely on one-step observations, neglecting crucial historical experiences, leading to deficient graphs that foster redundant or detrimental information exchanges. Additionally, high computational demands for action-pair calculations in dense graphs impede scalability. To address these challenges, we propose inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for MARL. The LTS-CG leverages agents' historical observations to calculate an agent-pair probability matrix, where a sparse graph is sampled from and used for knowledge exchange between agents, thereby simultaneously capturing agent dependencies and relation uncertainty. The computational complexity of this procedure is only related to the number of agents. This graph learning process is further augmented by two innovative characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, ensuring a thorough grasp of the environmental context from limited data. These features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Our demonstrated results on the StarCraft II benchmark underscore LTS-CG's superior performance.
LGMay 23, 2024
A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change PointsZihe Liu, Jie Lu, Guangquan Zhang et al.
Deep reinforcement learning is used in various domains, but usually under the assumption that the environment has stationary conditions like transitions and state distributions. When this assumption is not met, performance suffers. For this reason, tracking continuous environmental changes and adapting to unpredictable conditions is challenging yet crucial because it ensures that systems remain reliable and flexible in practical scenarios. Our research introduces Behavior-Aware Detection and Adaptation (BADA), an innovative framework that merges environmental change detection with behavior adaptation. The key inspiration behind our method is that policies exhibit different global behaviors in changing environments. Specifically, environmental changes are identified by analyzing variations between behaviors using Wasserstein distances without manually set thresholds. The model adapts to the new environment through behavior regularization based on the extent of changes. The results of a series of experiments demonstrate better performance relative to several current algorithms. This research also indicates significant potential for tackling this long-standing challenge.
LGDec 11, 2025
Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement LearningWei Duan, Jie Lu, En Yu et al.
Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent methods excel at learning sparse coordination graphs-determining who communicates with whom-they do not address what information should be transmitted under hard bandwidth constraints. We study this bandwidth-limited regime and show that naive dimensionality reduction consistently degrades coordination performance. Hard bandwidth constraints force selective encoding, but deterministic projections lack mechanisms to control how compression occurs. We introduce Bandwidth-constrained Variational Message Encoding (BVME), a lightweight module that treats messages as samples from learned Gaussian posteriors regularized via KL divergence to an uninformative prior. BVME's variational framework provides principled, tunable control over compression strength through interpretable hyperparameters, directly constraining the representations used for decision-making. Across SMACv1, SMACv2, and MPE benchmarks, BVME achieves comparable or superior performance while using 67--83% fewer message dimensions, with gains most pronounced on sparse graphs where message quality critically impacts coordination. Ablations reveal U-shaped sensitivity to bandwidth, with BVME excelling at extreme ratios while adding minimal overhead.
LGOct 10, 2025
HiBBO: HiPPO-based Space Consistency for High-dimensional Bayesian OptimisationJunyu Xuan, Wenlong Chen, Yingzhen Li
Bayesian Optimisation (BO) is a powerful tool for optimising expensive blackbox functions but its effectiveness diminishes in highdimensional spaces due to sparse data and poor surrogate model scalability While Variational Autoencoder (VAE) based approaches address this by learning low-dimensional latent representations the reconstructionbased objective function often brings the functional distribution mismatch between the latent space and original space leading to suboptimal optimisation performance In this paper we first analyse the reason why reconstructiononly loss may lead to distribution mismatch and then propose HiBBO a novel BO framework that introduces the space consistency into the latent space construction in VAE using HiPPO - a method for longterm sequence modelling - to reduce the functional distribution mismatch between the latent space and original space Experiments on highdimensional benchmark tasks demonstrate that HiBBO outperforms existing VAEBO methods in convergence speed and solution quality Our work bridges the gap between high-dimensional sequence representation learning and efficient Bayesian Optimisation enabling broader applications in neural architecture search materials science and beyond.
CLOct 7, 2025
InforME: Improving Informativeness of Abstractive Text Summarization With Informative Attention Guided by Named Entity SalienceJianbin Shen, Christy Jie Liang, Junyu Xuan
Abstractive text summarization is integral to the Big Data era, which demands advanced methods to turn voluminous and often long text data into concise but coherent and informative summaries for efficient human consumption. Despite significant progress, there is still room for improvement in various aspects. One such aspect is to improve informativeness. Hence, this paper proposes a novel learning approach consisting of two methods: an optimal transport-based informative attention method to improve learning focal information in reference summaries and an accumulative joint entropy reduction method on named entities to enhance informative salience. Experiment results show that our approach achieves better ROUGE scores compared to prior work on CNN/Daily Mail while having competitive results on XSum. Human evaluation of informativeness also demonstrates the better performance of our approach over a strong baseline. Further analysis gives insight into the plausible reasons underlying the evaluation results.
MASep 20, 2025
Bayesian Ego-graph inference for Networked Multi-Agent Reinforcement LearningWei Duan, Jie Lu, Junyu Xuan
In networked multi-agent reinforcement learning (Networked-MARL), decentralized agents must act under local observability and constrained communication over fixed physical graphs. Existing methods often assume static neighborhoods, limiting adaptability to dynamic or heterogeneous environments. While centralized frameworks can learn dynamic graphs, their reliance on global state access and centralized infrastructure is impractical in real-world decentralized systems. We propose a stochastic graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. Building on this formulation, we introduce BayesG, a decentralized actor-framework that learns sparse, context-aware interaction structures via Bayesian variational inference. Each agent operates over an ego-graph and samples a latent communication mask to guide message passing and policy computation. The variational distribution is trained end-to-end alongside the policy using an evidence lower bound (ELBO) objective, enabling agents to jointly learn both interaction topology and decision-making strategies. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents, demonstrating superior scalability, efficiency, and performance.
LGOct 31, 2024
A Non-Monolithic Policy Approach of Offline-to-Online Reinforcement LearningJaeYoon Kim, Junyu Xuan, Christy Liang et al.
Offline-to-online reinforcement learning (RL) leverages both pre-trained offline policies and online policies trained for downstream tasks, aiming to improve data efficiency and accelerate performance enhancement. An existing approach, Policy Expansion (PEX), utilizes a policy set composed of both policies without modifying the offline policy for exploration and learning. However, this approach fails to ensure sufficient learning of the online policy due to an excessive focus on exploration with both policies. Since the pre-trained offline policy can assist the online policy in exploiting a downstream task based on its prior experience, it should be executed effectively and tailored to the specific requirements of the downstream task. In contrast, the online policy, with its immature behavioral strategy, has the potential for exploration during the training phase. Therefore, our research focuses on harmonizing the advantages of the offline policy, termed exploitation, with those of the online policy, referred to as exploration, without modifying the offline policy. In this study, we propose an innovative offline-to-online RL method that employs a non-monolithic exploration approach. Our methodology demonstrates superior performance compared to PEX.
LGMay 4, 2024
Decoupling Exploration and Exploitation for Unsupervised Pre-training with Successor FeaturesJaeYoon Kim, Junyu Xuan, Christy Liang et al.
Unsupervised pre-training has been on the lookout for the virtue of a value function representation referred to as successor features (SFs), which decouples the dynamics of the environment from the rewards. It has a significant impact on the process of task-specific fine-tuning due to the decomposition. However, existing approaches struggle with local optima due to the unified intrinsic reward of exploration and exploitation without considering the linear regression problem and the discriminator supporting a small skill sapce. We propose a novel unsupervised pre-training model with SFs based on a non-monolithic exploration methodology. Our approach pursues the decomposition of exploitation and exploration of an agent built on SFs, which requires separate agents for the respective purpose. The idea will leverage not only the inherent characteristics of SFs such as a quick adaptation to new tasks but also the exploratory and task-agnostic capabilities. Our suggested model is termed Non-Monolithic unsupervised Pre-training with Successor features (NMPS), which improves the performance of the original monolithic exploration method of pre-training with SFs. NMPS outperforms Active Pre-training with Successor Features (APS) in a comparative experiment.
AIMay 2, 2023
An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options FrameworkJaeYoon Kim, Junyu Xuan, Christy Liang et al.
Most exploration research on reinforcement learning (RL) has paid attention to `the way of exploration', which is `how to explore'. The other exploration research, `when to explore', has not been the main focus of RL exploration research. The issue of `when' of a monolithic exploration in the usual RL exploration behaviour binds an exploratory action to an exploitational action of an agent. Recently, a non-monolithic exploration research has emerged to examine the mode-switching exploration behaviour of humans and animals. The ultimate purpose of our research is to enable an agent to decide when to explore or exploit autonomously. We describe the initial research of an autonomous multi-mode exploration of non-monolithic behaviour in an options framework. The higher performance of our method is shown against the existing non-monolithic exploration method through comparative experimental results.
LGSep 27, 2021
Bayesian Transfer Learning: An Overview of Probabilistic Graphical Models for Transfer LearningJunyu Xuan, Jie Lu, Guangquan Zhang
Transfer learning where the behavior of extracting transferable knowledge from the source domain(s) and reusing this knowledge to target domain has become a research area of great interest in the field of artificial intelligence. Probabilistic graphical models (PGMs) have been recognized as a powerful tool for modeling complex systems with many advantages, e.g., the ability to handle uncertainty and possessing good interpretability. Considering the success of these two aforementioned research areas, it seems natural to apply PGMs to transfer learning. However, although there are already some excellent PGMs specific to transfer learning in the literature, the potential of PGMs for this problem is still grossly underestimated. This paper aims to boost the development of PGMs for transfer learning by 1) examining the pilot studies on PGMs specific to transfer learning, i.e., analyzing and summarizing the existing mechanisms particularly designed for knowledge transfer; 2) discussing examples of real-world transfer problems where existing PGMs have been successfully applied; and 3) exploring several potential research directions on transfer learning using PGM.
MLSep 20, 2021
Deep Bayesian Estimation for Dynamic Treatment Regimes with a Long Follow-up TimeAdi Lin, Jie Lu, Junyu Xuan et al.
Causal effect estimation for dynamic treatment regimes (DTRs) contributes to sequential decision making. However, censoring and time-dependent confounding under DTRs are challenging as the amount of observational data declines over time due to a reducing sample size but the feature dimension increases over time. Long-term follow-up compounds these challenges. Another challenge is the highly complex relationships between confounders, treatments, and outcomes, which causes the traditional and commonly used linear methods to fail. We combine outcome regression models with treatment models for high dimensional features using uncensored subjects that are small in sample size and we fit deep Bayesian models for outcome regression models to reveal the complex relationships between confounders, treatments, and outcomes. Also, the developed deep Bayesian models can model uncertainty and output the prediction variance which is essential for the safety-aware applications, such as self-driving cars and medical treatment design. The experimental results on medical simulations of HIV treatment show the ability of the proposed method to obtain stable and accurate dynamic causal effect estimation from observational data, especially with long-term follow-up. Our technique provides practical guidance for sequential decision making, and policy-making.
LGJul 17, 2021
Hierarchical Reinforcement Learning with Optimal Level Synchronization based on a Deep Generative ModelJaeYoon Kim, Junyu Xuan, Christy Liang et al.
The high-dimensional or sparse reward task of a reinforcement learning (RL) environment requires a superior potential controller such as hierarchical reinforcement learning (HRL) rather than an atomic RL because it absorbs the complexity of commands to achieve the purpose of the task in its hierarchical structure. One of the HRL issues is how to train each level policy with the optimal data collection from its experience. That is to say, how to synchronize adjacent level policies optimally. Our research finds that a HRL model through the off-policy correction technique of HRL, which trains a higher-level policy with the goal of reflecting a lower-level policy which is newly trained using the off-policy method, takes the critical role of synchronizing both level policies at all times while they are being trained. We propose a novel HRL model supporting the optimal level synchronization using the off-policy correction technique with a deep generative model. This uses the advantage of the inverse operation of a flow-based deep generative model (FDGM) to achieve the goal corresponding to the current state of the lower-level policy. The proposed model also considers the freedom of the goal dimension between HRL policies which makes it the generalized inverse model of the model-free RL in HRL with the optimal synchronization method. The comparative experiment results show the performance of our proposed model.
LGJun 29, 2020
Path Integral Based Convolution and Pooling for Graph Neural NetworksZheng Ma, Junyu Xuan, Yu Guang Wang et al.
Graph neural networks (GNNs) extends the functionality of traditional neural networks to graph-structured data. Similar to CNNs, an optimized design of graph convolution and pooling is key to success. Borrowing ideas from physics, we propose a path integral based graph neural networks (PAN) for classification and regression tasks on graphs. Specifically, we consider a convolution operation that involves every path linking the message sender and receiver with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. It generalizes the graph Laplacian to a new transition matrix we call maximal entropy transition (MET) matrix derived from a path integral formalism. Importantly, the diagonal entries of the MET matrix are directly related to the subgraph centrality, thus providing a natural and adaptive pooling mechanism. PAN provides a versatile framework that can be tailored for different graph data with varying sizes and structures. We can view most existing GNN architectures as special cases of PAN. Experimental results show that PAN achieves state-of-the-art performance on various graph classification/regression tasks, including a new benchmark dataset from statistical mechanics we propose to boost applications of GNN in physical sciences.
LGJul 19, 2019
Open Set Domain Adaptation: Theoretical Bound and AlgorithmZhen Fang, Jie Lu, Feng Liu et al.
The aim of unsupervised domain adaptation is to leverage the knowledge in a labeled (source) domain to improve a model's learning performance with an unlabeled (target) domain -- the basic strategy being to mitigate the effects of discrepancies between the two distributions. Most existing algorithms can only handle unsupervised closed set domain adaptation (UCSDA), i.e., where the source and target domains are assumed to share the same label set. In this paper, we target a more challenging but realistic setting: unsupervised open set domain adaptation (UOSDA), where the target domain has unknown classes that are not found in the source domain. This is the first study to provide a learning bound for open set domain adaptation, which we do by theoretically investigating the risk of the target classifier on unknown classes. The proposed learning bound has a special term, namely open set difference, which reflects the risk of the target classifier on unknown classes. Further, we present a novel and theoretically guided unsupervised algorithm for open set domain adaptation, called distribution alignment with ppen difference (DAOD), which is based on regularizing this open set difference bound. The experiments on several benchmark datasets show the superior performance of the proposed UOSDA method compared with the state-of-the-art methods in the literature.
LGJul 18, 2017
Cooperative Hierarchical Dirichlet Processes: Superposition vs. MaximizationJunyu Xuan, Jie Lu, Guangquan Zhang et al.
The cooperative hierarchical structure is a common and significant data structure observed in, or adopted by, many research areas, such as: text mining (author-paper-word) and multi-label classification (label-instance-feature). Renowned Bayesian approaches for cooperative hierarchical structure modeling are mostly based on topic models. However, these approaches suffer from a serious issue in that the number of hidden topics/factors needs to be fixed in advance and an inappropriate number may lead to overfitting or underfitting. One elegant way to resolve this issue is Bayesian nonparametric learning, but existing work in this area still cannot be applied to cooperative hierarchical structure modeling. In this paper, we propose a cooperative hierarchical Dirichlet process (CHDP) to fill this gap. Each node in a cooperative hierarchical structure is assigned a Dirichlet process to model its weights on the infinite hidden factors/topics. Together with measure inheritance from hierarchical Dirichlet process, two kinds of measure cooperation, i.e., superposition and maximization, are defined to capture the many-to-many relationships in the cooperative hierarchical structure. Furthermore, two constructive representations for CHDP, i.e., stick-breaking and international restaurant process, are designed to facilitate the model inference. Experiments on synthetic and real-world data with cooperative hierarchical structures demonstrate the properties and the ability of CHDP for cooperative hierarchical structure modeling and its potential for practical application scenarios.
MLJul 12, 2015
Dependent Indian Buffet Process-based Sparse Nonparametric Nonnegative Matrix FactorizationJunyu Xuan, Jie Lu, Guangquan Zhang et al.
Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices appropriate for the intended applications. The method has been widely used for unsupervised learning tasks, including recommender systems (rating matrix of users by items) and document clustering (weighting matrix of papers by keywords). However, traditional NMF methods typically assume the number of latent factors (i.e., dimensionality of the loading matrices) to be fixed. This assumption makes them inflexible for many applications. In this paper, we propose a nonparametric NMF framework to mitigate this issue by using dependent Indian Buffet Processes (dIBP). In a nutshell, we apply a correlation function for the generation of two stick weights associated with each pair of columns of loading matrices, while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two loading matrices will be column-wise (indirectly) correlated. Under this same framework, two classes of correlation function are proposed (1) using Bivariate beta distribution and (2) using Copula function. Both methods allow us to adopt our work for various applications by flexibly choosing an appropriate parameter settings. Compared with the other state-of-the art approaches in this area, such as using Gaussian Process (GP)-based dIBP, our work is seen to be much more flexible in terms of allowing the two corresponding binary matrix columns to have greater variations in their non-zero entries. Our experiments on the real-world and synthetic datasets show that three proposed models perform well on the document clustering task comparing standard NMF without predefining the dimension for the factor matrices, and the Bivariate beta distribution-based and Copula-based models have better flexibility than the GP-based model.
SIApr 1, 2015
Nonnegative Multi-level Network Factorization for Latent Factor AnalysisJunyu Xuan, Jie Lu, Xiangfeng Luo et al.
Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices and has been widely used for unsupervised learning tasks such as product recommendation based on a rating matrix. However, although networks between nodes with the same nature exist, standard NMF overlooks them, e.g., the social network between users. This problem leads to comparatively low recommendation accuracy because these networks are also reflections of the nature of the nodes, such as the preferences of users in a social network. Also, social networks, as complex networks, have many different structures. Each structure is a composition of links between nodes and reflects the nature of nodes, so retaining the different network structures will lead to differences in recommendation performance. To investigate the impact of these network structures on the factorization, this paper proposes four multi-level network factorization algorithms based on the standard NMF, which integrates the vertical network (e.g., rating matrix) with the structures of horizontal network (e.g., user social network). These algorithms are carefully designed with corresponding convergence proofs to retain four desired network structures. Experiments on synthetic data show that the proposed algorithms are able to preserve the desired network structures as designed. Experiments on real-world data show that considering the horizontal networks improves the accuracy of document clustering and recommendation with standard NMF, and various structures show their differences in performance on these two tasks. These results can be directly used in document clustering and recommendation systems.
MLMar 30, 2015
Nonparametric Relational Topic Models through Dependent Gamma ProcessesJunyu Xuan, Jie Lu, Guangquan Zhang et al.
Traditional Relational Topic Models provide a way to discover the hidden topics from a document network. Many theoretical and practical tasks, such as dimensional reduction, document clustering, link prediction, benefit from this revealed knowledge. However, existing relational topic models are based on an assumption that the number of hidden topics is known in advance, and this is impractical in many real-world applications. Therefore, in order to relax this assumption, we propose a nonparametric relational topic model in this paper. Instead of using fixed-dimensional probability distributions in its generative model, we use stochastic processes. Specifically, a gamma process is assigned to each document, which represents the topic interest of this document. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of typical document network, i.e., two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics are shared by all the documents. In order to resolve these challenges, we use a subsampling strategy to assign each document a different gamma process from the global gamma process, and the subsampling probabilities of documents are assigned with a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and its number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the capabilities of learning the hidden topics and, more importantly, the number of topics.
MLMar 30, 2015
Infinite Author Topic Model based on Mixed Gamma-Negative Binomial ProcessJunyu Xuan, Jie Lu, Guangquan Zhang et al.
Incorporating the side information of text corpus, i.e., authors, time stamps, and emotional tags, into the traditional text mining models has gained significant interests in the area of information retrieval, statistical natural language processing, and machine learning. One branch of these works is the so-called Author Topic Model (ATM), which incorporates the authors's interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability on fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. To be specific, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the multi-author's contribution towards this document. An efficient Gibbs sampling inference algorithm with each conditional distribution being closed-form is developed for the IAT model. Experiments on several real-world datasets show the capabilities of our IAT model to learn the hidden topics, authors' interests on these topics and the number of topics simultaneously.