Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue SystemLingbo Mo, Shijie Chen, Ziru Chen et al. · microsoft-research
We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps. Covering a wide range of cooking and how-to tasks, we aim to deliver a collaborative and engaging dialogue experience. Equipped with language understanding, dialogue management, and response generation components supported by a robust search engine, TacoBot ensures efficient task assistance. To enhance the dialogue experience, we explore a series of data augmentation strategies using LLMs to train advanced neural models continuously. TacoBot builds upon our successful participation in the inaugural Alexa Prize TaskBot Challenge, where our team secured third place among ten competing teams. We offer TacoBot as an open-source framework that serves as a practical example for deploying task-oriented dialogue systems.
FederatedScope: A Flexible Federated Learning Platform for HeterogeneityYuexiang Xie, Zhen Wang, Dawei Gao et al.
Although remarkable progress has been made by existing federated learning (FL) platforms to provide infrastructures for development, these platforms may not well tackle the challenges brought by various types of heterogeneity, including the heterogeneity in participants' local data, resources, behaviors and learning goals. To fill this gap, in this paper, we propose a novel FL platform, named FederatedScope, which employs an event-driven architecture to provide users with great flexibility to independently describe the behaviors of different participants. Such a design makes it easy for users to describe participants with various local training processes, learning goals and backends, and coordinate them into an FL course with synchronous or asynchronous training strategies. Towards an easy-to-use and flexible platform, FederatedScope enables rich types of plug-in operations and components for efficient further development, and we have implemented several important components to better help users with privacy protection, attack simulation and auto-tuning. We have released FederatedScope at https://github.com/alibaba/FederatedScope to promote academic research and industrial deployment of federated learning in a wide range of scenarios.
FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph LearningZhen Wang, Weirui Kuang, Yuexiang Xie et al.
The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and the existing frameworks such as TFF and FATE has made the deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of FGL-related framework increases the efforts for accomplishing reproducible research and deploying in real-world applications. Motivated by such strong demand, in this paper, we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which simultaneously gains many valuable insights about FGL for the community. Moreover, we employ FS-G to serve the FGL application in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at https://github.com/alibaba/FederatedScope to promote FGL's research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.
Search to Pass Messages for Temporal Knowledge Graph CompletionZhen Wang, Haotong Du, Quanming Yao et al. · tsinghua
Completing missing facts is a fundamental task for temporal knowledge graphs (TKGs). Recently, graph neural network (GNN) based methods, which can simultaneously explore topological and temporal information, have become the state-of-the-art (SOTA) to complete TKGs. However, these studies are based on hand-designed architectures and fail to explore the diverse topological and temporal properties of TKG. To address this issue, we propose to use neural architecture search (NAS) to design data-specific message passing architecture for TKG completion. In particular, we develop a generalized framework to explore topological and temporal information in TKGs. Based on this framework, we design an expressive search space to fully capture various properties of different TKGs. Meanwhile, we adopt a search algorithm, which trains a supernet structure by sampling single path for efficient search with less cost. We further conduct extensive experiments on three benchmark datasets. The results show that the searched architectures by our method achieve the SOTA performances. Besides, the searched models can also implicitly reveal diverse properties in different TKGs. Our code is released in https://github.com/striderdu/SPA.
22.3CLOct 4, 2022
ThinkSum: Probabilistic reasoning over sets using large language modelsBatu Ozturkler, Nikolay Malkin, Zhen Wang et al. · stanford
Large language models (LLMs) have a substantial capacity for high-level analogical reasoning: reproducing patterns in linear text that occur in their training data (zero-shot evaluation) or in the provided context (few-shot in-context learning). However, recent studies show that even the more advanced LLMs fail in scenarios that require reasoning over multiple objects or facts and making sequences of logical deductions. We propose a two-stage probabilistic inference paradigm, ThinkSum, which reasons over sets of objects or facts in a structured manner. In the first stage (Think - retrieval of associations), a LLM is queried in parallel over a set of phrases extracted from the prompt or an auxiliary model call. In the second stage (Sum - probabilistic inference or reasoning), the results of these queries are aggregated to make the final prediction. We demonstrate the possibilities and advantages of ThinkSum on the BIG-bench suite of LLM evaluation tasks, achieving improvements over the state of the art using GPT-family models on thirteen difficult tasks, often with far smaller model variants. We also compare and contrast ThinkSum with other proposed modifications to direct prompting of LLMs, such as variants of chain-of-thought prompting. Our results suggest that because the probabilistic inference in ThinkSum is performed outside of calls to the LLM, ThinkSum is less sensitive to prompt design, yields more interpretable predictions, and can be flexibly combined with latent variable models to extract structured knowledge from LLMs. Overall, our proposed paradigm represents a promising approach for enhancing the reasoning capabilities of LLMs.
Hard Sample Aware Network for Contrastive Deep Graph ClusteringYue Liu, Xihong Yang, Sihang Zhou et al.
Contrastive deep graph clustering, which aims to divide nodes into disjoint groups via contrastive mechanisms, is a challenging research spot. Among the recent works, hard sample mining-based algorithms have achieved great attention for their promising performance. However, we find that the existing hard sample mining methods have two problems as follows. 1) In the hardness measurement, the important structural information is overlooked for similarity calculation, degrading the representativeness of the selected hard negative samples. 2) Previous works merely focus on the hard negative sample pairs while neglecting the hard positive sample pairs. Nevertheless, samples within the same cluster but with low similarity should also be carefully learned. To solve the problems, we propose a novel contrastive deep graph clustering method dubbed Hard Sample Aware Network (HSAN) by introducing a comprehensive similarity measure criterion and a general dynamic sample weighing strategy. Concretely, in our algorithm, the similarities between samples are calculated by considering both the attribute embeddings and the structure embeddings, better revealing sample relationships and assisting hardness measurement. Moreover, under the guidance of the carefully collected high-confidence clustering information, our proposed weight modulating function will first recognize the positive and negative samples and then dynamically up-weight the hard sample pairs while down-weighting the easy ones. In this way, our method can mine not only the hard negative samples but also the hard positive sample, thus improving the discriminative capability of the samples further. Extensive experiments and analyses demonstrate the superiority and effectiveness of our proposed method.
EvenNet: Ignoring Odd-Hop Neighbors Improves Robustness of Graph Neural NetworksRunlin Lei, Zhen Wang, Yaliang Li et al.
Graph Neural Networks (GNNs) have received extensive research attention for their promising performance in graph machine learning. Despite their extraordinary predictive accuracy, existing approaches, such as GCN and GPRGNN, are not robust in the face of homophily changes on test graphs, rendering these models vulnerable to graph structural attacks and with limited capacity in generalizing to graphs of varied homophily levels. Although many methods have been proposed to improve the robustness of GNN models, most of these techniques are restricted to the spatial domain and employ complicated defense mechanisms, such as learning new graph structures or calculating edge attentions. In this paper, we study the problem of designing simple and robust GNN models in the spectral domain. We propose EvenNet, a spectral GNN corresponding to an even-polynomial graph filter. Based on our theoretical analysis in both spatial and spectral domains, we demonstrate that EvenNet outperforms full-order models in generalizing across homophilic and heterophilic graphs, implying that ignoring odd-hop neighbors improves the robustness of GNNs. We conduct experiments on both synthetic and real-world datasets to demonstrate the effectiveness of EvenNet. Notably, EvenNet outperforms existing defense models against structural attacks without introducing additional computational costs and maintains competitiveness in traditional node classification tasks on homophilic and heterophilic graphs.
LON-GNN: Spectral GNNs with Learnable Orthonormal BasisQian Tao, Zhen Wang, Wenyuan Yu et al.
In recent years, a plethora of spectral graph neural networks (GNN) methods have utilized polynomial basis with learnable coefficients to achieve top-tier performances on many node-level tasks. Although various kinds of polynomial bases have been explored, each such method adopts a fixed polynomial basis which might not be the optimal choice for the given graph. Besides, we identify the so-called over-passing issue of these methods and show that it is somewhat rooted in their less-principled regularization strategy and unnormalized basis. In this paper, we make the first attempts to address these two issues. Leveraging Jacobi polynomials, we design a novel spectral GNN, LON-GNN, with Learnable OrthoNormal bases and prove that regularizing coefficients becomes equivalent to regularizing the norm of learned filter function now. We conduct extensive experiments on diverse graph datasets to evaluate the fitting and generalization capability of LON-GNN, where the results imply its superiority.
A Comprehensive Survey of Data Augmentation in Visual Reinforcement LearningGuozheng Ma, Zhen Wang, Zhecheng Yuan et al.
Visual reinforcement learning (RL), which makes decisions directly from high-dimensional visual inputs, has demonstrated significant potential in various domains. However, deploying visual RL techniques in the real world remains challenging due to their low sample efficiency and large generalization gaps. To tackle these obstacles, data augmentation (DA) has become a widely used technique in visual RL for acquiring sample-efficient and generalizable policies by diversifying the training data. This survey aims to provide a timely and essential review of DA techniques in visual RL in recognition of the thriving development in this field. In particular, we propose a unified framework for analyzing visual RL and understanding the role of DA in it. We then present a principled taxonomy of the existing augmentation techniques used in visual RL and conduct an in-depth discussion on how to better leverage augmented data in different scenarios. Moreover, we report a systematic empirical evaluation of DA-based techniques in visual RL and conclude by highlighting the directions for future research. As the first comprehensive survey of DA in visual RL, this work is expected to offer valuable guidance to this emerging field.
19.2LGJul 24, 2022
Online Continual Learning with Contrastive Vision TransformerZhen Wang, Liu Liu, Yajing Kong et al.
Online continual learning (online CL) studies the problem of learning sequential tasks from an online data stream without task boundaries, aiming to adapt to new data while alleviating catastrophic forgetting on the past tasks. This paper proposes a framework Contrastive Vision Transformer (CVT), which designs a focal contrastive learning strategy based on a transformer architecture, to achieve a better stability-plasticity trade-off for online CL. Specifically, we design a new external attention mechanism for online CL that implicitly captures previous tasks' information. Besides, CVT contains learnable focuses for each class, which could accumulate the knowledge of previous classes to alleviate forgetting. Based on the learnable focuses, we design a focal contrastive loss to rebalance contrastive learning between new and past classes and consolidate previously learned representations. Moreover, CVT contains a dual-classifier structure for decoupling learning current classes and balancing all observed classes. The extensive experimental results show that our approach achieves state-of-the-art performance with even fewer parameters on online CL benchmarks and effectively alleviates the catastrophic forgetting.
9.6LGMar 10, 2022
Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation FrameworkXiaotian Hao, Hangyu Mao, Weixun Wang et al.
The state space in Multiagent Reinforcement Learning (MARL) grows exponentially with the agent number. Such a curse of dimensionality results in poor scalability and low sample efficiency, inhibiting MARL for decades. To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space. Our insight is that permuting the order of entities in the factored multiagent state space does not change the information. Specifically, we propose two novel implementations: a Dynamic Permutation Network (DPN) and a Hyper Policy Network (HPN). The core idea is to build separate entity-wise PI input and PE output network modules to connect the entity-factored state space and action space in an end-to-end way. DPN achieves such connections by two separate module selection networks, which consistently assign the same input module to the same input entity (guarantee PI) and assign the same output module to the same entity-related output (guarantee PE). To enhance the representation capability, HPN replaces the module selection networks of DPN with hypernetworks to directly generate the corresponding module weights. Extensive experiments in SMAC, Google Research Football and MPE validate that the proposed methods significantly boost the performance and the learning efficiency of existing MARL algorithms. Remarkably, in SMAC, we achieve 100% win rates in almost all hard and super-hard scenarios (never achieved before).
Towards Fine-grained Causal Reasoning and QALinyi Yang, Zhen Wang, Yuxiang Wu et al.
Understanding causality is key to the success of NLP applications, especially in high-stakes domains. Causality comes in various perspectives such as enable and prevent that, despite their importance, have been largely ignored in the literature. This paper introduces a novel fine-grained causal reasoning dataset and presents a series of novel predictive tasks in NLP, such as causality detection, event causality extraction, and Causal QA. Our dataset contains human annotations of 25K cause-effect event pairs and 24K question-answering pairs within multi-sentence samples, where each can have multiple causal relationships. Through extensive experiments and analysis, we show that the complex relations in our dataset bring unique challenges to state-of-the-art methods across all three tasks and highlight potential research opportunities, especially in developing "causal-thinking" methods.
3.8LGApr 11, 2023
HPN: Personalized Federated Hyperparameter OptimizationAnda Cheng, Zhen Wang, Yaliang Li et al.
Numerous research studies in the field of federated learning (FL) have attempted to use personalization to address the heterogeneity among clients, one of FL's most crucial and challenging problems. However, existing works predominantly focus on tailoring models. Yet, due to the heterogeneity of clients, they may each require different choices of hyperparameters, which have not been studied so far. We pinpoint two challenges of personalized federated hyperparameter optimization (pFedHPO): handling the exponentially increased search space and characterizing each client without compromising its data privacy. To overcome them, we propose learning a \textsc{H}yper\textsc{P}arameter \textsc{N}etwork (HPN) fed with client encoding to decide personalized hyperparameters. The client encoding is calculated with a random projection-based procedure to protect each client's privacy. Besides, we design a novel mechanism to debias the low-fidelity function evaluation samples for learning HPN. We conduct extensive experiments on FL tasks from various domains, demonstrating the superiority of HPN.
Cross Task Neural Architecture Search for EEG Signal ClassificationsYiqun Duan, Zhen Wang, Yi Li et al.
Electroencephalograms (EEGs) are brain dynamics measured outside the brain, which have been widely utilized in non-invasive brain-computer interface applications. Recently, various neural network approaches have been proposed to improve the accuracy of EEG signal recognition. However, these approaches severely rely on manually designed network structures for different tasks which generally are not sharing the same empirical design cross-task-wise. In this paper, we propose a cross-task neural architecture search (CTNAS-EEG) framework for EEG signal recognition, which can automatically design the network structure across tasks and improve the recognition accuracy of EEG signals. Specifically, a compatible search space for cross-task searching and an efficient constrained searching method is proposed to overcome challenges brought by EEG signals. By unifying structure search on different EEG tasks, this work is the first to explore and analyze the searched structure difference cross-task-wise. Moreover, by introducing architecture search, this work is the first to analyze model performance by customizing model structure for each human subject. Detailed experimental results suggest that the proposed CTNAS-EEG could reach state-of-the-art performance on different EEG tasks, such as Motor Imagery (MI) and Emotion recognition. Extensive experiments and detailed analysis are provided as a good reference for follow-up researchers.
6.9LGApr 6, 2022
PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment RepresentationsTong Sang, Hongyao Tang, Yi Ma et al.
Deep Reinforcement Learning (DRL) has been a promising solution to many complex decision-making problems. Nevertheless, the notorious weakness in generalization among environments prevent widespread application of DRL agents in real-world scenarios. Although advances have been made recently, most prior works assume sufficient online interaction on training environments, which can be costly in practical cases. To this end, we focus on an offline-training-online-adaptation setting, in which the agent first learns from offline experiences collected in environments with different dynamics and then performs online policy adaptation in environments with new dynamics. In this paper, we propose Policy Adaptation with Decoupled Representations (PAnDR) for fast policy adaptation. In offline training phase, the environment representation and policy representation are learned through contrastive learning and policy recovery, respectively. The representations are further refined by mutual information optimization to make them more decoupled and complete. With learned representations, a Policy-Dynamics Value Function (PDVF) [Raileanu et al., 2020] network is trained to approximate the values for different combinations of policies and environments from offline experiences. In online adaptation phase, with the environment context inferred from few experiences collected in new environments, the policy is optimized by gradient ascent with respect to the PDVF. Our experiments show that PAnDR outperforms existing algorithms in several representative policy adaptation problems.
1.7CLJun 27, 2023
MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuningZhehua Zhong, Tianyi Chen, Zhen Wang
Fine-tuning large-scale pre-trained language models has been demonstrated effective for various natural language processing (NLP) tasks. Previous studies have established that incorporating adversarial training during the fine-tuning stage can significantly enhance model generalization and robustness. However, from the perspective of game theory, such utilizations of adversarial training correspond to pure-strategy games, which are inherently limited in terms of the scope of their strategies, thereby still having room for improvement. In order to push the performance boundaries, we propose a novel Mixed-strategy Adversarial Training algorithm (MAT). Methodologically, we derive the Nash equilibrium of a mixed-strategy game for adversarial training using Entropy Mirror Descent to establish MAT by sampling method. To verify the effectiveness of MAT, we conducted extensive benchmark experiments on large-scale pre-trained models, such as BERT and RoBERTa. MAT significantly outperforms the state-of-the-art methods on both the GLUE and ANLI benchmarks in terms of generalization and robustness.
2.0CVAug 1, 2024
Cross-Scan Mamba with Masked Training for Robust Spectral ImagingWenzhe Tian, Haijin Zeng, Yin-Ping Zhao et al.
Snapshot Compressive Imaging (SCI) enables fast spectral imaging but requires effective decoding algorithms for hyperspectral image (HSI) reconstruction from compressed measurements. Current CNN-based methods are limited in modeling long-range dependencies, while Transformer-based models face high computational complexity. Although recent Mamba models outperform CNNs and Transformers in RGB tasks concerning computational efficiency or accuracy, they are not specifically optimized to fully leverage the local spatial and spectral correlations inherent in HSIs. To address this, we propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding and cross-channel interaction promotion. Besides, while current reconstruction algorithms perform increasingly well in simulation scenarios, they exhibit suboptimal performance on real data due to limited generalization capability. During the training process, the model may not capture the inherent features of the images but rather learn the parameters to mitigate specific noise and loss, which may lead to a decline in reconstruction quality when faced with real scenes. To overcome this challenge, we propose a masked training method to enhance the generalization ability of models. Experiment results show that our CS-Mamba achieves state-of-the-art performance and the masked training method can better reconstruct smooth features to improve the visual quality.
Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement LearningChenjia Bai, Lingxiao Wang, Jianye Hao et al.
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing. Empirically, we release an MTDS benchmark and collect datasets from three challenging domains. The experimental results show our algorithm outperforms the previous state-of-the-art methods in challenging MTDS problems. See https://github.com/Baichenjia/UTDS for the datasets and code.
Entity Tracking via Effective Use of Multi-Task Learning Model and Mention-guided DecodingJanvijay Singh, Fan Bai, Zhen Wang
Cross-task knowledge transfer via multi-task learning has recently made remarkable progress in general NLP tasks. However, entity tracking on the procedural text has not benefited from such knowledge transfer because of its distinct formulation, i.e., tracking the event flow while following structural constraints. State-of-the-art entity tracking approaches either design complicated model architectures or rely on task-specific pre-training to achieve good results. To this end, we propose MeeT, a Multi-task learning-enabled entity Tracking approach, which utilizes knowledge gained from general domain tasks to improve entity tracking. Specifically, MeeT first fine-tunes T5, a pre-trained multi-task learning model, with entity tracking-specialized QA formats, and then employs our customized decoding strategy to satisfy the structural constraints. MeeT achieves state-of-the-art performances on two popular entity tracking datasets, even though it does not require any task-specific architecture design or pre-training.
16.8CLFeb 10, 2023
ControversialQA: Exploring Controversy in Question AnsweringZhen Wang, Peide Zhu, Jie Yang
Controversy is widespread online. Previous studies mainly define controversy based on vague assumptions of its relation to sentiment such as hate speech and offensive words. This paper introduces the first question-answering dataset that defines content controversy by user perception, i.e., votes from plenty of users. It contains nearly 10K questions, and each question has a best answer and a most controversial answer. Experimental results reveal that controversy detection in question answering is essential and challenging, and there is no strong correlation between controversy and sentiment tasks.
Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement LearningXiaoyu Wen, Chenjia Bai, Kang Xu et al.
Cross-domain offline reinforcement learning leverages source domain data with diverse transition dynamics to alleviate the data requirement for the target domain. However, simply merging the data of two domains leads to performance degradation due to the dynamics mismatch. Existing methods address this problem by measuring the dynamics gap via domain classifiers while relying on the assumptions of the transferability of paired domains. In this paper, we propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains. We show that such an objective recovers the mutual-information gap of transition functions in two domains without suffering from the unbounded issue of the dynamics gap in handling significantly different domains. Based on the representations, we introduce a data filtering algorithm that selectively shares transitions from the source domain according to the contrastive score functions. Empirical results on various tasks demonstrate that our method achieves superior performance, using only 10% of the target data to achieve 89.2% of the performance on 100% target dataset with state-of-the-art methods.
8.5IRJun 23, 2025
PERSCEN: Learning Personalized Interaction Pattern and Scenario Preference for Multi-Scenario MatchingHaotong Du, Yaqing Wang, Fei Xiong et al.
With the expansion of business scales and scopes on online platforms, multi-scenario matching has become a mainstream solution to reduce maintenance costs and alleviate data sparsity. The key to effective multi-scenario recommendation lies in capturing both user preferences shared across all scenarios and scenario-aware preferences specific to each scenario. However, existing methods often overlook user-specific modeling, limiting the generation of personalized user representations. To address this, we propose PERSCEN, an innovative approach that incorporates user-specific modeling into multi-scenario matching. PERSCEN constructs a user-specific feature graph based on user characteristics and employs a lightweight graph neural network to capture higher-order interaction patterns, enabling personalized extraction of preferences shared across scenarios. Additionally, we leverage vector quantization techniques to distil scenario-aware preferences from users' behavior sequence within individual scenarios, facilitating user-specific and scenario-aware preference modeling. To enhance efficient and flexible information transfer, we introduce a progressive scenario-aware gated linear unit that allows fine-grained, low-latency fusion. Extensive experiments demonstrate that PERSCEN outperforms existing methods. Further efficiency analysis confirms that PERSCEN effectively balances performance with computational cost, ensuring its practicality for real-world industrial systems.
6.4LGMar 16, 2024
Graph Regularized NMF with L20-norm for Unsupervised Feature LearningZhen Wang, Wenwen Min
Nonnegative Matrix Factorization (NMF) is a widely applied technique in the fields of machine learning and data mining. Graph Regularized Non-negative Matrix Factorization (GNMF) is an extension of NMF that incorporates graph regularization constraints. GNMF has demonstrated exceptional performance in clustering and dimensionality reduction, effectively discovering inherent low-dimensional structures embedded within high-dimensional spaces. However, the sensitivity of GNMF to noise limits its stability and robustness in practical applications. In order to enhance feature sparsity and mitigate the impact of noise while mining row sparsity patterns in the data for effective feature selection, we introduce the $\ell_{2,0}$-norm constraint as the sparsity constraints for GNMF. We propose an unsupervised feature learning framework based on GNMF\_$\ell_{20}$ and devise an algorithm based on PALM and its accelerated version to address this problem. Additionally, we establish the convergence of the proposed algorithms and validate the efficacy and superiority of our approach through experiments conducted on both simulated and real image data.
3.3ITApr 30, 2024
Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement LearningQiaosheng Zhang, Chenjia Bai, Shuyue Hu et al.
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of tilde{O}(sqrt{K}) for K episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS, which has the same Bayesian regret bound while enjoying less computational complexity. Moreover, by leveraging the flexibility of IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm Compressed-MAIDS wherein the learning target is a compressed environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.
6.4LGFeb 7, 2024
Incorporating Retrieval-based Causal Learning with Information Bottlenecks for Interpretable Graph Neural NetworksJiahua Rao, Jiancong Xie, Hanjing Lin et al.
Graph Neural Networks (GNNs) have gained considerable traction for their capability to effectively process topological data, yet their interpretability remains a critical concern. Current interpretation methods are dominated by post-hoc explanations to provide a transparent and intuitive understanding of GNNs. However, they have limited performance in interpreting complicated subgraphs and can't utilize the explanation to advance GNN predictions. On the other hand, transparent GNN models are proposed to capture critical subgraphs. While such methods could improve GNN predictions, they usually don't perform well on explanations. Thus, it is desired for a new strategy to better couple GNN explanation and prediction. In this study, we have developed a novel interpretable causal GNN framework that incorporates retrieval-based causal learning with Graph Information Bottleneck (GIB) theory. The framework could semi-parametrically retrieve crucial subgraphs detected by GIB and compress the explanatory subgraphs via a causal module. The framework was demonstrated to consistently outperform state-of-the-art methods, and to achieve 32.71\% higher precision on real-world explanation scenarios with diverse explanation types. More importantly, the learned explanations were shown able to also improve GNN prediction performance.
2.3GTJul 23, 2025
Regret Minimization in Population Network Games: Vanishing Heterogeneity and Convergence to EquilibriaDie Hu, Shuyue Hu, Chunjiang Mu et al.
Understanding and predicting the behavior of large-scale multi-agents in games remains a fundamental challenge in multi-agent systems. This paper examines the role of heterogeneity in equilibrium formation by analyzing how smooth regret-matching drives a large number of heterogeneous agents with diverse initial policies toward unified behavior. By modeling the system state as a probability distribution of regrets and analyzing its evolution through the continuity equation, we uncover a key phenomenon in diverse multi-agent settings: the variance of the regret distribution diminishes over time, leading to the disappearance of heterogeneity and the emergence of consensus among agents. This universal result enables us to prove convergence to quantal response equilibria in both competitive and cooperative multi-agent settings. Our work advances the theoretical understanding of multi-agent learning and offers a novel perspective on equilibrium selection in diverse game-theoretic scenarios.
3.7CVApr 7, 2024
AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit DisentanglementShiwei Jin, Zhen Wang, Lei Wang et al.
Facial action unit (AU) intensity plays a pivotal role in quantifying fine-grained expression behaviors, which is an effective condition for facial expression manipulation. However, publicly available datasets containing intensity annotations for multiple AUs remain severely limited, often featuring a restricted number of subjects. This limitation places challenges to the AU intensity manipulation in images due to disentanglement issues, leading researchers to resort to other large datasets with pretrained AU intensity estimators for pseudo labels. In addressing this constraint and fully leveraging manual annotations of AU intensities for precise manipulation, we introduce AUEditNet. Our proposed model achieves impressive intensity manipulation across 12 AUs, trained effectively with only 18 subjects. Utilizing a dual-branch architecture, our approach achieves comprehensive disentanglement of facial attributes and identity without necessitating additional loss functions or implementing with large batch sizes. This approach offers a potential solution to achieve desired facial attribute editing despite the dataset's limited subject count. Our experiments demonstrate AUEditNet's superior accuracy in editing AU intensities, affirming its capability in disentangling facial attributes and identity within a limited subject pool. AUEditNet allows conditioning by either intensity values or target images, eliminating the need for constructing AU combinations for specific facial expression synthesis. Moreover, AU intensity estimation, as a downstream task, validates the consistency between real and edited images, confirming the effectiveness of our proposed AU intensity manipulation method.
4.6LGMay 12, 2024
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement LearningChanghong Wang, Xudong Yu, Chenjia Bai et al.
In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties in exploration. Recently, offline RL provides a promising solution by giving an initialized offline policy, which can be refined through online interactions. However, existing approaches primarily perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online-adaptation for several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve the performance for the new task by online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaption of Q functions towards new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.
5.8AINov 28, 2025
Evolutionary Discovery of Heuristic Policies for Traffic Signal ControlRuibing Wang, Shuhan Guo, Zeen Li et al.
Traffic Signal Control (TSC) involves a challenging trade-off: classic heuristics are efficient but oversimplified, while Deep Reinforcement Learning (DRL) achieves high performance yet suffers from poor generalization and opaque policies. Online Large Language Models (LLMs) provide general reasoning but incur high latency and lack environment-specific optimization. To address these issues, we propose Temporal Policy Evolution for Traffic (\textbf{\method{}}), which uses LLMs as an evolution engine to derive specialized heuristic policies. The framework introduces two key modules: (1) Structured State Abstraction (SSA), converting high-dimensional traffic data into temporal-logical facts for reasoning; and (2) Credit Assignment Feedback (CAF), tracing flawed micro-decisions to poor macro-outcomes for targeted critique. Operating entirely at the prompt level without training, \method{} yields lightweight, robust policies optimized for specific traffic environments, outperforming both heuristics and online LLM actors.
10.2CVDec 30, 2024
PQD: Post-training Quantization for Efficient Diffusion ModelsJiaojiao Ye, Zhen Wang, Linnan Jiang
Diffusionmodels(DMs)havedemonstratedremarkableachievements in synthesizing images of high fidelity and diversity. However, the extensive computational requirements and slow generative speed of diffusion models have limited their widespread adoption. In this paper, we propose a novel post-training quantization for diffusion models (PQD), which is a time-aware optimization framework for diffusion models based on post-training quantization. The proposed framework optimizes the inference process by selecting representative samples and conducting time-aware calibration. Experimental results show that our proposed method is able to directly quantize full-precision diffusion models into 8-bit or 4-bit models while maintaining comparable performance in a training-free manner, achieving a few FID change on ImageNet for unconditional image generation. Our approach demonstrates compatibility and can also be applied to 512x512 text-guided image generation for the first time.
7.6CVJan 25, 2024
Diffusion-based Data Augmentation for Object Counting ProblemsZhen Wang, Yuelei Li, Jia Wan et al.
Crowd counting is an important problem in computer vision due to its wide range of applications in image understanding. Currently, this problem is typically addressed using deep learning approaches, such as Convolutional Neural Networks (CNNs) and Transformers. However, deep networks are data-driven and are prone to overfitting, especially when the available labeled crowd dataset is limited. To overcome this limitation, we have designed a pipeline that utilizes a diffusion model to generate extensive training data. We are the first to generate images conditioned on a location dot map (a binary dot map that specifies the location of human heads) with a diffusion model. We are also the first to use these diverse synthetic data to augment the crowd counting models. Our proposed smoothed density map input for ControlNet significantly improves ControlNet's performance in generating crowds in the correct locations. Also, Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated. Additionally, our innovative guidance sampling further directs the diffusion process toward regions where the generated crowd images align most accurately with the location dot map. Collectively, we have enhanced ControlNet's ability to generate specified objects from a location dot map, which can be used for data augmentation in various counting problems. Moreover, our framework is versatile and can be easily adapted to all kinds of counting problems. Extensive experiments demonstrate that our framework improves the counting performance on the ShanghaiTech, NWPU-Crowd, UCF-QNRF, and TRANCOS datasets, showcasing its effectiveness.
11.5LGMay 25, 2023
Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement LearningGuozheng Ma, Linrui Zhang, Haoyu Wang et al.
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra auxiliary representation tasks or pre-trained encoders. However, it remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL. To investigate this issue and further explore the potential of DA, this work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy and provides the following insights and improvements: (1) For individual DA operations, we reveal that both ample spatial diversity and slight hardness are indispensable. Building on this finding, we introduce Random PadResize (Rand PR), a new DA operation that offers abundant spatial diversity with minimal hardness. (2) For multi-type DA fusion schemes, the increased DA hardness and unstable data distribution result in the current fusion schemes being unable to achieve higher sample efficiency than their corresponding individual operations. Taking the non-stationary nature of RL into account, we propose a RL-tailored multi-type DA fusion scheme called Cycling Augmentation (CycAug), which performs periodic cycles of different DA operations to increase type diversity while maintaining data distribution consistency. Extensive evaluations on the DeepMind Control suite and CARLA driving simulator demonstrate that our methods achieve superior sample efficiency compared with the prior state-of-the-art methods.
41.5CLMay 24, 2023
Reasoning with Language Model is Planning with World ModelShibo Hao, Yi Gu, Haodi Ma et al.
Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still struggle with problems that are easy for humans, such as generating action plans for executing tasks in a given environment, or performing complex math, logical, and commonsense reasoning. The deficiency stems from the key fact that LLMs lack an internal $\textit{world model}$ to predict the world $\textit{state}$ (e.g., environment status, intermediate variable values) and simulate long-term outcomes of actions. This prevents LLMs from performing deliberate planning akin to human brains, which involves exploring alternative reasoning paths, anticipating future states and rewards, and iteratively refining existing reasoning steps. To overcome the limitations, we propose a new LLM reasoning framework, $\underline{R}$easoning vi$\underline{a}$ $\underline{P}$lanning $\textbf{(RAP)}$. RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm (based on Monto Carlo Tree Search) for strategic exploration in the vast reasoning space. During reasoning, the LLM (as agent) incrementally builds a reasoning tree under the guidance of the LLM (as world model) and task-specific rewards, and obtains a high-reward reasoning path efficiently with a proper balance between exploration $\textit{vs.}$ exploitation. We apply RAP to a variety of challenging reasoning problems including plan generation, math reasoning, and logical inference. Empirical results on these tasks demonstrate the superiority of RAP over various strong baselines, including CoT and least-to-most prompting with self-consistency. RAP on LLAMA-33B surpasses CoT on GPT-4 with 33% relative improvement in a plan generation setting.
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool EmbeddingsShibo Hao, Tianyang Liu, Zhen Wang et al.
Augmenting large language models (LLMs) with external tools has emerged as a promising approach to solving complex problems. However, traditional methods, which finetune LLMs with tool demonstration data, can be both costly and restricted to a predefined set of tools. Recent in-context learning paradigm alleviates these issues, but the limited context length only allows for a few shots of demonstrations, leading to suboptimal understandings of the tools. Moreover, when there are numerous tools to choose from, in-context learning could completely fail to work. In this paper, we propose an alternative approach, $\textbf{ToolkenGPT}$, which combines the benefits of both sides. Our approach represents each $\underline{tool}$ as a to$\underline{ken}$ ($\textit{toolken}$) and learns an embedding for it, enabling tool calls in the same way as generating a regular word token. Once a toolken is triggered, the LLM is prompted to complete arguments for the tool to execute. ToolkenGPT offers the flexibility to plug in an arbitrary number of tools by expanding the set of toolkens on the fly. In addition, it improves tool use by allowing extensive demonstration data for learning the toolken embeddings. In diverse domains, including numerical reasoning, knowledge-based question answering, and embodied plan generation, our approach effectively augments LLMs with tools and substantially outperforms various latest baselines. ToolkenGPT demonstrates the promising ability to use relevant tools from a large tool set in complex scenarios.
7.6CVMay 19, 2023
ReDirTrans: Latent-to-Latent Translation for Gaze and Head RedirectionShiwei Jin, Zhen Wang, Lei Wang et al.
Learning-based gaze estimation methods require large amounts of training data with accurate gaze annotations. Facing such demanding requirements of gaze data collection and annotation, several image synthesis methods were proposed, which successfully redirected gaze directions precisely given the assigned conditions. However, these methods focused on changing gaze directions of the images that only include eyes or restricted ranges of faces with low resolution (less than $128\times128$) to largely reduce interference from other attributes such as hairs, which limits application scenarios. To cope with this limitation, we proposed a portable network, called ReDirTrans, achieving latent-to-latent translation for redirecting gaze directions and head orientations in an interpretable manner. ReDirTrans projects input latent vectors into aimed-attribute embeddings only and redirects these embeddings with assigned pitch and yaw values. Then both the initial and edited embeddings are projected back (deprojected) to the initial latent space as residuals to modify the input latent vectors by subtraction and addition, representing old status removal and new status addition. The projection of aimed attributes only and subtraction-addition operations for status replacement essentially mitigate impacts on other attributes and the distribution of latent vectors. Thus, by combining ReDirTrans with a pretrained fixed e4e-StyleGAN pair, we created ReDirTrans-GAN, which enables accurately redirecting gaze in full-face images with $1024\times1024$ resolution while preserving other attributes such as identity, expression, and hairstyle. Furthermore, we presented improvements for the downstream learning-based gaze estimation task, using redirected samples as dataset augmentation.
1.8LGFeb 26, 2022
Regularized Bilinear Discriminant Analysis for Multivariate Time Series DataJianhua Zhao, Haiye Liang, Shulan Li et al.
In recent years, the methods on matrix-based or bilinear discriminant analysis (BLDA) have received much attention. Despite their advantages, it has been reported that the traditional vector-based regularized LDA (RLDA) is still quite competitive and could outperform BLDA on some benchmark datasets. Nevertheless, it is also noted that this finding is mainly limited to image data. In this paper, we propose regularized BLDA (RBLDA) and further explore the comparison between RLDA and RBLDA on another type of matrix data, namely multivariate time series (MTS). Unlike image data, MTS typically consists of multiple variables measured at different time points. Although many methods for MTS data classification exist within the literature, there is relatively little work in exploring the matrix data structure of MTS data. Moreover, the existing BLDA can not be performed when one of its within-class matrices is singular. To address the two problems, we propose RBLDA for MTS data classification, where each of the two within-class matrices is regularized via one parameter. We develop an efficient implementation of RBLDA and an efficient model selection algorithm with which the cross validation procedure for RBLDA can be performed efficiently. Experiments on a number of real MTS data sets are conducted to evaluate the proposed algorithm and compare RBLDA with several closely related methods, including RLDA and BLDA. The results reveal that RBLDA achieves the best overall recognition performance and the proposed model selection algorithm is efficient; Moreover, RBLDA can produce better visualization of MTS data than RLDA.
Coherence boosting: When your pretrained language model is not paying enough attentionNikolay Malkin, Zhen Wang, Nebojsa Jojic
Long-range semantic coherence remains a challenge in automatic language generation and understanding. We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction. We present coherence boosting, an inference procedure that increases a LM's focus on a long context. We show the benefits of coherence boosting with pretrained models by distributional analyses of generated ordinary text and dialog responses. It is also found that coherence boosting with state-of-the-art models for various zero-shot NLP tasks yields performance gains with no additional training.
17.5LGSep 12, 2021
HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action RepresentationBoyan Li, Hongyao Tang, Yan Zheng et al.
Discrete-continuous hybrid action space is a natural setting in many practical problems, such as robot control and game AI. However, most previous Reinforcement Learning (RL) works only demonstrate the success in controlling with either discrete or continuous action space, while seldom take into account the hybrid action space. One naive way to address hybrid action RL is to convert the hybrid action space into a unified homogeneous action space by discretization or continualization, so that conventional RL algorithms can be applied. However, this ignores the underlying structure of hybrid action space and also induces the scalability issue and additional approximation difficulties, thus leading to degenerated results. In this paper, we propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space. HyAR constructs the latent space and embeds the dependence between discrete action and continuous parameter via an embedding table and conditional Variantional Auto-Encoder (VAE). To further improve the effectiveness, the action representation is trained to be semantically smooth through unsupervised environmental dynamics prediction. Finally, the agent then learns its policy with conventional DRL algorithms in the learned representation space and interacts with the environment by decoding the hybrid action embeddings to the original action space. We evaluate HyAR in a variety of environments with discrete-continuous action space. The results demonstrate the superiority of HyAR when compared with previous baselines, especially for high-dimensional action spaces.
N24News: A New Dataset for Multimodal News ClassificationZhen Wang, Xu Shan, Xiangxie Zhang et al.
Current news datasets merely focus on text features on the news and rarely leverage the feature of images, excluding numerous essential features for news classification. In this paper, we propose a new dataset, N24News, which is generated from New York Times with 24 categories and contains both text and image information in each news. We use a multitask multimodal method and the experimental results show multimodal news classification performs better than text-only news classification. Depending on the length of the text, the classification accuracy can be increased by up to 8.11%. Our research reveals the relationship between the performance of a multimodal classifier and its sub-classifiers, and also the possible improvements when applying multimodal in news classification. N24News is shown to have great potential to prompt the multimodal news studies.
8.5LGJul 29, 2020
FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular DataYuexiang Xie, Zhen Wang, Yaliang Li et al.
High-order interactive features capture the correlation between different columns and thus are promising to enhance various learning tasks on ubiquitous tabular data. To automate the generation of interactive features, existing works either explicitly traverse the feature space or implicitly express the interactions via intermediate activations of some designed models. These two kinds of methods show that there is essentially a trade-off between feature interpretability and search efficiency. To possess both of their merits, we propose a novel method named Feature Interaction Via Edge Search (FIVES), which formulates the task of interactive feature generation as searching for edges on the defined feature graph. Specifically, we first present our theoretical evidence that motivates us to search for useful interactive features with increasing order. Then we instantiate this search strategy by optimizing both a dedicated graph neural network (GNN) and the adjacency tensor associated with the defined feature graph. In this way, the proposed FIVES method simplifies the time-consuming traversal as a typical training course of GNN and enables explicit feature generation according to the learned adjacency tensor. Experimental results on both benchmark and real-world datasets show the advantages of FIVES over several state-of-the-art methods. Moreover, the interactive features identified by FIVES are deployed on the recommender system of Taobao, a worldwide leading e-commerce platform. Results of an online A/B testing further verify the effectiveness of the proposed method FIVES, and we further provide FIVES as AI utilities for the customers of Alibaba Cloud.
8.7CLJan 13, 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture SearchDaoyuan Chen, Yaliang Li, Minghui Qiu et al.
Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, the huge parameter size makes them difficult to be deployed in real-time applications that require quick inference with limited resources. Existing methods compress BERT into small models while such compression is task-independent, i.e., the same compressed BERT for all different downstream tasks. Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks. We incorporate a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as search constraints, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression. We evaluate AdaBERT on several NLP tasks, and the results demonstrate that those task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in terms of parameter size, while comparable performance is maintained.
0.5CLOct 18, 2019
Learning to Answer Subjective, Specific Product-Related Queries using Customer Reviews by Adversarial Domain AdaptationManirupa Das, Zhen Wang, Evan Jaffe et al.
Online customer reviews on large-scale e-commerce websites, represent a rich and varied source of opinion data, often providing subjective qualitative assessments of product usage that can help potential customers to discover features that meet their personal needs and preferences. Thus they have the potential to automatically answer specific queries about products, and to address the problems of answer starvation and answer augmentation on associated consumer Q & A forums, by providing good answer alternatives. In this work, we explore several recently successful neural approaches to modeling sentence pairs, that could better learn the relationship between questions and ground truth answers, and thus help infer reviews that can best answer a question or augment a given answer. In particular, we hypothesize that our adversarial domain adaptation-based approach, due to its ability to additionally learn domain-invariant features from a large number of unlabeled, unpaired question-review samples, would perform better than our proposed baselines, at answering specific, subjective product-related queries using reviews. We validate this hypothesis using a small gold standard dataset of question-review pairs evaluated by human experts, significantly surpassing our chosen baselines. Moreover, our approach, using no labeled question-review sentence pair data for training, gives performance at par with another method utilizing labeled question-review samples for the same task.
A Distributed Fair Machine Learning Framework with Private Demographic Data ProtectionHui Hu, Yijun Liu, Zhen Wang et al.
Fair machine learning has become a significant research topic with broad societal impact. However, most fair learning methods require direct access to personal demographic data, which is increasingly restricted to use for protecting user privacy (e.g. by the EU General Data Protection Regulation). In this paper, we propose a distributed fair learning framework for protecting the privacy of demographic data. We assume this data is privately held by a third party, which can communicate with the data center (responsible for model development) without revealing the demographic information. We propose a principled approach to design fair learning methods under this framework, exemplify four methods and show they consistently outperform their existing counterparts in both fairness and accuracy across three real-world data sets. We theoretically analyze the framework, and prove it can learn models with high fairness or high accuracy, with their trade-offs balanced by a threshold variable.
Fisher-Bures Adversary Graph Convolutional NetworksKe Sun, Piotr Koniusz, Zhen Wang
In a graph convolutional network, we assume that the graph $G$ is generated wrt some observation noise. During learning, we make small random perturbations $ΔG$ of the graph and try to improve generalization. Based on quantum information geometry, $ΔG$ can be characterized by the eigendecomposition of the graph Laplacian matrix. We try to minimize the loss wrt the perturbed $G+Δ{G}$ while making $Δ{G}$ to be effective in terms of the Fisher information of the neural network. Our proposed model can consistently improve graph convolutional networks on semi-supervised node classification tasks with reasonable computational overhead. We present three different geometries on the manifold of graphs: the intrinsic geometry measures the information theoretic dynamics of a graph; the extrinsic geometry characterizes how such dynamics can affect externally a graph neural network; the embedding geometry is for measuring node embeddings. These new analytical tools are useful in developing a good understanding of graph neural networks and fostering new techniques.
1.5LGNov 6, 2018
Robust Bhattacharyya bound linear discriminant analysis through adaptive algorithmChun-Na Li, Yuan-Hai Shao, Zhen Wang et al.
In this paper, we propose a novel linear discriminant analysis criterion via the Bhattacharyya error bound estimation based on a novel L1-norm (L1BLDA) and L2-norm (L2BLDA). Both L1BLDA and L2BLDA maximize the between-class scatters which are measured by the weighted pairwise distances of class means and meanwhile minimize the within-class scatters under the L1-norm and L2-norm, respectively. The proposed models can avoid the small sample size (SSS) problem and have no rank limit that may encounter in LDA. It is worth mentioning that, the employment of L1-norm gives a robust performance of L1BLDA, and L1BLDA is solved through an effective non-greedy alternating direction method of multipliers (ADMM), where all the projection vectors can be obtained once for all. In addition, the weighting constants of L1BLDA and L2BLDA between the between-class and within-class terms are determined by the involved data set, which makes our L1BLDA and L2BLDA adaptive. The experimental results on both benchmark data sets as well as the handwritten digit databases demonstrate the effectiveness of the proposed methods.
32.1CLMay 16, 2018
Joint Training of Candidate Extraction and Answer Selection for Reading ComprehensionZhen Wang, Jiachen Liu, Xinyan Xiao et al.
While sophisticated neural-based techniques have been developed in reading comprehension, most approaches model the answer in an independent manner, ignoring its relations with other answer candidates. This problem can be even worse in open-domain scenarios, where candidates from multiple passages should be combined to answer a single question. In this paper, we formulate reading comprehension as an extract-then-select two-stage procedure. We first extract answer candidates from passages, then select the final answer by combining information from all the candidates. Furthermore, we regard candidate extraction as a latent variable and train the two-stage process jointly with reinforcement learning. As a result, our approach has improved the state-of-the-art performance significantly on two challenging open-domain reading comprehension datasets. Further analysis demonstrates the effectiveness of our model components, especially the information fusion of all the candidates and the joint training of the extract-then-select procedure.
On the Effectiveness of Least Squares Generative Adversarial NetworksXudong Mao, Qing Li, Haoran Xie et al.
Unsupervised learning with generative adversarial networks (GANs) has proven to be hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss for both the discriminator and the generator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson $χ^2$ divergence. We also show that the derived objective function that yields minimizing the Pearson $χ^2$ divergence performs better than the classical one of using least squares for classification. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stably during the learning process. For evaluating the image quality, we conduct both qualitative and quantitative experiments, and the experimental results show that LSGANs can generate higher quality images than regular GANs. Furthermore, we evaluate the stability of LSGANs in two groups. One is to compare between LSGANs and regular GANs without gradient penalty. We conduct three experiments, including Gaussian mixture distribution, difficult architectures, and a newly proposed method --- datasets with small variability, to illustrate the stability of LSGANs. The other one is to compare between LSGANs with gradient penalty (LSGANs-GP) and WGANs with gradient penalty (WGANs-GP). The experimental results show that LSGANs-GP succeed in training for all the difficult architectures used in WGANs-GP, including 101-layer ResNet.
Least Squares Generative Adversarial NetworksXudong Mao, Qing Li, Haoran Xie et al.
Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson $χ^2$ divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stable during the learning process. We evaluate LSGANs on five scene datasets and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.
1.2SISep 10, 2015
The robustness of multiplex networks under layer node-based attackDawei Zhao, Lianhai Wang, Zhen Wang
From transportation networks to complex infrastructures, and to social and economic networks, a large variety of systems can be described in terms of multiplex networks formed by a set of nodes interacting through different network layers. Network robustness, as one of the most successful application areas of complex networks, has also attracted great interest in both theoretical and empirical researches. However, the vast majority of existing researches mainly focus on the robustness of single-layer networks an interdependent networks, how multiplex networks respond to potential attack is still short of further exploration. Here we study the robustness of multiplex networks under two attack strategies: layer node-based random attack and layer node-based targeted attack. A theoretical analysis framework is proposed to calculate the critical threshold and the size of giant component of multiplex networks when a fraction of layer nodes are removed randomly or intentionally. Via numerous simulations, it is unveiled that the theoretical method can accurately predict the threshold and the size of giant component, irrespective of attack strategies. Moreover, we also compare the robustness of multiplex networks under multiplex node-based attack and layer node-based attack, and find that layer node-based attack makes multiplex networks more vulnerable, regardless of average degree and underlying topology. Our finding may shed new light on the protection of multiplex networks.