LGSep 12, 2022
Online Continual Learning via the Meta-learning Update with Multi-scale Knowledge Distillation and Data AugmentationYa-nan Han, Jian-wei Liu
Continual learning aims to rapidly and continually learn the current task from a sequence of tasks. Compared to other kinds of methods, the methods based on experience replay have shown great advantages to overcome catastrophic forgetting. One common limitation of this method is the data imbalance between the previous and current tasks, which would further aggravate forgetting. Moreover, how to effectively address the stability-plasticity dilemma in this setting is also an urgent problem to be solved. In this paper, we overcome these challenges by proposing a novel framework called Meta-learning update via Multi-scale Knowledge Distillation and Data Augmentation (MMKDDA). Specifically, we apply multiscale knowledge distillation to grasp the evolution of long-range and short-range spatial relationships at different feature levels to alleviate the problem of data imbalance. Besides, our method mixes the samples from the episodic memory and current task in the online continual training procedure, thus alleviating the side influence due to the change of probability distribution. Moreover, we optimize our model via the meta-learning update resorting to the number of tasks seen previously, which is helpful to keep a better balance between stability and plasticity. Finally, our experimental evaluation on four benchmark datasets shows the effectiveness of the proposed MMKDDA framework against other popular baselines, and ablation studies are also conducted to further analyze the role of each component in our framework.
LGJul 5, 2024Code
Proximal Point Method for Online Saddle Point ProblemQing-xin Meng, Jian-wei Liu
This paper focuses on the online saddle point problem, which involves a sequence of two-player time-varying convex-concave games. Considering the nonstationarity of the environment, we adopt the duality gap and the dynamic Nash equilibrium regret as performance metrics for algorithm design. We present three variants of the proximal point method: the Online Proximal Point Method (OPPM), the Optimistic OPPM (OptOPPM), and the OptOPPM with multiple predictors. Each algorithm guarantees upper bounds for both the duality gap and dynamic Nash equilibrium regret, achieving near-optimality when measured against the duality gap. Specifically, in certain benign environments, such as sequences of stationary payoff functions, these algorithms maintain a nearly constant metric bound. Experimental results further validate the effectiveness of these algorithms. Lastly, this paper discusses potential reliability concerns associated with using dynamic Nash equilibrium regret as a performance metric. The technical appendix and code can be found at https://github.com/qingxin6174/PPM-for-OSP.
CVFeb 2, 2023
Online Continual Learning via the Knowledge Invariant and Spread-out PropertiesYa-nan Han, Jian-wei Liu
The goal of continual learning is to provide intelligent agents that are capable of learning continually a sequence of tasks using the knowledge obtained from previous tasks while performing well on prior tasks. However, a key challenge in this continual learning paradigm is catastrophic forgetting, namely adapting a model to new tasks often leads to severe performance degradation on prior tasks. Current memory-based approaches show their success in alleviating the catastrophic forgetting problem by replaying examples from past tasks when new tasks are learned. However, these methods are infeasible to transfer the structural knowledge from previous tasks i.e., similarities or dissimilarities between different instances. Furthermore, the learning bias between the current and prior tasks is also an urgent problem that should be solved. In this work, we propose a new method, named Online Continual Learning via the Knowledge Invariant and Spread-out Properties (OCLKISP), in which we constrain the evolution of the embedding features via Knowledge Invariant and Spread-out Properties (KISP). Thus, we can further transfer the inter-instance structural knowledge of previous tasks while alleviating the forgetting due to the learning bias. We empirically evaluate our proposed method on four popular benchmarks for continual learning: Split CIFAR 100, Split SVHN, Split CUB200 and Split Tiny-Image-Net. The experimental results show the efficacy of our proposed method compared to the state-of-the-art continual learning algorithms.
LGSep 9, 2022
Self-supervised Learning for Heterogeneous Graph via Structure Information based on MetapathShuai Ma, Jian-wei Liu, Xin Zuo
graph neural networks (GNNs) are the dominant paradigm for modeling and handling graph structure data by learning universal node representation. The traditional way of training GNNs depends on a great many labeled data, which results in high requirements on cost and time. In some special scene, it is even unavailable and impracticable. Self-supervised representation learning, which can generate labels by graph structure data itself, is a potential approach to tackle this problem. And turning to research on self-supervised learning problem for heterogeneous graphs is more challenging than dealing with homogeneous graphs, also there are fewer studies about it. In this paper, we propose a SElfsupervised learning method for heterogeneous graph via Structure Information based on Metapath (SESIM). The proposed model can construct pretext tasks by predicting jump number between nodes in each metapath to improve the representation ability of primary task. In order to predict jump number, SESIM uses data itself to generate labels, avoiding time-consuming manual labeling. Moreover, predicting jump number in each metapath can effectively utilize graph structure information, which is the essential property between nodes. Therefore, SESIM deepens the understanding of models for graph structure. At last, we train primary task and pretext tasks jointly, and use meta-learning to balance the contribution of pretext tasks for primary task. Empirical results validate the performance of SESIM method and demonstrate that this method can improve the representation ability of traditional neural networks on link prediction task and node classification task.
CVSep 12, 2022
$β$-CapsNet: Learning Disentangled Representation for CapsNet by Information BottleneckMing-fei Hu, Jian-wei Liu
We present a framework for learning disentangled representation of CapsNet by information bottleneck constraint that distills information into a compact form and motivates to learn an interpretable factorized capsule. In our $β$-CapsNet framework, hyperparameter $β$ is utilized to trade-off disentanglement and other tasks, variational inference is utilized to convert the information bottleneck term into a KL divergence that is approximated as a constraint on the mean of the capsule. For supervised learning, class independent mask vector is used for understanding the types of variations synthetically irrespective of the image class, we carry out extensive quantitative and qualitative experiments by tuning the parameter $β$ to figure out the relationship between disentanglement, reconstruction and classfication performance. Furthermore, the unsupervised $β$-CapsNet and the corresponding dynamic routing algorithm is proposed for learning disentangled capsule in an unsupervised manner, extensive empirical evaluations suggest that our $β$-CapsNet achieves state-of-the-art disentanglement performance compared to CapsNet and various baselines on several complex datasets both in supervision and unsupervised scenes.
CVSep 9, 2022
Selecting Related Knowledge via Efficient Channel Attention for Online Continual LearningYa-nan Han, Jian-wei Liu
Continual learning aims to learn a sequence of tasks by leveraging the knowledge acquired in the past in an online-learning manner while being able to perform well on all previous tasks, this ability is crucial to the artificial intelligence (AI) system, hence continual learning is more suitable for most real-word and complex applicative scenarios compared to the traditional learning pattern. However, the current models usually learn a generic representation base on the class label on each task and an effective strategy is selected to avoid catastrophic forgetting. We postulate that selecting the related and useful parts only from the knowledge obtained to perform each task is more effective than utilizing the whole knowledge. Based on this fact, in this paper we propose a new framework, named Selecting Related Knowledge for Online Continual Learning (SRKOCL), which incorporates an additional efficient channel attention mechanism to pick the particular related knowledge for every task. Our model also combines experience replay and knowledge distillation to circumvent the catastrophic forgetting. Finally, extensive experiments are conducted on different benchmarks and the competitive experimental results demonstrate that our proposed SRKOCL is a promised approach against the state-of-the-art.
LGMar 28, 2022
Optimistic Online Convex Optimization in Dynamic EnvironmentsQing-xin Meng, Jian-wei Liu
In this paper, we study the optimistic online convex optimization problem in dynamic environments. Existing works have shown that Ader enjoys an $O\left(\sqrt{\left(1+P_T\right)T}\right)$ dynamic regret upper bound, where $T$ is the number of rounds, and $P_T$ is the path length of the reference strategy sequence. However, Ader is not environment-adaptive. Based on the fact that optimism provides a framework for implementing environment-adaptive, we replace Greedy Projection (GP) and Normalized Exponentiated Subgradient (NES) in Ader with Optimistic-GP and Optimistic-NES respectively, and name the corresponding algorithm ONES-OGP. We also extend the doubling trick to the adaptive trick, and introduce three characteristic terms naturally arise from optimism, namely $M_T$, $\widetilde{M}_T$ and $V_T+1_{L^2ρ\left(ρ+2 P_T\right)\leqslant\varrho^2 V_T}D_T$, to replace the dependence of the dynamic regret upper bound on $T$. We elaborate ONES-OGP with adaptive trick and its subgradient variation version, all of which are environment-adaptive.
NEOct 16, 2025
A Survey of Recursive and Recurrent Neural NetworksJian-wei Liu, Bing-rong Xu, Zhi-yan Song
In this paper, the branches of recursive and recurrent neural networks are classified in detail according to the network structure, training objective function and learning algorithm implementation. They are roughly divided into three categories: The first category is General Recursive and Recurrent Neural Networks, including Basic Recursive and Recurrent Neural Networks, Long Short Term Memory Recursive and Recurrent Neural Networks, Convolutional Recursive and Recurrent Neural Networks, Differential Recursive and Recurrent Neural Networks, One-Layer Recursive and Recurrent Neural Networks, High-Order Recursive and Recurrent Neural Networks, Highway Networks, Multidimensional Recursive and Recurrent Neural Networks, Bidirectional Recursive and Recurrent Neural Networks; the second category is Structured Recursive and Recurrent Neural Networks, including Grid Recursive and Recurrent Neural Networks, Graph Recursive and Recurrent Neural Networks, Temporal Recursive and Recurrent Neural Networks, Lattice Recursive and Recurrent Neural Networks, Hierarchical Recursive and Recurrent Neural Networks, Tree Recursive and Recurrent Neural Networks; the third category is Other Recursive and Recurrent Neural Networks, including Array Long Short Term Memory, Nested and Stacked Recursive and Recurrent Neural Networks, Memory Recursive and Recurrent Neural Networks. Various networks cross each other and even rely on each other to form a complex network of relationships. In the context of the development and convergence of various networks, many complex sequence, speech and image problems are solved. After a detailed description of the principle and structure of the above model and model deformation, the research progress and application of each model are described, and finally the recursive and recurrent neural network models are prospected and summarized.
LGSep 9, 2025
A Modular Algorithm for Non-Stationary Online Convex-Concave OptimizationQing-xin Meng, Xia Lei, Jian-wei Liu
This paper investigates the problem of Online Convex-Concave Optimization, which extends Online Convex Optimization to two-player time-varying convex-concave games. The goal is to minimize the dynamic duality gap (D-DGap), a critical performance measure that evaluates players' strategies against arbitrary comparator sequences. Existing algorithms fail to deliver optimal performance, particularly in stationary or predictable environments. To address this, we propose a novel modular algorithm with three core components: an Adaptive Module that dynamically adjusts to varying levels of non-stationarity, a Multi-Predictor Aggregator that identifies the best predictor among multiple candidates, and an Integration Module that effectively combines their strengths. Our algorithm achieves a minimax optimal D-DGap upper bound, up to a logarithmic factor, while also ensuring prediction error-driven D-DGap bounds. The modular design allows for the seamless replacement of components that regulate adaptability to dynamic environments, as well as the incorporation of components that integrate ``side knowledge'' from multiple predictors. Empirical results further demonstrate the effectiveness and adaptability of the proposed method.
LGDec 12, 2023
Online Saddle Point Problem and Online Convex-Concave OptimizationQing-xin Meng, Jian-wei Liu
Centered around solving the Online Saddle Point problem, this paper introduces the Online Convex-Concave Optimization (OCCO) framework, which involves a sequence of two-player time-varying convex-concave games. We propose the generalized duality gap (Dual-Gap) as the performance metric and establish the parallel relationship between OCCO with Dual-Gap and Online Convex Optimization (OCO) with regret. To demonstrate the natural extension of OCCO from OCO, we develop two algorithms, the implicit online mirror descent-ascent and its optimistic variant. Analysis reveals that their duality gaps share similar expression forms with the corresponding dynamic regrets arising from implicit updates in OCO. Empirical results further substantiate the effectiveness of our algorithms. Simultaneously, we unveil that the dynamic Nash equilibrium regret, which was initially introduced in a recent paper, has inherent defects.
LGJan 25, 2022
GMM Discriminant Analysis with Noisy Label for Each ClassJian-wei Liu, Zheng-ping Ren, Run-kun Lu et al.
Real world datasets often contain noisy labels, and learning from such datasets using standard classification approaches may not produce the desired performance. In this paper, we propose a Gaussian Mixture Discriminant Analysis (GMDA) with noisy label for each class. We introduce flipping probability and class probability and use EM algorithms to solve the discriminant problem with label noise. We also provide the detail proofs of convergence. Experimental results on synthetic and real-world datasets show that the proposed approach notably outperforms other four state-of-art methods.
LGJan 25, 2022
Bilevel Online Deep Learning in Non-stationary EnvironmentYa-nan Han, Jian-wei Liu, Bing-biao Xiao et al.
Recent years have witnessed enormous progress of online learning. However, a major challenge on the road to artificial agents is concept drift, that is, the data probability distribution would change where the data instance arrives sequentially in a stream fashion, which would lead to catastrophic forgetting and degrade the performance of the model. In this paper, we proposed a new Bilevel Online Deep Learning (BODL) framework, which combine bilevel optimization strategy and online ensemble classifier. In BODL algorithm, we use an ensemble classifier, which use the output of different hidden layers in deep neural network to build multiple base classifiers, the important weights of the base classifiers are updated according to exponential gradient descent method in an online manner. Besides, we apply the similar constraint to overcome the convergence problem of online ensemble framework. Then an effective concept drift detection mechanism utilizing the error rate of classifier is designed to monitor the change of the data probability distribution. When the concept drift is detected, our BODL algorithm can adaptively update the model parameters via bilevel optimization and then circumvent the large drift and encourage positive transfer. Finally, the extensive experiments and ablation studies are conducted on various datasets and the competitive numerical results illustrate that our BODL algorithm is a promising approach.
CVJan 24, 2022
Multi-Scale Iterative Refinement Network for RGB-D Salient Object DetectionZe-yu Liu, Jian-wei Liu, Xin Zuo et al.
The extensive research leveraging RGB-D information has been exploited in salient object detection. However, salient visual cues appear in various scales and resolutions of RGB images due to semantic gaps at different feature levels. Meanwhile, similar salient patterns are available in cross-modal depth images as well as multi-scale versions. Cross-modal fusion and multi-scale refinement are still an open problem in RGB-D salient object detection task. In this paper, we begin by introducing top-down and bottom-up iterative refinement architecture to leverage multi-scale features, and then devise attention based fusion module (ABF) to address on cross-modal correlation. We conduct extensive experiments on seven public datasets. The experimental results show the effectiveness of our devised method
LGJan 19, 2022
Online Deep Learning based on Auto-EncoderSi-si Zhang, Jian-wei Liu, Xin Zuo et al.
Online learning is an important technical means for sketching massive real-time and high-speed data. Although this direction has attracted intensive attention, most of the literature in this area ignore the following three issues: (1) they think little of the underlying abstract hierarchical latent information existing in examples, even if extracting these abstract hierarchical latent representations is useful to better predict the class labels of examples; (2) the idea of preassigned model on unseen datapoints is not suitable for modeling streaming data with evolving probability distribution. This challenge is referred as model flexibility. And so, with this in minds, the online deep learning model we need to design should have a variable underlying structure; (3) moreover, it is of utmost importance to fusion these abstract hierarchical latent representations to achieve better classification performance, and we should give different weights to different levels of implicit representation information when dealing with the data streaming where the data distribution changes. To address these issues, we propose a two-phase Online Deep Learning based on Auto-Encoder (ODLAE). Based on auto-encoder, considering reconstruction loss, we extract abstract hierarchical latent representations of instances; Based on predictive loss, we devise two fusion strategies: the output-level fusion strategy, which is obtained by fusing the classification results of encoder each hidden layer; and feature-level fusion strategy, which is leveraged self-attention mechanism to fusion every hidden layer output. Finally, in order to improve the robustness of the algorithm, we also try to utilize the denoising auto-encoder to yield hierarchical latent representations. Experimental results on different datasets are presented to verify the validity of our proposed algorithm (ODLAE) outperforms several baselines.
CVJan 15, 2022
Multi-View representation learning in Multi-Task SceneRun-kun Lu, Jian-wei Liu, Si-ming Lian et al.
Over recent decades have witnessed considerable progress in whether multi-task learning or multi-view learning, but the situation that consider both learning scenes simultaneously has received not too much attention. How to utilize multiple views latent representation of each single task to improve each learning task performance is a challenge problem. Based on this, we proposed a novel semi-supervised algorithm, termed as Multi-Task Multi-View learning based on Common and Special Features (MTMVCSF). In general, multi-views are the different aspects of an object and every view includes the underlying common or special information of this object. As a consequence, we will mine multiple views jointly latent factor of each learning task which consists of each view special feature and the common feature of all views. By this way, the original multi-task multi-view data has degenerated into multi-task data, and exploring the correlations among multiple tasks enables to make an improvement on the performance of learning algorithm. Another obvious advantage of this approach is that we get latent representation of the set of unlabeled instances by the constraint of regression task with labeled instances. The performance of classification and semi-supervised clustering task in these latent representations perform obviously better than it in raw data. Furthermore, an anti-noise multi-task multi-view algorithm called AN-MTMVCSF is proposed, which has a strong adaptability to noise labels. The effectiveness of these algorithms is proved by a series of well-designed experiments on both real world and synthetic data.
LGJan 11, 2022
Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced DataQi Dai, Jian-wei Liu, Yang Liu
The imbalanced classification problem turns out to be one of the important and challenging problems in data mining and machine learning. The performances of traditional classifiers will be severely affected by many data problems, such as class imbalanced problem, class overlap and noise. The Tomek-Link algorithm was only used to clean data when it was proposed. In recent years, there have been reports of combining Tomek-Link algorithm with sampling technique. The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy. However, the Tomek-Links under-sampling algorithm only considers the boundary instances that are the nearest neighbors to each other globally and ignores the potential local overlapping instances. When the number of minority instances is small, the under-sampling effect is not satisfactory, and the performance improvement of the classification model is not obvious. Therefore, on the basis of Tomek-Link, a multi-granularity relabeled under-sampling algorithm (MGRU) is proposed. This algorithm fully considers the local information of the data set in the local granularity subspace, and detects the local potential overlapping instances in the data set. Then, the overlapped majority instances are eliminated according to the global relabeled index value, which effectively expands the detection range of Tomek-Links. The simulation results show that when we select the optimal global relabeled index value for under-sampling, the classification accuracy and generalization performance of the proposed under-sampling algorithm are significantly better than other baseline algorithms.
LGJan 9, 2022
Auto-Encoder based Co-Training Multi-View Representation LearningRun-kun Lu, Jian-wei Liu, Yuan-fang Wang et al.
Multi-view learning is a learning problem that utilizes the various representations of an object to mine valuable knowledge and improve the performance of learning algorithm, and one of the significant directions of multi-view learning is sub-space learning. As we known, auto-encoder is a method of deep learning, which can learn the latent feature of raw data by reconstructing the input, and based on this, we propose a novel algorithm called Auto-encoder based Co-training Multi-View Learning (ACMVL), which utilizes both complementarity and consistency and finds a joint latent feature representation of multiple views. The algorithm has two stages, the first is to train auto-encoder of each view, and the second stage is to train a supervised network. Interestingly, the two stages share the weights partly and assist each other by co-training process. According to the experimental result, we can learn a well performed latent feature representation, and auto-encoder of each view has more powerful reconstruction ability than traditional auto-encoder.
LGJan 8, 2022
Multi-View Non-negative Matrix Factorization Discriminant Learning via Cross Entropy LossJian-wei Liu, Yuan-fang Wang, Run-kun Lu et al.
Multi-view learning accomplishes the task objectives of classification by leverag-ing the relationships between different views of the same object. Most existing methods usually focus on consistency and complementarity between multiple views. But not all of this information is useful for classification tasks. Instead, it is the specific discriminating information that plays an important role. Zhong Zhang et al. explore the discriminative and non-discriminative information exist-ing in common and view-specific parts among different views via joint non-negative matrix factorization. In this paper, we improve this algorithm on this ba-sis by using the cross entropy loss function to constrain the objective function better. At last, we implement better classification effect than original on the same data sets and show its superiority over many state-of-the-art algorithms.
LGJan 5, 2022
Adaptive Online Incremental Learning for Evolving Data StreamsSi-si Zhang, Jian-wei Liu, Xin Zuo
Recent years have witnessed growing interests in online incremental learning. However, there are three major challenges in this area. The first major difficulty is concept drift, that is, the probability distribution in the streaming data would change as the data arrives. The second major difficulty is catastrophic forgetting, that is, forgetting what we have learned before when learning new knowledge. The last one we often ignore is the learning of the latent representation. Only good latent representation can improve the prediction accuracy of the model. Our research builds on this observation and attempts to overcome these difficulties. To this end, we propose an Adaptive Online Incremental Learning for evolving data streams (AOIL). We use auto-encoder with the memory module, on the one hand, we obtained the latent features of the input, on the other hand, according to the reconstruction loss of the auto-encoder with memory module, we could successfully detect the existence of concept drift and trigger the update mechanism, adjust the model parameters in time. In addition, we divide features, which are derived from the activation of the hidden layers, into two parts, which are used to extract the common and private features respectively. By means of this approach, the model could learn the private features of the new coming instances, but do not forget what we have learned in the past (shared features), which reduces the occurrence of catastrophic forgetting. At the same time, to get the fusion feature vector we use the self-attention mechanism to effectively fuse the extracted features, which further improved the latent representation learning.
LGJan 1, 2022
Multi-view Subspace Adaptive Learning via Autoencoder and AttentionJian-wei Liu, Hao-jie Xie, Run-kun Lu et al.
Multi-view learning can cover all features of data samples more comprehensively, so multi-view learning has attracted widespread attention. Traditional subspace clustering methods, such as sparse subspace clustering (SSC) and low-ranking subspace clustering (LRSC), cluster the affinity matrix for a single view, thus ignoring the problem of fusion between views. In our article, we propose a new Multiview Subspace Adaptive Learning based on Attention and Autoencoder (MSALAA). This method combines a deep autoencoder and a method for aligning the self-representations of various views in Multi-view Low-Rank Sparse Subspace Clustering (MLRSSC), which can not only increase the capability to non-linearity fitting, but also can meets the principles of consistency and complementarity of multi-view learning. We empirically observe significant improvement over existing baseline methods on six real-life datasets.
LGJan 1, 2022
Self-attention Multi-view Representation Learning with Diversity-promoting ComplementarityJian-wei Liu, Xi-hao Ding, Run-kun Lu et al.
Multi-view learning attempts to generate a model with a better performance by exploiting the consensus and/or complementarity among multi-view data. However, in terms of complementarity, most existing approaches only can find representations with single complementarity rather than complementary information with diversity. In this paper, to utilize both complementarity and consistency simultaneously, give free rein to the potential of deep learning in grasping diversity-promoting complementarity for multi-view representation learning, we propose a novel supervised multi-view representation learning algorithm, called Self-Attention Multi-View network with Diversity-Promoting Complementarity (SAMVDPC), which exploits the consistency by a group of encoders, uses self-attention to find complementary information entailing diversity. Extensive experiments conducted on eight real-world datasets have demonstrated the effectiveness of our proposed method, and show its superiority over several baseline methods, which only consider single complementary information.
LGDec 29, 2021
Universal Transformer Hawkes Process with Adaptive Recursive IterationLu-ning Zhang, Jian-wei Liu, Zhi-yan Song et al.
Asynchronous events sequences are widely distributed in the natural world and human activities, such as earthquakes records, users activities in social media and so on. How to distill the information from these seemingly disorganized data is a persistent topic that researchers focus on. The one of the most useful model is the point process model, and on the basis, the researchers obtain many noticeable results. Moreover, in recent years, point process models on the foundation of neural networks, especially recurrent neural networks (RNN) are proposed and compare with the traditional models, their performance are greatly improved. Enlighten by transformer model, which can learning sequence data efficiently without recurrent and convolutional structure, transformer Hawkes process is come out, and achieves state-of-the-art performance. However, there is some research proving that the re-introduction of recursive calculations in transformer can further improve transformers performance. Thus, we come out with a new kind of transformer Hawkes process model, universal transformer Hawkes process (UTHP), which contains both recursive mechanism and self-attention mechanism, and to improve the local perception ability of the model, we also introduce convolutional neural network (CNN) in the position-wise-feed-forward part. We conduct experiments on several datasets to validate the effectiveness of UTHP and explore the changes after the introduction of the recursive mechanism. These experiments on multiple datasets demonstrate that the performance of our proposed new model has a certain improvement compared with the previous state-of-the-art models.
LGDec 29, 2021
Temporal Attention Augmented Transformer Hawkes ProcessLu-ning Zhang, Jian-wei Liu, Zhi-yan Song et al.
In recent years, mining the knowledge from asynchronous sequences by Hawkes process is a subject worthy of continued attention, and Hawkes processes based on the neural network have gradually become the most hotly researched fields, especially based on the recurrence neural network (RNN). However, these models still contain some inherent shortcomings of RNN, such as vanishing and exploding gradient and long-term dependency problems. Meanwhile, Transformer based on self-attention has achieved great success in sequential modeling like text processing and speech recognition. Although the Transformer Hawkes process (THP) has gained huge performance improvement, THPs do not effectively utilize the temporal information in the asynchronous events, for these asynchronous sequences, the event occurrence instants are as important as the types of events, while conventional THPs simply convert temporal information into position encoding and add them as the input of transformer. With this in mind, we come up with a new kind of Transformer-based Hawkes process model, Temporal Attention Augmented Transformer Hawkes Process (TAA-THP), we modify the traditional dot-product attention structure, and introduce the temporal encoding into attention structure. We conduct numerous experiments on a wide range of synthetic and real-life datasets to validate the performance of our proposed TAA-THP model, significantly improvement compared with existing baseline models on the different measurements is achieved, including log-likelihood on the test dataset, and prediction accuracies of event types and occurrence times. In addition, through the ablation studies, we vividly demonstrate the merit of introducing additional temporal attention by comparing the performance of the model with and without temporal attention.
LGDec 27, 2021
Survival Analysis of the Compressor Station Based on Hawkes Process with Weibull Base IntensityLu-ning Zhang, Jian-wei Liu, Xin Zuo
In this paper, we use the Hawkes process to model the sequence of failure, i.e., events of compressor station and conduct survival analysis on various failure events of the compressor station. However, until now, nearly all relevant literatures of the Hawkes point processes assume that the base intensity of the conditional intensity function is time-invariant. This assumption is apparently too harsh to be verified. For example, in the practical application, including financial analysis, reliability analysis, survival analysis and social network analysis, the base intensity of the truth conditional intensity function is very likely to be time-varying. The constant base intensity will not reflect the base probability of the failure occurring over time. Thus, in order to solve this problem, in this paper, we propose a new time-varying base intensity, for example, which is from Weibull distribution. First, we introduce the base intensity from the Weibull distribution, and then we propose an effective learning algorithm by maximum likelihood estimator. Experiments on the constant base intensity synthetic data, time-varying base intensity synthetic data, and real-world data show that our method can learn the triggering patterns of the Hawkes processes and the time-varying base intensity simultaneously and robustly. Experiments on the real-world data reveal the Granger causality of different kinds of failures and the base probability of failure varying over time.
LGDec 24, 2021
Tri-Transformer Hawkes Process: Three Heads are better than oneZhi-yan Song, Jian-wei Liu, Lu-ning Zhang et al.
Abstract. Most of the real world data we encounter are asynchronous event sequence, so the last decades have been characterized by the implementation of various point process into the field of social networks,electronic medical records and financial transactions. At the beginning, Hawkes process and its variants which can simulate simultaneously the self-triggering and mutual triggering patterns between different events in complex sequences in a clear and quantitative way are more popular.Later on, with the advances of neural network, neural Hawkes process has been proposed one after another, and gradually become a research hotspot. The proposal of the transformer Hawkes process (THP) has gained a huge performance improvement, so a new upsurge of the neural Hawkes process based on transformer is set off. However, THP does not make full use of the information of occurrence time and type of event in the asynchronous event sequence. It simply adds the encoding of event type conversion and the location encoding of time conversion to the source encoding. At the same time, the learner built from a single transformer will result in an inescapable learning bias. In order to mitigate these problems, we propose a tri-transformer Hawkes process (Tri-THP) model, in which the event and time information are added to the dot-product attention as auxiliary information to form a new multihead attention. The effectiveness of the Tri-THP is proved by a series of well-designed experiments on both real world and synthetic data.
CVDec 23, 2021
Attentive Multi-View Deep Subspace Clustering NetRun-kun Lu, Jian-wei Liu, Xin Zuo
In this paper, we propose a novel Attentive Multi-View Deep Subspace Nets (AMVDSN), which deeply explores underlying consistent and view-specific information from multiple views and fuse them by considering each view's dynamic contribution obtained by attention mechanism. Unlike most multi-view subspace learning methods that they directly reconstruct data points on raw data or only consider consistency or complementarity when learning representation in deep or shallow space, our proposed method seeks to find a joint latent representation that explicitly considers both consensus and view-specific information among multiple views, and then performs subspace clustering on learned joint latent representation.Besides, different views contribute differently to representation learning, we therefore introduce attention mechanism to derive dynamic weight for each view, which performs much better than previous fusion methods in the field of multi-view subspace clustering. The proposed algorithm is intuitive and can be easily optimized just by using Stochastic Gradient Descent (SGD) because of the neural network framework, which also provides strong non-linear characterization capability compared with traditional subspace clustering approaches. The experimental results on seven real-world data sets have demonstrated the effectiveness of our proposed algorithm against some state-of-the-art subspace learning approaches.
LGDec 22, 2021
A Unified Analysis Method for Online Optimization in Normed Vector SpaceQing-xin Meng, Jian-wei Liu
This paper studies online optimization from a high-level unified theoretical perspective. We not only generalize both Optimistic-DA and Optimistic-MD in normed vector space, but also unify their analysis methods for dynamic regret. Regret bounds are the tightest possible due to the introduction of $φ$-convex. As instantiations, regret bounds of normalized exponentiated subgradient and greedy/lazy projection are better than the currently known optimal results. By replacing losses of online game with monotone operators, and extending the definition of regret, namely regret$^n$, we extend online convex optimization to online monotone optimization.