Zhiguang Wang

LG
h-index8
20papers
6,570citations
Novelty49%
AI Score40

20 Papers

NEApr 28, 2022
Taylor Genetic Programming for Symbolic Regression

Baihe He, Qiang Lu, Qingyun Yang et al.

Genetic programming (GP) is a commonly used approach to solve symbolic regression (SR) problems. Compared with the machine learning or deep learning methods that depend on the pre-defined model and the training dataset for solving SR problems, GP is more focused on finding the solution in a search space. Although GP has good performance on large-scale benchmarks, it randomly transforms individuals to search results without taking advantage of the characteristics of the dataset. So, the search process of GP is usually slow, and the final results could be unstable.To guide GP by these characteristics, we propose a new method for SR, called Taylor genetic programming (TaylorGP) (Code and appendix at https://kgae-cup.github.io/TaylorGP/). TaylorGP leverages a Taylor polynomial to approximate the symbolic equation that fits the dataset. It also utilizes the Taylor polynomial to extract the features of the symbolic equation: low order polynomial discrimination, variable separability, boundary, monotonic, and parity. GP is enhanced by these Taylor polynomial techniques. Experiments are conducted on three kinds of benchmarks: classical SR, machine learning, and physics. The experimental results show that TaylorGP not only has higher accuracy than the nine baseline methods, but also is faster in finding stable results.

LGApr 22, 2022
Exploring Hidden Semantics in Neural Networks with Symbolic Regression

Yuanzhen Luo, Qiang Lu, Xilei Hu et al.

Many recent studies focus on developing mechanisms to explain the black-box behaviors of neural networks (NNs). However, little work has been done to extract the potential hidden semantics (mathematical representation) of a neural network. A succinct and explicit mathematical representation of a NN model could improve the understanding and interpretation of its behaviors. To address this need, we propose a novel symbolic regression method for neural works (called SRNet) to discover the mathematical expressions of a NN. SRNet creates a Cartesian genetic programming (NNCGP) to represent the hidden semantics of a single layer in a NN. It then leverages a multi-chromosome NNCGP to represent hidden semantics of all layers of the NN. The method uses a (1+$λ$) evolutionary strategy (called MNNCGP-ES) to extract the final mathematical expressions of all layers in the NN. Experiments on 12 symbolic regression benchmarks and 5 classification benchmarks show that SRNet not only can reveal the complex relationships between each layer of a NN but also can extract the mathematical representation of the whole NN. Compared with LIME and MAPLE, SRNet has higher interpolation accuracy and trends to approximate the real model on the practical dataset.

LGJul 20, 2025Code
Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback

Yiyuan Yang, Zichuan Liu, Lei Song et al.

Time series anomaly detection is critical across various domains, yet current approaches often limit analysis to mere binary anomaly classification without detailed categorization or further explanatory reasoning. To address these limitations, we propose a novel task, Time-series Reasoning for Anomaly (Time-RA) that transforms classical time series anomaly detection from a discriminative into a generative, reasoning-intensive task leveraging Large Language Models (LLMs). Also, we introduce the first real-world multimodal benchmark dataset, RATs40K, explicitly annotated for anomaly reasoning, comprising approximately 40,000 samples across 10 real-world domains. Each sample includes numeric time series data, contextual text information, and visual representations, each annotated with fine-grained categories (14 types for univariate anomalies and 6 for multivariate anomalies) and structured explanatory reasoning. We develop a sophisticated annotation framework utilizing ensemble-generated labels refined through GPT-4-driven feedback, ensuring accuracy and interpretability. Extensive benchmarking of LLMs and multimodal LLMs demonstrates the capabilities and limitations of current models, highlighting the critical role of supervised fine-tuning. Our dataset and task pave the way for significant advancements in interpretable time series anomaly detection and reasoning. The code (https://github.com/yyysjz1997/Time-RA) and dataset (https://huggingface.co/datasets/Time-RA/RATs40K) have been fully open-sourced to support and accelerate future research in this area.

CLSep 10, 2021
Zero-Shot Dialogue State Tracking via Cross-Task Transfer

Zhaojiang Lin, Bing Liu, Andrea Madotto et al.

Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data. In this work, we propose to transfer the \textit{cross-task} knowledge from general question answering (QA) corpora for the zero-shot DST task. Specifically, we propose TransferQA, a transferable generative QA model that seamlessly combines extractive QA and multi-choice QA via a text-to-text transformer framework, and tracks both categorical slots and non-categorical slots in DST. In addition, we introduce two effective ways to construct unanswerable questions, namely, negative question sampling and context truncation, which enable our model to handle "none" value slots in the zero-shot DST setting. The extensive experiments show that our approaches substantially improve the existing zero-shot and few-shot results on MultiWoz. Moreover, compared to the fully trained baseline on the Schema-Guided Dialogue dataset, our approach shows better generalization ability in unseen domains.

CLMay 10, 2021
Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking

Zhaojiang Lin, Bing Liu, Seungwhan Moon et al.

Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot description enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes dialogue context and slots with a pre-trained self-attentive encoder, and generates slot values in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information across slots to facilitate cross-domain knowledge transfer. Experimental results on the MultiWOZ dataset show that our proposed method significantly improves existing state-of-the-art results in the zero-shot cross-domain setting.

LGMar 26, 2021
Gated Transformer Networks for Multivariate Time Series Classification

Minghao Liu, Shengqi Ren, Siyuan Ma et al.

Deep learning model (primarily convolutional networks and LSTM) for time series classification has been studied broadly by the community with the wide applications in different domains like healthcare, finance, industrial engineering and IoT. Meanwhile, Transformer Networks recently achieved frontier performance on various natural language processing and computer vision tasks. In this work, we explored a simple extension of the current Transformer Networks with gating, named Gated Transformer Networks (GTN) for the multivariate time series classification problem. With the gating that merges two towers of Transformer which model the channel-wise and step-wise correlations respectively, we show how GTN is naturally and effectively suitable for the multivariate time series classification task. We conduct comprehensive experiments on thirteen dataset with full ablation study. Our results show that GTN is able to achieve competing results with current state-of-the-art deep learning models. We also explored the attention map for the natural interpretability of GTN on time series modeling. Our preliminary results provide a strong baseline for the Transformer Networks on multivariate time series classification task and grounds the foundation for future research.

CLDec 31, 2020
Continual Learning in Task-Oriented Dialogue Systems

Andrea Madotto, Zhaojiang Lin, Zhenpeng Zhou et al.

Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining. In this paper, we propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings, such as intent recognition, state tracking, natural language generation, and end-to-end. Moreover, we implement and compare multiple existing continual learning baselines, and we propose a simple yet effective architectural method based on residual adapters. Our experiments demonstrate that the proposed architectural method and a simple replay-based strategy perform comparably well but they both achieve inferior performance to the multi-task learning baseline, in where all the data are shown at once, showing that continual learning in task-oriented dialogue systems is a challenging task. Furthermore, we reveal several trade-offs between different continual learning methods in term of parameter usage and memory size, which are important in the design of a task-oriented dialogue system. The proposed benchmark is released together with several baselines to promote more research in this direction.

CLOct 24, 2020
Adding Chit-Chat to Enhance Task-Oriented Dialogues

Kai Sun, Seungwhan Moon, Paul Crook et al.

Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtual assistant conversations more engaging and interactive. Specifically, we propose a Human <-> AI collaborative data collection approach for generating diverse chit-chat responses to augment task-oriented dialogues with minimal annotation effort. We then present our new chit-chat-based annotations to 23.8K dialogues from two popular task-oriented datasets (Schema-Guided Dialogue and MultiWOZ 2.1) and demonstrate their advantage over the originals via human evaluation. Lastly, we propose three new models for adding chit-chat to task-oriented dialogues, explicitly trained to predict user goals and to generate contextually relevant chit-chat responses. Automatic and human evaluations show that, compared with the state-of-the-art task-oriented baseline, our models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.

CLMay 1, 2020
Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity

Pedro Rodriguez, Paul Crook, Seungwhan Moon et al.

Open-ended human learning and information-seeking are increasingly mediated by digital assistants. However, such systems often ignore the user's pre-existing knowledge. Assuming a correlation between engagement and user responses such as "liking" messages or asking followup questions, we design a Wizard-of-Oz dialog task that tests the hypothesis that engagement increases when users are presented with facts related to what they know. Through crowd-sourcing of this experiment, we collect and release 14K dialogs (181K utterances) where users and assistants converse about geographic topics like geopolitical entities and locations. This dataset is annotated with pre-existing user knowledge, message-level dialog acts, grounding to Wikipedia, and user reactions to messages. Responses using a user's prior knowledge increase engagement. We incorporate this knowledge into a multi-task model that reproduces human assistant policies and improves over a BERT content model by 13 mean reciprocal rank points.

LGApr 24, 2018
Improving Native Ads CTR Prediction by Large Scale Event Embedding and Recurrent Networks

Mehul Parsana, Krishna Poola, Yajun Wang et al.

Click through rate (CTR) prediction is very important for Native advertisement but also hard as there is no direct query intent. In this paper we propose a large-scale event embedding scheme to encode the each user browsing event by training a Siamese network with weak supervision on the users' consecutive events. The CTR prediction problem is modeled as a supervised recurrent neural network, which naturally model the user history as a sequence of events. Our proposed recurrent models utilizing pretrained event embedding vectors and an attention layer to model the user history. Our experiments demonstrate that our model significantly outperforms the baseline and some variants.

DCSep 13, 2017
Automated Cloud Provisioning on AWS using Deep Reinforcement Learning

Zhiguang Wang, Chul Gwon, Tim Oates et al.

As the use of cloud computing continues to rise, controlling cost becomes increasingly important. Yet there is evidence that 30\% - 45\% of cloud spend is wasted. Existing tools for cloud provisioning typically rely on highly trained human experts to specify what to monitor, thresholds for triggering action, and actions. In this paper we explore the use of reinforcement learning (RL) to acquire policies to balance performance and spend, allowing humans to specify what they want as opposed to how to do it, minimizing the need for cloud expertise. Empirical results with tabular, deep, and dueling double deep Q-learning with the CloudSim simulator show the utility of RL and the relative merits of the approaches. We also demonstrate effective policy transfer learning from an extremely simple simulator to CloudSim, with the next step being transfer from CloudSim to an Amazon Web Services physical environment.

CVMar 31, 2017
Diabetic Retinopathy Detection via Deep Convolutional Networks for Discriminative Localization and Visual Explanation

Zhiguang Wang, Jianbo Yang

We proposed a deep learning method for interpretable diabetic retinopathy (DR) detection. The visual-interpretable feature of the proposed method is achieved by adding the regression activation map (RAM) after the global averaging pooling layer of the convolutional networks (CNN). With RAM, the proposed model can localize the discriminative regions of an retina image to show the specific region of interest in terms of its severity level. We believe this advantage of the proposed deep learning model is highly desired for DR detection because in practice, users are not only interested with high prediction performance, but also keen to understand the insights of DR detection and why the adopted learning model works. In the experiments conducted on a large scale of retina image dataset, we show that the proposed CNN model can achieve high performance on DR detection compared with the state-of-the-art while achieving the merits of providing the RAM to highlight the salient regions of the input image.

LGNov 20, 2016
Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline

Zhiguang Wang, Weizhong Yan, Tim Oates

We propose a simple but strong baseline for time series classification from scratch with deep neural networks. Our proposed baseline models are pure end-to-end without any heavy preprocessing on the raw data or feature crafting. The proposed Fully Convolutional Network (FCN) achieves premium performance to other state-of-the-art approaches and our exploration of the very deep neural networks with the ResNet structure is also competitive. The global average pooling in our convolutional model enables the exploitation of the Class Activation Map (CAM) to find out the contributing region in the raw data for the specific labels. Our models provides a simple choice for the real world application and a good starting point for the future research. An overall analysis is provided to discuss the generalization capability of our models, learned features, network structures and the classification semantics.

LGOct 24, 2016
Encoding Temporal Markov Dynamics in Graph for Visualizing and Mining Time Series

Lu Liu, Zhiguang Wang

Time series and signals are attracting more attention across statistics, machine learning and pattern recognition as it appears widely in the industry especially in sensor and IoT related research and applications, but few advances has been achieved in effective time series visual analytics and interaction due to its temporal dimensionality and complex dynamics. Inspired by recent effort on using network metrics to characterize time series for classification, we present an approach to visualize time series as complex networks based on the first order Markov process in its temporal ordering. In contrast to the classical bar charts, line plots and other statistics based graph, our approach delivers more intuitive visualization that better preserves both the temporal dependency and frequency structures. It provides a natural inverse operation to map the graph back to raw signals, making it possible to use graph statistics to characterize time series for better visual exploration and statistical analysis. Our experimental results suggest the effectiveness on various tasks such as pattern discovery and classification on both synthetic and the real time series and sensor data.

LGOct 24, 2016
Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization

Zhiguang Wang, Wei Song, Lu Liu et al.

We propose a new model based on the deconvolutional networks and SAX discretization to learn the representation for multivariate time series. Deconvolutional networks fully exploit the advantage the powerful expressiveness of deep neural networks in the manner of unsupervised learning. We design a network structure specifically to capture the cross-channel correlation with deconvolution, forcing the pooling operation to perform the dimension reduction along each position in the individual channel. Discretization based on Symbolic Aggregate Approximation is applied on the feature vectors to further extract the bag of features. We show how this representation and bag of features helps on classification. A full comparison with the sequence distance based approach is provided to demonstrate the effectiveness of our approach on the standard datasets. We further build the Markov matrix from the discretized representation from the deconvolution to visualize the time series as complex networks, which show more class-specific statistical properties and clear structures with respect to different labels.

LGOct 13, 2015
Adopting Robustness and Optimality in Fitting and Learning

Zhiguang Wang, Tim Oates, James Lo

We generalized a modified exponentialized estimator by pushing the robust-optimal (RO) index $λ$ to $-\infty$ for achieving robustness to outliers by optimizing a quasi-Minimin function. The robustness is realized and controlled adaptively by the RO index without any predefined threshold. Optimality is guaranteed by expansion of the convexity region in the Hessian matrix to largely avoid local optima. Detailed quantitative analysis on both robustness and optimality are provided. The results of proposed experiments on fitting tasks for three noisy non-convex functions and the digits recognition task on the MNIST dataset consolidate the conclusions.

LGSep 24, 2015
Spatially Encoding Temporal Correlations to Classify Temporal Data Using Convolutional Neural Networks

Zhiguang Wang, Tim Oates

We propose an off-line approach to explicitly encode temporal patterns spatially as different types of images, namely, Gramian Angular Fields and Markov Transition Fields. This enables the use of techniques from computer vision for feature learning and classification. We used Tiled Convolutional Neural Networks to learn high-level features from individual GAF, MTF, and GAF-MTF images on 12 benchmark time series datasets and two real spatial-temporal trajectory datasets. The classification results of our approach are competitive with state-of-the-art approaches on both types of data. An analysis of the features and weights learned by the CNNs explains why the approach works.

LGJun 8, 2015
Empirical Studies on Symbolic Aggregation Approximation Under Statistical Perspectives for Knowledge Discovery in Time Series

Wei Song, Zhiguang Wang, Yangdong Ye et al.

Symbolic Aggregation approXimation (SAX) has been the de facto standard representation methods for knowledge discovery in time series on a number of tasks and applications. So far, very little work has been done in empirically investigating the intrinsic properties and statistical mechanics in SAX words. In this paper, we applied several statistical measurements and proposed a new statistical measurement, i.e. information embedding cost (IEC) to analyze the statistical behaviors of the symbolic dynamics. Our experiments on the benchmark datasets and the clinical signals demonstrate that SAX can always reduce the complexity while preserving the core information embedded in the original time series with significant embedding efficiency. Our proposed IEC score provide a priori to determine if SAX is adequate for specific dataset, which can be generalized to evaluate other symbolic representations. Our work provides an analytical framework with several statistical tools to analyze, evaluate and further improve the symbolic dynamics for knowledge discovery in time series.

LGJun 8, 2015
Adaptive Normalized Risk-Averting Training For Deep Neural Networks

Zhiguang Wang, Tim Oates, James Lo

This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness on global and local convexity lower-bounded by the standard $L_p$-norm error. By analyzing the gradient on the convexity index $λ$, we explain the reason why to learn $λ$ adaptively using gradient descent works. In practice, we show how this method improves training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and Denoised Auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other specific tricks in DNNs. Other than unsupervised pretraining, it provides a new perspective to address the non-convex optimization problem in DNNs.

LGJun 1, 2015
Imaging Time-Series to Improve Classification and Imputation

Zhiguang Wang, Tim Oates

Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoised Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18%-48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work.