Xue Wang

h-index22

11papers

4,130citations

Novelty59%

AI Score35

Ranked #103,045 of 194,257 authors (top 53%)#34,569 in CV (top 58%)

11 Papers

52.6LGFeb 23, 2023Code

One Fits All:Power General Time Series Analysis by Pretrained LM

Tian Zhou, PeiSong Niu, Xue Wang et al.

Although we have witnessed great success of pre-trained models in natural language processing (NLP) and computer vision (CV), limited progress has been made for general time series analysis. Unlike NLP and CV where a unified model can be used to perform different tasks, specially designed approach still dominates in each time series analysis task such as classification, anomaly detection, forecasting, and few-shot learning. The main challenge that blocks the development of pre-trained model for time series analysis is the lack of a large amount of data for training. In this work, we address this challenge by leveraging language or CV models, pre-trained from billions of tokens, for time series analysis. Specifically, we refrain from altering the self-attention and feedforward layers of the residual blocks in the pre-trained language or image model. This model, known as the Frozen Pretrained Transformer (FPT), is evaluated through fine-tuning on all major types of tasks involving time series. Our results demonstrate that pre-trained models on natural language or images can lead to a comparable or state-of-the-art performance in all main time series analysis tasks, as illustrated in Figure 1. We also found both theoretically and empirically that the self-attention module behaviors similarly to principle component analysis (PCA), an observation that helps explains how transformer bridges the domain gap and a crucial step towards understanding the universality of a pre-trained transformer.The code is publicly available at https://github.com/DAMO-DI-ML/One_Fits_All.

6.9LGJun 24, 2022

TreeDRNet:A Robust Deep Model for Long Term Time Series Forecasting

Tian Zhou, Jianqing Zhu, Xue Wang et al.

Various deep learning models, especially some latest Transformer-based approaches, have greatly improved the state-of-art performance for long-term time series forecasting.However, those transformer-based models suffer a severe deterioration performance with prolonged input length, which prohibits them from using extended historical info.Moreover, these methods tend to handle complex examples in long-term forecasting with increased model complexity, which often leads to a significant increase in computation and less robustness in performance(e.g., overfitting). We propose a novel neural network architecture, called TreeDRNet, for more effective long-term forecasting. Inspired by robust regression, we introduce doubly residual link structure to make prediction more robust.Built upon Kolmogorov-Arnold representation theorem, we explicitly introduce feature selection, model ensemble, and a tree structure to further utilize the extended input sequence, which improves the robustness and representation power of TreeDRNet. Unlike previous deep models for sequential forecasting work, TreeDRNet is built entirely on multilayer perceptron and thus enjoys high computational efficiency. Our extensive empirical studies show that TreeDRNet is significantly more effective than state-of-the-art methods, reducing prediction errors by 20% to 40% for multivariate time series. In particular, TreeDRNet is over 10 times more efficient than transformer-based methods. The code will be released soon.

2.7IVApr 1, 2022

Epipolar Focus Spectrum: A Novel Light Field Representation and Application in Dense-view Reconstruction

Yaning Li, Xue Wang, Hao Zhu et al.

Existing light field representations, such as epipolar plane image (EPI) and sub-aperture images, do not consider the structural characteristics across the views, so they usually require additional disparity and spatial structure cues for follow-up tasks. Besides, they have difficulties dealing with occlusions or larger disparity scenes. To this end, this paper proposes a novel Epipolar Focus Spectrum (EFS) representation by rearranging the EPI spectrum. Different from the classical EPI representation where an EPI line corresponds to a specific depth, there is a one-to-one mapping from the EFS line to the view. Accordingly, compared to a sparsely-sampled light field, a densely-sampled one with the same field of view (FoV) leads to a more compact distribution of such linear structures in the double-cone-shaped region with the identical opening angle in its corresponding EFS. Hence the EFS representation is invariant to the scene depth. To demonstrate its effectiveness, we develop a trainable EFS-based pipeline for light field reconstruction, where a dense light field can be reconstructed by compensating the "missing EFS lines" given a sparse light field, yielding promising results with cross-view consistency, especially in the presence of severe occlusion and large disparity. Experimental results on both synthetic and real-world datasets demonstrate the validity and superiority of the proposed method over SOTA methods.

1.4CVApr 1, 2022

Stereo Unstructured Magnification: Multiple Homography Image for View Synthesis

Qi Zhang, Xin Huang, Ying Feng et al.

This paper studies the problem of view synthesis with certain amount of rotations from a pair of images, what we called stereo unstructured magnification. While the multi-plane image representation is well suited for view synthesis with depth invariant, how to generalize it to unstructured views remains a significant challenge. This is primarily due to the depth-dependency caused by camera frontal parallel representation. Here we propose a novel multiple homography image (MHI) representation, comprising of a set of scene planes with fixed normals and distances. A two-stage network is developed for novel view synthesis. Stage-1 is an MHI reconstruction module that predicts the MHIs and composites layered multi-normal images along the normal direction. Stage-2 is a normal-blending module to find blending weights. We also derive an angle-based cost to guide the blending of multi-normal images by exploiting per-normal geometry. Compared with the state-of-the-art methods, our method achieves superior performance for view synthesis qualitatively and quantitatively, especially for cases when the cameras undergo rotations.

9.4CVNov 1, 2022

Siamese Transition Masked Autoencoders as Uniform Unsupervised Visual Anomaly Detector

Haiming Yao, Xue Wang, Wenyong Yu

Unsupervised visual anomaly detection conveys practical significance in many scenarios and is a challenging task due to the unbounded definition of anomalies. Moreover, most previous methods are application-specific, and establishing a unified model for anomalies across application scenarios remains unsolved. This paper proposes a novel hybrid framework termed Siamese Transition Masked Autoencoders(ST-MAE) to handle various visual anomaly detection tasks uniformly via deep feature transition. Concretely, the proposed method first extracts hierarchical semantics features from a pre-trained deep convolutional neural network and then develops a feature decoupling strategy to split the deep features into two disjoint feature patch subsets. Leveraging the decoupled features, the ST-MAE is developed with the Siamese encoders that operate on each subset of feature patches and perform the latent representations transition of two subsets, along with a lightweight decoder that reconstructs the original feature from the transitioned latent representation. Finally, the anomalous attributes can be detected using the semantic deep feature residual. Our deep feature transition scheme yields a nontrivial and semantic self-supervisory task to extract prototypical normal patterns, which allows for learning uniform models that generalize well for different visual anomaly detection tasks. The extensive experiments conducted demonstrate that the proposed ST-MAE method can advance state-of-the-art performance on multiple benchmarks across application scenarios with a superior inference efficiency, which exhibits great potential to be the uniform model for unsupervised visual anomaly detection.

6.6LGNov 24, 2023Code

Understanding the Role of Textual Prompts in LLM for Time Series Forecasting: an Adapter View

Peisong Niu, Tian Zhou, Xue Wang et al.

In the burgeoning domain of Large Language Models (LLMs), there is a growing interest in applying LLM to time series forecasting, with multiple studies focused on leveraging textual prompts to further enhance the predictive prowess. This study aims to understand how and why the integration of textual prompts into LLM can effectively improve the prediction accuracy of time series, which is not obvious at the glance, given the significant domain gap between texts and time series. Our extensive examination leads us to believe that (a) adding text prompts is roughly equivalent to introducing additional adapters, and (b) It is the introduction of learnable parameters rather than textual information that aligns the LLM with the time series forecasting task, ultimately enhancing prediction accuracy. Inspired by this discovery, we developed four adapters that explicitly address the gap between LLM and time series, and further improve the prediction accuracy. Overall,our work highlights how textual prompts enhance LLM accuracy in time series forecasting and suggests new avenues for continually improving LLM-based time series analysis.

54.4LGJan 30, 2022Code

FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

Tian Zhou, Ziqing Ma, Qingsong Wen et al.

Although Transformer-based methods have significantly improved state-of-the-art results for long-term series forecasting, they are not only computationally expensive but more importantly, are unable to capture the global view of time series (e.g. overall trend). To address these problems, we propose to combine Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of time series while Transformers capture more detailed structures. To further enhance the performance of Transformer for long-term prediction, we exploit the fact that most time series tend to have a sparse representation in well-known basis such as Fourier transform, and develop a frequency enhanced Transformer. Besides being more effective, the proposed method, termed as Frequency Enhanced Decomposed Transformer ({\bf FEDformer}), is more efficient than standard Transformer with a linear complexity to the sequence length. Our empirical studies with six benchmark datasets show that compared with state-of-the-art methods, FEDformer can reduce prediction error by $14.8\%$ and $22.6\%$ for multivariate and univariate time series, respectively. Code is publicly available at https://github.com/MAZiqing/FEDformer.

10.1CVFeb 13, 2022

Progressive Backdoor Erasing via connecting Backdoor and Adversarial Attacks

Bingxu Mu, Zhenxing Niu, Le Wang et al.

Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks as well as adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they belong to training-time and inference-time attacks respectively. However, in this paper we find an intriguing connection between them: for a model planted with backdoors, we observe that its adversarial examples have similar behaviors as its triggered images, i.e., both activate the same subset of DNN neurons. It indicates that planting a backdoor into a model will significantly affect the model's adversarial examples. Based on these observations, a novel Progressive Backdoor Erasing (PBE) algorithm is proposed to progressively purify the infected model by leveraging untargeted adversarial attacks. Different from previous backdoor defense methods, one significant advantage of our approach is that it can erase backdoor even when the clean extra dataset is unavailable. We empirically show that, against 5 state-of-the-art backdoor attacks, our PBE can effectively erase the backdoor without obvious performance degradation on clean samples and significantly outperforms existing defense methods.

4.7CVDec 2, 2021

SIDNet: Learning Shading-aware Illumination Descriptor for Image Harmonization

Zhongyun Hu, Ntumba Elie Nsampi, Xue Wang et al.

Image harmonization aims at adjusting the appearance of the foreground to make it more compatible with the background. Without exploring background illumination and its effects on the foreground elements, existing works are incapable of generating a realistic foreground shading. In this paper, we decompose the image harmonization task into two sub-problems: 1) illumination estimation of the background image and 2) re-rendering of foreground objects under background illumination. Before solving these two sub-problems, we first learn a shading-aware illumination descriptor via a well-designed neural rendering framework, of which the key is a shading bases module that generates multiple shading bases from the foreground image. Then we design a background illumination estimation module to extract the illumination descriptor from the background. Finally, the Shading-aware Illumination Descriptor is used in conjunction with the neural rendering framework (SIDNet) to produce the harmonized foreground image containing a novel harmonized shading. Moreover, we construct a photo-realistic synthetic image harmonization dataset that contains numerous shading variations with image-based lighting. Extensive experiments on both synthetic and real data demonstrate the superiority of the proposed method, especially in dealing with foreground shadings.

16.6CVSep 8, 2021

Scaled ReLU Matters for Training Vision Transformers

Pichao Wang, Xue Wang, Hao Luo et al.

Vision transformers (ViTs) have been an alternative design paradigm to convolutional neural networks (CNNs). However, the training of ViTs is much harder than CNNs, as it is sensitive to the training parameters, such as learning rate, optimizer and warmup epoch. The reasons for training difficulty are empirically analysed in ~\cite{xiao2021early}, and the authors conjecture that the issue lies with the \textit{patchify-stem} of ViT models and propose that early convolutions help transformers see better. In this paper, we further investigate this problem and extend the above conclusion: only early convolutions do not help for stable training, but the scaled ReLU operation in the \textit{convolutional stem} (\textit{conv-stem}) matters. We verify, both theoretically and empirically, that scaled ReLU in \textit{conv-stem} not only improves training stabilization, but also increases the diversity of patch tokens, thus boosting peak performance with a large margin via adding few parameters and flops. In addition, extensive experiments are conducted to demonstrate that previous ViTs are far from being well trained, further showing that ViTs have great potential to be a better substitute of CNNs.

0.9CVDec 13, 2017

Real-time Egocentric Gesture Recognition on Mobile Head Mounted Displays

Rohit Pandey, Marie White, Pavel Pidlypenskyi et al.

Mobile virtual reality (VR) head mounted displays (HMD) have become popular among consumers in recent years. In this work, we demonstrate real-time egocentric hand gesture detection and localization on mobile HMDs. Our main contributions are: 1) A novel mixed-reality data collection tool to automatic annotate bounding boxes and gesture labels; 2) The largest-to-date egocentric hand gesture and bounding box dataset with more than 400,000 annotated frames; 3) A neural network that runs real time on modern mobile CPUs, and achieves higher than 76% precision on gesture recognition across 8 classes.