Fan-Keng Sun

LG
h-index: 4
Papers: 10
Citations: 1,273
Novelty: 57%
AI Score: 37

10 Papers

LG · Jul 22, 2022 · Code
Learning from Multiple Annotator Noisy Labels via Sample-wise Label Fusion

Zhengqi Gao, Fan-Keng Sun, Mingran Yang et al. · MIT

Data lies at the core of modern deep learning. The impressive performance of supervised learning is built upon massive amounts of accurately labeled data. However, in some real-world applications, accurate labeling might not be viable; instead, several annotators each provide a noisy label for every data sample, rather than one accurate label. Learning a classifier on such a noisy training dataset is a challenging task. Previous approaches usually assume that all data samples share the same set of parameters related to annotator errors, while we demonstrate that label error learning should be both annotator and data sample dependent. Motivated by this observation, we propose a novel learning algorithm. The proposed method outperforms several state-of-the-art baseline methods on MNIST, CIFAR-100, and ImageNet-100. Our code is available at: https://github.com/zhengqigao/Learning-from-Multiple-Annotator-Noisy-Labels.

LG · May 24, 2022
FreDo: Frequency Domain-based Long-Term Time Series Forecasting

Fan-Keng Sun, Duane S. Boning · MIT

The ability to forecast far into the future is highly beneficial to many applications, including but not limited to climatology, energy consumption, and logistics. However, due to noise or measurement error, it is questionable how far into the future one can reasonably predict. In this paper, we first show mathematically that, due to error accumulation, sophisticated models might not outperform baseline models for long-term forecasting. To demonstrate this, we show that a non-parametric baseline model based on periodicity can achieve performance comparable to a state-of-the-art Transformer-based model on various datasets. We further propose FreDo, a frequency domain-based neural network model that is built on top of the baseline model to enhance its performance and that greatly outperforms the state-of-the-art model. Finally, we validate that the frequency domain is indeed better by comparing univariate models trained in the frequency vs. time domain.
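
To make the idea of a periodicity-based, non-parametric baseline concrete, here is a minimal sketch. It assumes the period is known in advance and forecasts each future step as the average of past values at the same phase; the function name `periodic_average_forecast` and this exact averaging rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def periodic_average_forecast(history, period, horizon):
    """Forecast by averaging past values that share the same phase.

    Hypothetical helper for illustration only: `period` is assumed known.
    """
    history = np.asarray(history, dtype=float)
    n_full = len(history) // period
    if n_full == 0:
        raise ValueError("history must contain at least one full period")

    # Keep only the most recent whole periods so every phase has n_full samples.
    recent = history[len(history) - n_full * period:]

    # Row i holds one full period; the column mean is the per-phase profile.
    profile = recent.reshape(n_full, period).mean(axis=0)

    # `recent` spans a whole number of periods, so the first future step
    # has phase 0 and the phases simply cycle from there.
    phases = np.arange(horizon) % period
    return profile[phases]


history = np.tile([1.0, 2.0, 3.0, 4.0], 5)
print(periodic_average_forecast(history, period=4, horizon=6))
# prints [1. 2. 3. 4. 1. 2.]
```

A baseline like this has no trainable parameters, so it cannot accumulate model error over long horizons, which is the property the abstract's argument relies on.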

LG · Oct 24, 2023
Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction

Chih-Yu Lai, Fan-Keng Sun, Zhengqi Gao et al.

Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.

LG · Oct 24, 2023
KirchhoffNet: A Scalable Ultra Fast Analog Neural Network

Zhengqi Gao, Fan-Keng Sun, Ron Rohrer et al.

In this paper, we leverage a foundational principle of analog electronic circuitry, Kirchhoff's current and voltage laws, to introduce a distinctive class of neural network models termed KirchhoffNet. Essentially, KirchhoffNet is an analog circuit that can function as a neural network, utilizing its initial node voltages as the neural network input and the node voltages at a specific time point as the output. The evolution of node voltages within the specified time is dictated by learnable parameters on the edges connecting nodes. We demonstrate that KirchhoffNet is governed by a set of ordinary differential equations (ODEs), and notably, even in the absence of traditional layers (such as convolution layers), it attains state-of-the-art performance across diverse and complex machine learning tasks. Most importantly, KirchhoffNet can potentially be implemented as a low-power analog integrated circuit, leading to an appealing property: irrespective of the number of parameters within a KirchhoffNet, its on-chip forward calculation can always be completed within a short time. This characteristic makes KirchhoffNet a promising and fundamental paradigm for implementing large-scale neural networks, opening a new avenue in analog neural networks for AI.

CL · Sep 7, 2019 · Code
LAMOL: LAnguage MOdeling for Lifelong Language Learning

Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee

Most research on lifelong learning applies to images or games, but not language. We present LAMOL, a simple yet effective method for lifelong language learning (LLL) based on language modeling. LAMOL replays pseudo-samples of previous tasks while requiring no extra memory or model capacity. Specifically, LAMOL is a language model that simultaneously learns to solve the tasks and generate training samples. When the model is trained for a new task, it generates pseudo-samples of previous tasks for training alongside data for the new task. The results show that LAMOL prevents catastrophic forgetting without any sign of intransigence and can perform five very different language tasks sequentially with only one model. Overall, LAMOL outperforms previous methods by a considerable margin and is only 2-3% worse than multitasking, which is usually considered the LLL upper bound. The source code is available at https://github.com/jojotenya/LAMOL.

LG · Mar 30, 2025
Simple Feedfoward Neural Networks are Almost All You Need for Time Series Forecasting

Fan-Keng Sun, Yu-Cheng Wu, Duane S. Boning

Time series data are everywhere -- from finance to healthcare -- and each domain brings its own unique complexities and structures. While advanced models like Transformers and graph neural networks (GNNs) have gained popularity in time series forecasting, largely due to their success in tasks like language modeling, their added complexity is not always necessary. In our work, we show that simple feedforward neural networks (SFNNs) can achieve performance on par with, or even exceeding, these state-of-the-art models, while being simpler, smaller, faster, and more robust. Our analysis indicates that, in many cases, univariate SFNNs are sufficient, implying that modeling interactions between multiple series may offer only marginal benefits. Even when inter-series relationships are strong, a basic multivariate SFNN still delivers competitive results. We also examine some key design choices and offer guidelines on making informed decisions. Additionally, we critique existing benchmarking practices and propose an improved evaluation protocol. Although SFNNs may not be optimal for every situation (hence the "almost" in our title), they serve as a strong baseline that future time series forecasting methods should always be compared against.

LG · Jan 28, 2021
Adjusting for Autocorrelated Errors in Neural Networks for Time Series

Fan-Keng Sun, Christopher I. Lang, Duane S. Boning

An increasing body of research focuses on using neural networks to model time series. A common assumption in training neural networks via maximum likelihood estimation on time series is that the errors across time steps are uncorrelated. However, errors are actually autocorrelated in many cases due to the temporality of the data, which makes such maximum likelihood estimations inaccurate. In this paper, in order to adjust for autocorrelated errors, we propose to learn the autocorrelation coefficient jointly with the model parameters. In our experiments, we verify the effectiveness of our approach on time series forecasting. Results across a wide range of real-world datasets with various state-of-the-art models show that our method enhances performance in almost all cases. Based on these results, we suggest empirical critical values to determine the severity of autocorrelated errors. We also analyze several aspects of our method to demonstrate its advantages. Finally, other time series tasks are also considered to validate that our method is not restricted to only forecasting.
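
The adjustment described above can be sketched under one explicit assumption: that the errors follow a first-order autoregressive process, e_t = rho * e_(t-1) + eps_t, so the loss is computed on the decorrelated innovations eps_t rather than on the raw errors. The function names below are hypothetical, and the closed-form least-squares estimate of rho on fixed residuals is a simplified stand-in for learning rho jointly with the model parameters by gradient descent; it is not the authors' implementation.

```python
import numpy as np


def ar1_adjusted_mse(y_true, y_pred, rho):
    """Mean squared innovation, assuming AR(1) errors with coefficient rho."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Remove the part of each error that is explained by the previous error.
    innovations = errors[1:] - rho * errors[:-1]
    return float(np.mean(innovations ** 2))


def estimate_rho(y_true, y_pred):
    """Least-squares rho for fixed predictions (stand-in for joint training)."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    denom = float(np.dot(errors[:-1], errors[:-1]))
    if denom == 0.0:
        return 0.0  # no residual signal to fit
    return float(np.dot(errors[1:], errors[:-1]) / denom)


# Synthetic check: predictions whose errors are AR(1) with rho = 0.8.
rng = np.random.default_rng(0)
n = 5000
noise = rng.normal(size=n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.8 * errors[t - 1] + noise[t]

y_pred = np.zeros(n)
y_true = y_pred + errors
rho_hat = estimate_rho(y_true, y_pred)
print("estimated rho:", rho_hat)
print("plain MSE-style loss (rho=0):", ar1_adjusted_mse(y_true, y_pred, 0.0))
print("adjusted loss (rho=rho_hat):", ar1_adjusted_mse(y_true, y_pred, rho_hat))
```

Setting rho to 0 recovers the usual loss under the uncorrelated-errors assumption, so the adjusted loss at the fitted rho can never be larger than the unadjusted one on the same residuals.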

CL · Nov 14, 2020
Conditioned Natural Language Generation using only Unconditioned Language Model: An Exploration

Fan-Keng Sun, Cheng-I Lai

Transformer-based language models have been shown to be very powerful for natural language generation (NLG). However, text generation conditioned on some user inputs, such as topics or attributes, is non-trivial. Past approaches rely on either modifying the original LM architecture, re-training the LM on corpora with attribute labels, or having separately trained 'guidance models' to guide text generation in decoding. We argue that the above approaches are not necessary, and that the original unconditioned LM is sufficient for conditioned NLG. We evaluate our approaches by the samples' fluency and diversity with automated and human evaluation.

LG · Mar 2, 2020
Variational inference formulation for a model-free simulation of a dynamical system with unknown parameters by a recurrent neural network

Kyongmin Yeo, Dylan E. C. Grullon, Fan-Keng Sun et al.

We propose a recurrent neural network for a "model-free" simulation of a dynamical system with unknown parameters without prior knowledge. The deep learning model aims to jointly learn the nonlinear time marching operator and the effects of the unknown parameters from a time series dataset. We assume that the time series dataset consists of an ensemble of trajectories for a range of the parameters. The learning task is formulated as a statistical inference problem by considering the unknown parameters as random variables. A latent variable is introduced to model the effects of the unknown parameters, and a variational inference method is employed to simultaneously train probabilistic models for the time marching operator and an approximate posterior distribution for the latent variable. Unlike classical variational inference, where a factorized distribution is used to approximate the posterior, we employ a feedforward neural network supplemented by an encoder recurrent neural network to develop a more flexible probabilistic model. The approximate posterior distribution makes an inference on a trajectory to identify the effects of the unknown parameters. The time marching operator is approximated by a recurrent neural network, which takes a latent state sampled from the approximate posterior distribution as one of the input variables, to compute the time evolution of the probability distribution conditioned on the latent variable. In the numerical experiments, the proposed variational inference model produces more accurate simulations than standard recurrent neural networks. The proposed deep learning model is also found to be capable of correctly identifying the dimensions of the random parameters and learning a representation of complex time series data.

LG · Sep 12, 2018
Temporal Pattern Attention for Multivariate Time Series Forecasting

Shun-Yao Shih, Fan-Keng Sun, Hung-yi Lee

Forecasting multivariate time series data, such as prediction of electricity consumption, solar power production, and polyphonic piano pieces, has numerous valuable applications. However, complex and non-linear interdependencies between time steps and series complicate the task. To obtain accurate predictions, it is crucial to model long-term dependency in time series data, which can be achieved to a good extent by a recurrent neural network (RNN) with an attention mechanism. A typical attention mechanism reviews the information at each previous time step and selects the relevant information to help generate the outputs, but it fails to capture temporal patterns across multiple time steps. In this paper, we propose to use a set of filters to extract time-invariant temporal patterns, which is similar to transforming time series data into its "frequency domain". We then propose a novel attention mechanism to select relevant time series and use their "frequency domain" information for forecasting. We apply the proposed model to several real-world tasks and achieve state-of-the-art performance in all of them with only one exception.