NAJan 30, 2017
Stochastic Convergence of A Nonconforming Finite Element Method for the Thin Plate Spline Smoother for Observational DataZhiming Chen, Rui Tuo, Wenlong Zhang
The thin plate spline smoother is a classical model for fnding a smooth function from the knowledge of its observation at scattered locations which may have random noises. We consider a nonconforming Morley finite element method to approximate the model. We prove the stochastic convergence of the finite element method which characterizes the tail property of the probability distribution function of the finite element error. We also propose a self-consistent iterative algorithm to determine the smoothing parameter based on our theoretical analysis. Numerical examples are included to confirm the theoretical analysis and to show the competitive performance of the self- consistent algorithm for finding the smoothing parameter.
MLMar 7, 2022
Kernel Packet: An Exact and Scalable Algorithm for Gaussian Process Regression with Matérn CorrelationsHaoyuan Chen, Liang Ding, Rui Tuo
We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations whose smoothness parameter $ν$ is a half-integer. The proposed algorithm only requires $\mathcal{O}(ν^3 n)$ operations and $\mathcal{O}(νn)$ storage. This leads to a linear-cost solver since $ν$ is chosen to be fixed and usually very small in most applications. The proposed method can be applied to multi-dimensional problems if a full grid or a sparse grid design is used. The proposed method is based on a novel theory for Matérn correlation functions. We find that a suitable rearrangement of these correlation functions can produce a compactly supported function, called a "kernel packet". Using a set of kernel packets as basis functions leads to a sparse representation of the covariance matrix that results in the proposed algorithm. Simulation studies show that the proposed algorithm, when applicable, is significantly superior to the existing alternatives in both the computational time and predictive accuracy.
3.7LGMay 26
SIKA-GP: Accelerating Gaussian Process Inference with Sparse Inducing Kernel Approximations for Bayesian Deep LearningWenyuan Zhao, Rui Tuo, Chao Tian
Gaussian processes (GPs) provide a principled Bayesian framework for uncertainty estimation, but their computational complexity severely limits scalability to large datasets. We propose SIKA-GP, which accelerates GP inference using sparse inducing kernel approximations based on a dyadic ordered template basis, incurring only ${O}(\log M)$ complexity dependence on the number of inducing points. Our approach constructs compact and expressive kernel representations from sparsely activated bases, enabling efficient tensorized GPU computation and seamless integration with modern large-scale models. SIKA-GP can be naturally embedded into Bayesian neural networks (BNNs) with sparse activations, yielding significant speedups in both training and inference without sacrificing predictive performance. The method naturally extends to deep feature learning, addressing the scalability challenges introduced by deep architectures and high-dimensional feature representations. Empirical results on vision and transformer-based language benchmarks demonstrate that our approach consistently delivers fast and accurate GP models, providing a principled path toward scalable kernel learning.
NAFeb 16, 2017
A finite element method for elliptic problems with observational boundary dataZhiming Chen, Rui Tuo, Wenlong Zhang
In this paper we propose a finite element method for solving elliptic equations with the observational Dirichlet boundary data which may subject to random noises. The method is based on the weak formulation of Lagrangian multiplier. We show the convergence of the random finite element error in expectation and, when the noise is sub-Gaussian, in the Orlicz 2- norm which implies the probability that the finite element error estimates are violated decays exponentially. Numerical examples are included.
LGNov 14, 2022
Renewing Iterative Self-labeling Domain Adaptation with Application to the Spine Motion PredictionGecheng Chen, Yu Zhou, Xudong Zhang et al.
The area of transfer learning comprises supervised machine learning methods that cope with the issue when the training and testing data have different input feature spaces or distributions. In this work, we propose a novel transfer learning algorithm called Renewing Iterative Self-labeling Domain Adaptation (Re-ISDA). In this work, we propose a novel transfer learning algorithm called Renewing Iterative Self-labeling Domain Adaptation (Re-ISDA).
LGOct 30, 2025
BI-DCGAN: A Theoretically Grounded Bayesian Framework for Efficient and Diverse GANsMahsa Valizadeh, Rui Tuo, James Caverlee
Generative Adversarial Networks (GANs) are proficient at generating synthetic data but continue to suffer from mode collapse, where the generator produces a narrow range of outputs that fool the discriminator but fail to capture the full data distribution. This limitation is particularly problematic, as generative models are increasingly deployed in real-world applications that demand both diversity and uncertainty awareness. In response, we introduce BI-DCGAN, a Bayesian extension of DCGAN that incorporates model uncertainty into the generative process while maintaining computational efficiency. BI-DCGAN integrates Bayes by Backprop to learn a distribution over network weights and employs mean-field variational inference to efficiently approximate the posterior distribution during GAN training. We establishes the first theoretical proof, based on covariance matrix analysis, that Bayesian modeling enhances sample diversity in GANs. We validate this theoretical result through extensive experiments on standard generative benchmarks, demonstrating that BI-DCGAN produces more diverse and robust outputs than conventional DCGANs, while maintaining training efficiency. These findings position BI-DCGAN as a scalable and timely solution for applications where both diversity and uncertainty are critical, and where modern alternatives like diffusion models remain too resource-intensive.
MLAug 1, 2024
Aggregation Models with Optimal Weights for Distributed Gaussian ProcessesHaoyuan Chen, Rui Tuo
Gaussian process (GP) models have received increasing attention in recent years due to their superb prediction accuracy and modeling flexibility. To address the computational burdens of GP models for large-scale datasets, distributed learning for GPs are often adopted. Current aggregation models for distributed GPs is not time-efficient when incorporating correlations between GP experts. In this work, we propose a novel approach for aggregated prediction in distributed GPs. The technique is suitable for both the exact and sparse variational GPs. The proposed method incorporates correlations among experts, leading to better prediction accuracy with manageable computational requirements. As demonstrated by empirical studies, the proposed approach results in more stable predictions in less time than state-of-the-art consistent aggregation models.
MLFeb 6, 2024
Beyond State Space Representation: A General Theory for Kernel PacketsLiang Ding, Rui Tuo, Lu Zhou
Gaussian process (GP) regression provides a flexible, nonparametric framework for probabilistic modeling, yet remains computationally demanding in large-scale applications. For one-dimensional data, state space (SS) models achieve linear-time inference by reformulating GPs as stochastic differential equations (SDEs). However, SS approaches are confined to gridded inputs and cannot handle multi-dimensional scattered data. We propose a new framework based on kernel packet (KP), which overcomes these limitations while retaining exactness and scalability. A KP is a compactly supported function defined as a linear combination of the GP covariance functions. In this article, we prove that KPs can be identified via the forward and backward SS representations. We also show that the KP approach enables exact inference with linear-time training and logarithmic or constant-time prediction, and extends naturally to multi-dimensional gridded or scattered data without low-rank approximations. Numerical experiments on large-scale additive and product-form GPs with millions of samples demonstrate that KPs achieve exact, memory-efficient inference where SDE-based and low-rank GP methods fail.
CLOct 20, 2025
Language Models as Semantic Augmenters for Sequential RecommendersMahsa Valizadeh, Xiangjue Dong, Rui Tuo et al.
Large Language Models (LLMs) excel at capturing latent semantics and contextual relationships across diverse modalities. However, in modeling user behavior from sequential interaction data, performance often suffers when such semantic context is limited or absent. We introduce LaMAR, a LLM-driven semantic enrichment framework designed to enrich such sequences automatically. LaMAR leverages LLMs in a few-shot setting to generate auxiliary contextual signals by inferring latent semantic aspects of a user's intent and item relationships from existing metadata. These generated signals, such as inferred usage scenarios, item intents, or thematic summaries, augment the original sequences with greater contextual depth. We demonstrate the utility of this generated resource by integrating it into benchmark sequential modeling tasks, where it consistently improves performance. Further analysis shows that LLM-generated signals exhibit high semantic novelty and diversity, enhancing the representational capacity of the downstream models. This work represents a new data-centric paradigm where LLMs serve as intelligent context generators, contributing a new method for the semi-automatic creation of training data and language resources.
LGFeb 14, 2025
From Deep Additive Kernel Learning to Last-Layer Bayesian Neural Networks via Induced Prior ApproximationWenyuan Zhao, Haoyuan Chen, Tie Liu et al.
With the strengths of both deep learning and kernel methods like Gaussian Processes (GPs), Deep Kernel Learning (DKL) has gained considerable attention in recent years. From the computational perspective, however, DKL becomes challenging when the input dimension of the GP layer is high. To address this challenge, we propose the Deep Additive Kernel (DAK) model, which incorporates i) an additive structure for the last-layer GP; and ii) induced prior approximation for each GP unit. This naturally leads to a last-layer Bayesian neural network (BNN) architecture. The proposed method enjoys the interpretability of DKL as well as the computational advantages of BNN. Empirical results show that the proposed approach outperforms state-of-the-art DKL methods in both regression and classification tasks.
LGMay 25, 2023
Privacy-aware Gaussian Process RegressionRui Tuo, Haoyuan Chen, Raktim Bhattacharya
We propose a novel theoretical and methodological framework for Gaussian process regression subject to privacy constraints. The proposed method can be used when a data owner is unwilling to share a high-fidelity supervised learning model built from their data with the public due to privacy concerns. The key idea of the proposed method is to add synthetic noise to the data until the predictive variance of the Gaussian process model reaches a prespecified privacy level. The optimal covariance matrix of the synthetic noise is formulated in terms of semi-definite programming. We also introduce the formulation of privacy-aware solutions under continuous privacy constraints using kernel-based approaches, and study their theoretical properties. The proposed method is illustrated by considering a model that tracks the trajectories of satellites and a real application on a census dataset.
MLDec 11, 2021
A Sparse Expansion For Deep Gaussian ProcessesLiang Ding, Rui Tuo, Shahin Shahrampour
In this work, we use Deep Gaussian Processes (DGPs) as statistical surrogates for stochastic processes with complex distributions. Conventional inferential methods for DGP models can suffer from high computational complexity as they require large-scale operations with kernel matrices for training and inference. In this work, we propose an efficient scheme for accurate inference and efficient training based on a range of Gaussian Processes, called the Tensor Markov Gaussian Processes (TMGP). We construct an induced approximation of TMGP referred to as the hierarchical expansion. Next, we develop a deep TMGP (DTMGP) model as the composition of multiple hierarchical expansion of TMGPs. The proposed DTMGP model has the following properties: (1) the outputs of each activation function are deterministic while the weights are chosen independently from standard Gaussian distribution; (2) in training or prediction, only polylog(M) (out of M) activation functions have non-zero outputs, which significantly boosts the computational efficiency. Our numerical experiments on synthetic models and real datasets show the superior computational efficiency of DTMGP over existing DGP models.
MLJul 19, 2021
High-Dimensional Simulation Optimization via Brownian Fields and Sparse GridsLiang Ding, Rui Tuo, Xiaowei Zhang
High-dimensional simulation optimization is notoriously challenging. We propose a new sampling algorithm that converges to a global optimal solution and suffers minimally from the curse of dimensionality. The algorithm consists of two stages. First, we take samples following a sparse grid experimental design and approximate the response surface via kernel ridge regression with a Brownian field kernel. Second, we follow the expected improvement strategy -- with critical modifications that boost the algorithm's sample efficiency -- to iteratively sample from the next level of the sparse grid. Under mild conditions on the smoothness of the response surface and the simulation noise, we establish upper bounds on the convergence rate for both noise-free and noisy simulation samples. These upper bounds deteriorate only slightly in the dimension of the feasible set, and they can be improved if the objective function is known to be of a higher-order smoothness. Extensive numerical experiments demonstrate that the proposed algorithm dramatically outperforms typical alternatives in practice.
APDec 2, 2020
The temporal overfitting problem with applications in wind power curve modelingAbhinav Prakash, Rui Tuo, Yu Ding
This paper is concerned with a nonparametric regression problem in which the input variables and the errors are autocorrelated in time. The motivation for the research stems from modeling wind power curves. Using existing model selection methods, like cross validation, results in model overfitting in presence of temporal autocorrelation. This phenomenon is referred to as temporal overfitting, which causes loss of performance while predicting responses for a time domain different from the training time domain. We propose a Gaussian process (GP)-based method to tackle the temporal overfitting problem. Our model is partitioned into two parts -- a time-invariant component and a time-varying component, each of which is modeled through a GP. We modify the inference method to a thinning-based strategy, an idea borrowed from Markov chain Monte Carlo sampling, to overcome temporal overfitting and estimate the time-invariant component. We extensively compare our proposed method with both existing power curve models and available ideas for handling temporal overfitting on real wind turbine datasets. Our approach yields significant improvement when predicting response for a time period different from the training time period. Supplementary material and computer code for this article is available online.
LGJun 5, 2020
High-Dimensional Non-Parametric Density Estimation in Mixed Smooth Sobolev SpacesLiang Ding, Lu Zou, Wenjia Wang et al.
Density estimation plays a key role in many tasks in machine learning, statistical inference, and visualization. The main bottleneck in high-dimensional density estimation is the prohibitive computational cost and the slow convergence rate. In this paper, we propose novel estimators for high-dimensional non-parametric density estimation called the adaptive hyperbolic cross density estimators, which enjoys nice convergence properties in the mixed smooth Sobolev spaces. As modifications of the usual Sobolev spaces, the mixed smooth Sobolev spaces are more suitable for describing high-dimensional density functions in some applications. We prove that, unlike other existing approaches, the proposed estimator does not suffer the curse of dimensionality under Integral Probability Metric, including Hölder Integral Probability Metric, where Total Variation Metric and Wasserstein Distance are special cases. Applications of the proposed estimators to generative adversarial networks (GANs) and goodness of fit test for high-dimensional data are discussed to illustrate the proposed estimator's good performance in high-dimensional problems. Numerical experiments are conducted and illustrate the efficiency of our proposed method.
MLApr 1, 2020
Projection Pursuit Gaussian Process RegressionGecheng Chen, Rui Tuo
A primary goal of computer experiments is to reconstruct the function given by the computer code via scattered evaluations. Traditional isotropic Gaussian process models suffer from the curse of dimensionality, when the input dimension is relatively high given limited data points. Gaussian process models with additive correlation functions are scalable to dimensionality, but they are more restrictive as they only work for additive functions. In this work, we consider a projection pursuit model, in which the nonparametric part is driven by an additive Gaussian process regression. We choose the dimension of the additive function higher than the original input dimension, and call this strategy "dimension expansion". We show that dimension expansion can help approximate more complex functions. A gradient descent algorithm is proposed for model training based on the maximum likelihood estimation. Simulation studies show that the proposed method outperforms the traditional Gaussian process models. The Supplementary Materials are available online.
LGFeb 11, 2020
Generalization Guarantees for Sparse Kernel Approximation with Entropic Optimal FeaturesLiang Ding, Rui Tuo, Shahin Shahrampour
Despite their success, kernel methods suffer from a massive computational cost in practice. In this paper, in lieu of commonly used kernel expansion with respect to $N$ inputs, we develop a novel optimal design maximizing the entropy among kernel features. This procedure results in a kernel expansion with respect to entropic optimal features (EOF), improving the data representation dramatically due to features dissimilarity. Under mild technical assumptions, our generalization bound shows that with only $O(N^{\frac{1}{4}})$ features (disregarding logarithmic factors), we can achieve the optimal statistical accuracy (i.e., $O(1/\sqrt{N})$). The salient feature of our design is its sparsity that significantly reduces the time and space cost. Our numerical experiments on benchmark datasets verify the superiority of EOF over the state-of-the-art in kernel approximation.
STFeb 4, 2020
Uncertainty Quantification for Bayesian OptimizationRui Tuo, Wenjia Wang
Bayesian optimization is a class of global optimization techniques. In Bayesian optimization, the underlying objective function is modeled as a realization of a Gaussian process. Although the Gaussian process assumption implies a random distribution of the Bayesian optimization outputs, quantification of this uncertainty is rarely studied in the literature. In this work, we propose a novel approach to assess the output uncertainty of Bayesian optimization algorithms, which proceeds by constructing confidence regions of the maximum point (or value) of the objective function. These regions can be computed efficiently, and their confidence levels are guaranteed by the uniform error bounds for sequential Gaussian process regression newly developed in the present work. Our theory provides a unified uncertainty quantification framework for all existing sequential sampling policies and stopping criteria.
STAug 29, 2018
Differentially Private Change-Point DetectionRachel Cummings, Sara Krehbiel, Yajun Mei et al.
The change-point detection problem seeks to identify distributional changes at an unknown change-point k* in a stream of data. This problem appears in many important practical settings involving personal data, including biosurveillance, fault detection, finance, signal detection, and security systems. The field of differential privacy offers data analysis tools that provide powerful worst-case privacy guarantees. We study the statistical problem of change-point detection through the lens of differential privacy. We give private algorithms for both online and offline change-point detection, analyze these algorithms theoretically, and provide empirical validation of our results.