Chiwoo Park

LG
h-index43
13papers
203citations
Novelty53%
AI Score46

13 Papers

LGJan 20, 2023
Active Learning of Piecewise Gaussian Process Surrogates

Chiwoo Park, Robert Waelder, Bonggwon Kang et al.

Active learning of Gaussian process (GP) surrogates has been useful for optimizing experimental designs for physical/computer simulation experiments, and for steering data acquisition schemes in machine learning. In this paper, we develop a method for active learning of piecewise, Jump GP surrogates. Jump GPs are continuous within, but discontinuous across, regions of a design space, as required for applications spanning autonomous materials design, configuration of smart factory systems, and many others. Although our active learning heuristics are appropriated from strategies originally designed for ordinary GPs, we demonstrate that additionally accounting for model bias, as opposed to the usual model uncertainty, is essential in the Jump GP context. Toward that end, we develop an estimator for bias and variance of Jump GP models. Illustrations, and evidence of the advantage of our proposed methods, are provided on a suite of synthetic benchmarks, and real-simulation experiments of varying complexity.

LGMay 7
Online Bayesian Calibration under Gradual and Abrupt System Changes

Yang Xu, Chiwoo Park

Bayesian model calibration is central to digital twins and computer experiments, as it aligns model outputs with field observations by estimating calibration parameters and correcting systematic model bias. Classical Bayesian calibration introduces latent parameters and a discrepancy function to model bias, but suffers from parameter--discrepancy confounding and is typically formulated as an offline procedure under a stationary data-generating assumption. These limitations are restrictive in modern digital twin applications, where systems evolve over time and may exhibit gradual drift and abrupt regime shifts. While data assimilation methods enable sequential updates, they generally do not explicitly model systematic bias and are less effective under abrupt changes. We propose Bayesian Recursive Projected Calibration (BRPC), an online Bayesian calibration framework for streaming data under simulator mismatch and nonstationarity. BRPC extends projected calibration to the online setting by separating a discrepancy-free particle update for calibration parameters from a conditional Gaussian process update for discrepancy, preserving identifiability while enabling bias-aware adaptation under gradual system evolution. To handle abrupt changes, BRPC is integrated with restart mechanisms that detect regime shifts and reset the calibration process. We establish theoretical guarantees for both components, including tracking performance under gradual evolution and false-alarm and detection behavior for restart mechanisms. Empirical studies on synthetic and plant-simulation benchmarks show that BRPC improves calibration accuracy under gradual changes, while restart-augmented BRPC further improves robustness and predictive performance under abrupt regime shifts compared to sliding-window Bayesian calibration and data assimilation baselines.

LGOct 24, 2025
Deep Jump Gaussian Processes for Surrogate Modeling of High-Dimensional Piecewise Continuous Functions

Yang Xu, Chiwoo Park

We introduce Deep Jump Gaussian Processes (DJGP), a novel method for surrogate modeling of high-dimensional piecewise continuous functions. DJGP overcomes the limitations of conventional Jump Gaussian Processes in high-dimensional input spaces by adding a locally linear projection layer to Jump Gaussian Processes. This projection uses region-specific matrices to capture local subspace structures, naturally complementing the localized nature of JGP, a variant of local Gaussian Processes. To control model complexity, we place a Gaussian Process prior on the projection matrices, allowing them to evolve smoothly across the input space. The projected inputs are then modeled with a JGP to capture piecewise continuous relationships with the response. This yields a distinctive two-layer deep learning of GP/JGP. We further develop a scalable variational inference algorithm to jointly learn the projection matrices and JGP hyperparameters. Experiments on synthetic and benchmark datasets demonstrate that DJGP delivers superior predictive accuracy and more reliable uncertainty quantification compared to existing approaches.

MEMay 21, 2025
Modular Jump Gaussian Processes

Anna R. Flowers, Christopher T. Franck, Mickaël Binois et al.

Gaussian processes (GPs) furnish accurate nonlinear predictions with well-calibrated uncertainty. However, the typical GP setup has a built-in stationarity assumption, making it ill-suited for modeling data from processes with sudden changes, or "jumps" in the output variable. The "jump GP" (JGP) was developed for modeling data from such processes, combining local GPs and latent "level" variables under a joint inferential framework. But joint modeling can be fraught with difficulty. We aim to simplify by suggesting a more modular setup, eschewing joint inference but retaining the main JGP themes: (a) learning optimal neighborhood sizes that locally respect manifolds of discontinuity; and (b) a new cluster-based (latent) feature to capture regions of distinct output levels on both sides of the manifold. We show that each of (a) and (b) separately leads to dramatic improvements when modeling processes with jumps. In tandem (but without requiring joint inference) that benefit is compounded, as illustrated on real and synthetic benchmark examples from the recent literature.

MEApr 13, 2021
Gaussian Process Model for Estimating Piecewise Continuous Regression Functions

Chiwoo Park

This paper presents a Gaussian process (GP) model for estimating piecewise continuous regression functions. In scientific and engineering applications of regression analysis, the underlying regression functions are piecewise continuous in that data follow different continuous regression models for different regions of the data with possible discontinuities between the regions. However, many conventional GP regression approaches are not designed for piecewise regression analysis. We propose a new GP modeling approach for estimating an unknown piecewise continuous regression function. The new GP model seeks for a local GP estimate of an unknown regression function at each test location, using local data neighboring to the test location. To accommodate the possibilities of the local data from different regions, the local data is partitioned into two sides by a local linear boundary, and only the local data belonging to the same side as the test location is used for the regression estimate. This local split works very well when the input regions are bounded by smooth boundaries, so the local linear approximation of the smooth boundaries works well. We estimate the local linear boundary jointly with the other hyperparameters of the GP model, using the maximum likelihood approach. Its computation time is as low as the local GP's time. The superior numerical performance of the proposed approach over the conventional GP modeling approaches is shown using various simulated piecewise regression functions.

CVAug 25, 2020
Data Science for Motion and Time Analysis with Modern Motion Sensor Data

Chiwoo Park, Sang Do Noh, Anuj Srivastava

The motion-and-time analysis has been a popular research topic in operations research, especially for analyzing work performances in manufacturing and service operations. It is regaining attention as continuous improvement tools for lean manufacturing and smart factory. This paper develops a framework for data-driven analysis of work motions and studies their correlations to work speeds or execution rates, using data collected from modern motion sensors. The past analyses largely relied on manual steps involving time-consuming stop-watching and video-taping, followed by manual data analysis. While modern sensing devices have automated the collection of motion data, the motion analytics that transform the new data into knowledge are largely underdeveloped. Unsolved technical questions include: How the motion and time information can be extracted from the motion sensor data, how work motions and execution rates are statistically modeled and compared, and what are the statistical correlations of motions to the rates? In this paper, we develop a novel mathematical framework for motion and time analysis with motion sensor data, by defining new mathematical representation spaces of human motions and execution rates and by developing statistical tools on these new spaces. This methodological research is demonstrated using five use cases applied to manufacturing motion data.

LGAug 25, 2020
Variable selection for Gaussian process regression through a sparse projection

Chiwoo Park, David J. Borth, Nicholas S. Wilson et al.

This paper presents a new variable selection approach integrated with Gaussian process (GP) regression. We consider a sparse projection of input variables and a general stationary covariance model that depends on the Euclidean distance between the projected features. The sparse projection matrix is considered as an unknown parameter. We propose a forward stagewise approach with embedded gradient descent steps to co-optimize the parameter with other covariance parameters based on the maximization of a non-convex marginal likelihood function with a concave sparsity penalty, and some convergence properties of the algorithm are provided. The proposed model covers a broader class of stationary covariance functions than the existing automatic relevance determination approaches, and the solution approach is more computationally feasible than the existing MCMC sampling procedures for the automatic relevance parameter estimation with a sparsity prior. The approach is evaluated for a large number of simulated scenarios. The choice of tuning parameters and the accuracy of the parameter estimation are evaluated with the simulation study. In the comparison to some chosen benchmark approaches, the proposed approach has provided a better accuracy in the variable selection. It is applied to an important problem of identifying environmental factors that affect an atmospheric corrosion of metal alloys.

LGJan 14, 2020
Robust Gaussian Process Regression with a Bias Model

Chiwoo Park, David J. Borth, Nicholas S. Wilson et al.

This paper presents a new approach to a robust Gaussian process (GP) regression. Most existing approaches replace an outlier-prone Gaussian likelihood with a non-Gaussian likelihood induced from a heavy tail distribution, such as the Laplace distribution and Student-t distribution. However, the use of a non-Gaussian likelihood would incur the need for a computationally expensive Bayesian approximate computation in the posterior inferences. The proposed approach models an outlier as a noisy and biased observation of an unknown regression function, and accordingly, the likelihood contains bias terms to explain the degree of deviations from the regression function. We entail how the biases can be estimated accurately with other hyperparameters by a regularized maximum likelihood estimation. Conditioned on the bias estimates, the robust GP regression can be reduced to a standard GP regression problem with analytical forms of the predictive mean and variance estimates. Therefore, the proposed approach is simple and very computationally attractive. It also gives a very robust and accurate GP estimate for many tested scenarios. For the numerical evaluation, we perform a comprehensive simulation study to evaluate the proposed approach with the comparison to the existing robust GP approaches under various simulated scenarios of different outlier proportions and different noise levels. The approach is applied to data from two measurement systems, where the predictors are based on robust environmental parameter measurements and the response variables utilize more complex chemical sensing methods that contain a certain percentage of outliers. The utility of the measurement systems and value of the environmental data are improved through the computationally efficient GP regression and bias model.

MLApr 2, 2019
Sequential Adaptive Design for Jump Regression Estimation

Chiwoo Park, Peihua Qiu, Jennifer Carpena-Núñez et al.

Selecting input variables or design points for statistical models has been of great interest in adaptive design and active learning. Motivated by two scientific examples, this paper presents a strategy of selecting the design points for a regression model when the underlying regression function is discontinuous. The first example we undertook was for the purpose of accelerating imaging speed in a high resolution material imaging; the second was use of sequential design for the purpose of mapping a chemical phase diagram. In both examples, the underlying regression functions have discontinuities, so many of the existing design optimization approaches cannot be applied because they mostly assume a continuous regression function. Although some existing adaptive design strategies developed from treed regression models can handle the discontinuities, the Bayesian approaches come with computationally expensive Markov Chain Monte Carlo techniques for posterior inferences and subsequent design point selections, which is not appropriate for the first motivating example that requires computation at least faster than the original imaging speed. In addition, the treed models are based on the domain partitioning that are inefficient when the discontinuities occurs over complex sub-domain boundaries. We propose a simple and effective adaptive design strategy for a regression analysis with discontinuities: some statistical properties with a fixed design will be presented first, and then these properties will be used to propose a new criterion of selecting the design points for the regression analysis. Sequential design with the new criterion will be presented with comprehensive simulated examples, and its application to the two motivating examples will be presented.

LGJan 23, 2017
Patchwork Kriging for Large-scale Gaussian Process Regression

Chiwoo Park, Daniel Apley

This paper presents a new approach for Gaussian process (GP) regression for large datasets. The approach involves partitioning the regression input domain into multiple local regions with a different local GP model fitted in each region. Unlike existing local partitioned GP approaches, we introduce a technique for patching together the local GP models nearly seamlessly to ensure that the local GP models for two neighboring regions produce nearly the same response prediction and prediction error variance on the boundary between the two regions. This largely mitigates the well-known discontinuity problem that degrades the boundary accuracy of existing local partitioned GP methods. Our main innovation is to represent the continuity conditions as additional pseudo-observations that the differences between neighboring GP responses are identically zero at an appropriately chosen set of boundary input locations. To predict the response at any input location, we simply augment the actual response observations with the pseudo-observations and apply standard GP prediction methods to the augmented data. In contrast to heuristic continuity adjustments, this has an advantage of working within a formal GP framework, so that the GP-based predictive uncertainty quantification remains valid. Our approach also inherits a sparse block-like structure for the sample covariance matrix, which results in computationally efficient closed-form expressions for the predictive mean and variance. In addition, we provide a new spatial partitioning scheme based on a recursive space partitioning along local principal component directions, which makes the proposed approach applicable for regression domains having more than two dimensions. Using three spatial datasets and three higher dimensional datasets, we investigate the numerical performance of the approach and compare it to several state-of-the-art approaches.

APNov 24, 2016
Two-Level Structural Sparsity Regularization for Identifying Lattices and Defects in Noisy Images

Xin Li, Alex Belianinov, Ondrej Dyck et al.

This paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy image (STEM). In crystals, the locations of atoms is symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives. To minimize error, model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm on determination of atom structures and identification of imaging distortions and atomic defects was demonstrated using three real STEM images. We believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.

CVSep 26, 2016
Robust Regression For Image Binarization Under Heavy Noises and Nonuniform Background

Garret Vo, Chiwoo Park

This paper presents a robust regression approach for image binarization under significant background variations and observation noises. The work is motivated by the need of identifying foreground regions in noisy microscopic image or degraded document images, where significant background variation and severe noise make an image binarization challenging. The proposed method first estimates the background of an input image, subtracts the estimated background from the input image, and apply a global thresholding to the subtracted outcome for achieving a binary image of foregrounds. A robust regression approach was proposed to estimate the background intensity surface with minimal effects of foreground intensities and noises, and a global threshold selector was proposed on the basis of a model selection criterion in a sparse regression. The proposed approach was validated using 26 test images and the corresponding ground truths, and the outcomes of the proposed work were compared with those from nine existing image binarization methods. The approach was also combined with three state-of-the-art morphological segmentation methods to show how the proposed approach can improve their image segmentation outcomes.

CVAug 4, 2016
Sparse Filtered SIRT for Electron Tomography

Chen Mu, Chiwoo Park

Electron tomographic reconstruction is a method for obtaining a three-dimensional image of a specimen with a series of two dimensional microscope images taken from different viewing angles. Filtered backprojection, one of the most popular tomographic reconstruction methods, does not work well under the existence of image noises and missing wedges. This paper presents a new approach to largely mitigate the effect of noises and missing wedges. We propose a novel filtered backprojection that optimizes the filter of the backprojection operator in terms of a reconstruction error. This data-dependent filter adaptively chooses the spectral domains of signals and noises, suppressing the noise frequency bands, so it is very effective in denoising. We also propose the new filtered backprojection embedded within the simultaneous iterative reconstruction iteration for mitigating the effect of missing wedges. Our numerical study is presented to show the performance gain of the proposed approach over the state-of-the-art.