Jerry Zhijian Yang

h-index: 18

14 papers

84 citations

Novelty: 59%

AI Score: 54

Ranked #10,268 of 194,257 authors (top 5%); #2,671 in LG (top 7%)

14 Papers

8.8 | LG | Mar 28, 2023

GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs

Yuling Jiao, Di Li, Xiliang Lu et al.

With the recent study of deep learning in scientific computation, the Physics-Informed Neural Networks (PINNs) method has drawn widespread attention for solving Partial Differential Equations (PDEs). Compared to traditional methods, PINNs can efficiently handle high-dimensional problems, but the accuracy is relatively low, especially for highly irregular problems. Inspired by the idea of adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During the training procedure, GAS uses the current residual information to generate a Gaussian mixture distribution for the sampling of additional points, which are then trained together with historical data to speed up the convergence of the loss and achieve higher accuracy. Several numerical simulations on 2D and 10D problems show that GAS is a promising method that achieves state-of-the-art accuracy among deep solvers, while being comparable with traditional numerical solvers.
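The residual-driven sampling idea can be illustrated with a minimal sketch. This is not the authors' GAS algorithm: the function name `sample_from_residual_mixture`, the choice of the top-`k` residual points as mixture centres, the fixed isotropic `sigma`, the clipping to the domain, and the toy residual field are all assumptions made for illustration.

```python
import math
import random


def sample_from_residual_mixture(points, residuals, n_new, k=3, sigma=0.05,
                                 lo=0.0, hi=1.0, seed=0):
    """Draw n_new points from a Gaussian mixture centred on high-residual points.

    Hypothetical helper: centres are the k points with the largest |residual|,
    and mixture weights are proportional to |residual| (must not all be zero).
    """
    rng = random.Random(seed)
    ranked = sorted(zip(residuals, points), key=lambda rp: abs(rp[0]), reverse=True)[:k]
    weights = [abs(r) for r, _ in ranked]
    centers = [p for _, p in ranked]
    new_points = []
    for _ in range(n_new):
        # Pick a mixture component, then perturb its centre with Gaussian noise.
        center = rng.choices(centers, weights=weights, k=1)[0]
        # Clip each coordinate back into [lo, hi] (a projection, not a truncated Gaussian).
        new_points.append(tuple(min(hi, max(lo, rng.gauss(c, sigma))) for c in center))
    return new_points


# Toy residual field peaked near (0.7, 0.3) on an 11 x 11 grid (illustrative only).
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
toy_residuals = [math.exp(-50 * ((x - 0.7) ** 2 + (y - 0.3) ** 2)) for x, y in grid]
extra = sample_from_residual_mixture(grid, toy_residuals, n_new=200)
training_set = grid + extra  # new points are kept alongside the historical ones
```

In a real PINN training loop the residuals would come from evaluating the PDE residual of the current network, and the mixture would be refitted at each adaptive round.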

7.9 | NA | Apr 4 | Code

Nonlinear Assimilation via Score-based Sequential Langevin Sampling

Zhao Ding, Chenguang Duan, Yuling Jiao et al.

This paper introduces score-based sequential Langevin sampling (SSLS), a novel approach to nonlinear data assimilation within a recursive Bayesian filtering framework. The proposed method decomposes the assimilation process into alternating prediction and update steps, using dynamic models for state prediction and incorporating observational data via score-based Langevin Monte Carlo during the updates. To overcome inherent challenges in highly non-log-concave posterior sampling, we integrate an annealing strategy into the update mechanism. Theoretically, we establish convergence guarantees for SSLS in total variation (TV) distance, yielding concrete insights into the algorithm's error behavior with respect to key hyperparameters. Crucially, our derived error bounds demonstrate the asymptotic stability of SSLS, guaranteeing that local posterior sampling errors do not accumulate indefinitely over time. Extensive numerical experiments across challenging scenarios, including high-dimensional systems, strong nonlinearity, and sparse observations, highlight the robust performance of the proposed method. Furthermore, SSLS effectively quantifies the uncertainty associated with state estimates, rendering it particularly valuable for reliable error calibration.
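A single update step of this kind can be sketched with unadjusted Langevin dynamics on a one-dimensional toy problem. The sketch assumes a Gaussian prior and a Gaussian observation model so that the posterior score is available in closed form; in the paper the score is learned and an annealing strategy is used, neither of which is reproduced here. The names `langevin_update` and `posterior_score` and all numerical values are hypothetical.

```python
import math
import random


def langevin_update(particles, score, step, n_steps, rng):
    """Unadjusted Langevin: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        particles = [x + step * score(x) + noise_scale * rng.gauss(0.0, 1.0)
                     for x in particles]
    return particles


rng = random.Random(0)
prior_mean, prior_var = 0.0, 1.0   # predicted (prior) state distribution, assumed Gaussian
obs, obs_var = 2.0, 1.0            # observation model y = x + noise, assumed Gaussian


def posterior_score(x):
    # Score of the posterior = score of the prior + score of the likelihood.
    return -(x - prior_mean) / prior_var + (obs - x) / obs_var


# Prediction step stand-in: particles drawn from the prior.
predicted = [rng.gauss(prior_mean, math.sqrt(prior_var)) for _ in range(400)]
# Update step: move the particles towards the posterior with Langevin dynamics.
updated = langevin_update(predicted, posterior_score, step=0.01, n_steps=500, rng=rng)
posterior_mean = sum(updated) / len(updated)
```

For this conjugate toy model the exact posterior is Gaussian with mean 1.0 and variance 0.5, so the particle mean should land close to 1.0, up to Monte Carlo and discretization error.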

3.3 | NA | Jun 24, 2023

Current density impedance imaging with PINNs

Chenguang Duan, Yuling Jiao, Xiliang Lu et al.

In this paper, we introduce CDII-PINNs, a computationally efficient method for solving CDII using PINNs in the framework of Tikhonov regularization. This method constructs a physics-informed loss function by merging the regularized least-squares output functional with an underlying differential equation, which describes the relationship between the conductivity and voltage. A pair of neural networks representing the conductivity and voltage, respectively, are coupled by this loss function. Then, minimizing the loss function provides a reconstruction. A rigorous theoretical guarantee is provided. We give an error analysis for CDII-PINNs and establish a convergence rate, based on prior selected neural network parameters in terms of the number of samples. The numerical simulations demonstrate that CDII-PINNs are efficient, accurate and robust to noise levels ranging from $1\%$ to $20\%$.

9.7 | QUANT-PH | Oct 11, 2023

Non-asymptotic Approximation Error Bounds of Parameterized Quantum Circuits

Zhan Yu, Qiuhao Chen, Yuling Jiao et al.

Parameterized quantum circuits (PQCs) have emerged as a promising approach for quantum neural networks. However, understanding their expressive power in accomplishing machine learning tasks remains a crucial question. This paper investigates the expressivity of PQCs for approximating general multivariate function classes. Unlike previous Universal Approximation Theorems for PQCs, which are either nonconstructive or rely on parameterized classical data processing, we explicitly construct data re-uploading PQCs for approximating multivariate polynomials and smooth functions. We establish the first non-asymptotic approximation error bounds for these functions in terms of the number of qubits, quantum circuit depth, and number of trainable parameters. Notably, we demonstrate that for approximating functions that satisfy specific smoothness criteria, the quantum circuit size and number of trainable parameters of our proposed PQCs can be smaller than those of deep ReLU neural networks. We further validate the approximation capability of PQCs through numerical experiments. Our results provide a theoretical foundation for designing practical PQCs and quantum neural networks for machine learning tasks that can be implemented on near-term quantum devices, paving the way for the advancement of quantum machine learning.

6.6 | NA | Jun 1

An alternating learning-based collocation method for solving inverse elliptic problems

Zhizhong Kong, Jerry Zhijian Yang, Cheng Yuan

We propose the Alternating Learning-Based Collocation (ALBC) method for solving inverse elliptic problems. Our approach employs sinusoidal shallow networks as adaptive basis generators. By alternately updating the state variable and the unknown parameter, we decompose the original nonconvex joint optimization problem into a sequence of tractable linear subproblems. This strategy effectively overcomes the fixed-basis limitations of classical collocation methods while avoiding the slow convergence typically encountered in deep learning approaches. Theoretically, we establish stability estimates and prove the convergence of the proposed algorithm. Numerical experiments on five benchmark problems demonstrate the efficacy of ALBC, which consistently outperforms the standard collocation method in accuracy. Furthermore, it achieves performance comparable to or better than that of physics-informed neural networks at a substantially lower computational cost. Finally, the method remains robust under noise levels of up to twenty percent.

5.3 | LG | May 8

Approximation Error Upper and Lower Bounds for Hölder Class with Transformers

Xin He, Yuling Jiao, Xiliang Lu et al.

We explore the expressive power of Transformers by establishing precise approximation error upper and lower bounds for the Hölder class. Specifically, a new approximation upper bound is derived for the standard Transformer architecture equipped with Softmax operators, ReLU activation functions, and residual connections. We prove that a Transformer network composed of at most $\mathcal{O}(\varepsilon^{-{d_{0}}/α})$ blocks can approximate any bounded Hölder function with $d_{0}$-dimensional input and smoothness $α\in(0,1]$ to any accuracy $\varepsilon>0$. In the case of approximation lower bounds, leveraging the VC-dimension upper bound, we are the first to rigorously prove that Transformers require at least $Ω(\varepsilon^{-{d_{0}}/({4α})})$ blocks to achieve $\varepsilon$ approximation accuracy. As a final step, we extend the derived results for standard Transformers to a general regression task and establish the corresponding excess risk rates, demonstrating Transformers' empirical effectiveness in real-world settings.
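As a rough worked example of how far apart the two bounds are, take the illustrative values $d_{0}=4$, $α=1$ and $\varepsilon=10^{-2}$ (chosen here for arithmetic convenience, not taken from the paper) and ignore the hidden constants:

$$\varepsilon^{-d_{0}/α} = (10^{-2})^{-4} = 10^{8}, \qquad \varepsilon^{-d_{0}/(4α)} = (10^{-2})^{-1} = 10^{2}.$$

The exponents of the upper and lower bounds differ by a factor of 4, so under these assumptions the two block counts are separated by six orders of magnitude.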

2.3 | NA | Jul 12, 2024

DRM Revisited: A Complete Error Analysis

Yuling Jiao, Ruoxuan Li, Peiying Wu et al.

In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number of iterations, such that the output of the gradient descent process closely approximates the true solution of the underlying partial differential equation to the specified precision?

2.3 | NA | Jan 13

Sampling via Stochastic Interpolants by Langevin-based Velocity and Initialization Estimation in Flow ODEs

Chenguang Duan, Yuling Jiao, Gabriele Steidl et al.

We propose a novel method for sampling from unnormalized Boltzmann densities based on a probability-flow ordinary differential equation (ODE) derived from linear stochastic interpolants. The key innovation of our approach is the use of a sequence of Langevin samplers to enable efficient simulation of the flow. Specifically, these Langevin samplers are employed (i) to generate samples from the interpolant distribution at intermediate times and (ii) to construct, starting from these intermediate times, a robust estimator of the velocity field governing the flow ODE. For both applications of the Langevin diffusions, we establish convergence guarantees. Extensive numerical experiments demonstrate the efficiency of the proposed method on challenging multimodal distributions across a range of dimensions, as well as its effectiveness in Bayesian inference tasks.

3.1 | ML | Aug 16, 2024

Adv-SSL: Adversarial Self-Supervised Representation Learning with Theoretical Guarantees

Chenguang Duan, Yuling Jiao, Huazhen Lin et al.

Learning transferable data representations from abundant unlabeled data remains a central challenge in machine learning. Although numerous self-supervised learning methods have been proposed to address this challenge, a significant class of these approaches aligns the covariance or correlation matrix with the identity matrix. Despite impressive performance across various downstream tasks, these methods often suffer from biased sample risk, leading to substantial optimization shifts in mini-batch settings and complicating theoretical analysis. In this paper, we introduce a novel Adversarial Self-Supervised Representation Learning (Adv-SSL) method for unbiased transfer learning with no additional cost compared to its biased counterparts. Our approach not only outperforms the existing methods across multiple benchmark datasets but is also supported by comprehensive end-to-end theoretical guarantees. Our analysis reveals that the minimax optimization in Adv-SSL encourages representations to form well-separated clusters in the embedding space, provided there is sufficient upstream unlabeled data. As a result, our method achieves strong classification performance even with limited downstream labels, shedding new light on few-shot learning.

9.2 | ML | Jan 9, 2024

Semi-Supervised Deep Sobolev Regression: Estimation and Variable Selection by ReQU Neural Network

Zhao Ding, Chenguang Duan, Yuling Jiao et al.

We propose SDORE, a Semi-supervised Deep Sobolev Regressor, for the nonparametric estimation of the underlying regression function and its gradient. SDORE employs deep ReQU neural networks to minimize the empirical risk with gradient norm regularization, allowing the approximation of the regularization term by unlabeled data. Our study includes a thorough analysis of the convergence rates of SDORE in $L^{2}$-norm, achieving the minimax optimality. Further, we establish a convergence rate for the associated plug-in gradient estimator, even in the presence of significant domain shift. These theoretical findings offer valuable insights for selecting regularization parameters and determining the size of the neural network, while showcasing the provable advantage of leveraging unlabeled data in semi-supervised learning. To the best of our knowledge, SDORE is the first provable neural network-based approach that simultaneously estimates the regression function and its gradient, with diverse applications such as nonparametric variable selection. The effectiveness of SDORE is validated through an extensive range of numerical simulations.

4.6 | LG | Oct 12, 2024

Deep Transfer Learning: Model Framework and Error Analysis

Yuling Jiao, Huazhen Lin, Yuchen Luo et al.

This paper presents a framework for deep transfer learning, which aims to leverage information from multi-domain upstream data with a large number of samples $n$ to a single-domain downstream task with a considerably smaller number of samples $m$, where $m \ll n$, in order to enhance performance on the downstream task. Our framework offers several intriguing features. First, it allows the existence of both shared and domain-specific features across multi-domain data and provides a framework for automatic identification, achieving precise transfer and utilization of information. Second, the framework explicitly identifies upstream features that contribute to downstream tasks, establishing clear relationships between upstream domains and downstream tasks, thereby enhancing interpretability. Error analysis shows that our framework can significantly improve the convergence rate for learning Lipschitz functions in downstream supervised tasks, reducing it from $\tilde{O}(m^{-\frac{1}{2(d+2)}}+n^{-\frac{1}{2(d+2)}})$ ("no transfer") to $\tilde{O}(m^{-\frac{1}{2(d^*+3)}} + n^{-\frac{1}{2(d+2)}})$ ("partial transfer"), and even to $\tilde{O}(m^{-1/2}+n^{-\frac{1}{2(d+2)}})$ ("complete transfer"), where $d^* \ll d$ and $d$ is the dimension of the observed data. Our theoretical findings are supported by empirical experiments on image classification and regression datasets.

5.5 | ML | Feb 2, 2024 | Code

Deep conditional distribution learning via conditional Föllmer flow

Jinyuan Chang, Zhao Ding, Yuling Jiao et al.

We introduce an ordinary differential equation (ODE) based deep generative method for learning conditional distributions, named Conditional Föllmer Flow. Starting from a standard Gaussian distribution, the proposed flow could approximate the target conditional distribution very well when the time is close to 1. For effective implementation, we discretize the flow with Euler's method where we estimate the velocity field nonparametrically using a deep neural network. Furthermore, we also establish the convergence result for the Wasserstein-2 distance between the distribution of the learned samples and the target conditional distribution, providing the first comprehensive end-to-end error analysis for conditional distribution learning via ODE flow. Our numerical experiments showcase its effectiveness across a range of scenarios, from standard nonparametric conditional density estimation problems to more intricate challenges involving image data, illustrating its superiority over various existing conditional density estimation methods.

7.9 | LG | May 9, 2024 | Code

Characteristic Learning for Provable One Step Generation

Zhao Ding, Chenguang Duan, Yuling Jiao et al.

We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we first estimate the underlying velocity field and use the Euler method to solve the probability flow ODE, generating discrete approximations of the characteristics. A deep neural network is then trained to fit these characteristics, creating a one-step map that pushes a simple Gaussian distribution to the target distribution. In the theoretical aspect, we provide a comprehensive analysis of the errors arising from velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate in the 2-Wasserstein distance under mild data assumptions. Crucially, we demonstrate that under a standard manifold assumption, this convergence rate depends only on the intrinsic dimension of data rather than the much larger ambient dimension, proving our model's ability to mitigate the curse of dimensionality. To our knowledge, this is the first rigorous convergence analysis for a flow-based one-step generative model. Experiments on both synthetic and real-world datasets demonstrate that the characteristic generator achieves high-quality and high-resolution sample generation with the efficiency of just a single neural network evaluation.
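The Euler step along a characteristic can be sketched on a one-dimensional toy case. The sketch assumes a closed-form velocity field for a linear interpolant between a standard Gaussian and a Gaussian target $N(m, s^2)$; the paper instead estimates the velocity field from data and then fits a network to the resulting characteristics, which is not shown. The names `velocity` and `euler_flow` and the values of `m` and `s` are hypothetical.

```python
def velocity(x, t, m=2.0, s=0.5):
    """Closed-form velocity for X_t = (1 - t) * Z + t * X_1, Z ~ N(0, 1), X_1 ~ N(m, s^2).

    This is the conditional mean E[X_1 - Z | X_t = x] in the Gaussian toy case
    (an assumption of this sketch, standing in for a learned velocity field).
    """
    var_t = (1.0 - t) ** 2 + (t * s) ** 2          # variance of X_t
    return m + (t * s * s - (1.0 - t)) / var_t * (x - t * m)


def euler_flow(x0, n_steps=1000):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler."""
    x, h = x0, 1.0 / n_steps
    for i in range(n_steps):
        x += h * velocity(x, i * h)
    return x


# One discrete characteristic: where the point x0 = 1.0 ends up at time t = 1.
x1 = euler_flow(1.0)
```

In this toy case the exact flow map is $x_0 \mapsto m + s\,x_0$, so `x1` should be close to 2.5, with a small Euler discretization error that shrinks as `n_steps` grows.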

9.9 | LG | Feb 28, 2021

Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on Hölder Class

Yuling Jiao, Yanming Lai, Xiliang Lu et al.

In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $ω_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(ω_f(\sqrt{d})\cdot2^{-M}+ω_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3μ}ε\right)^{1/α}\right\rceil,2\left\lceil\log_2\frac{3μd^{α/2}}{2ε}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_μ^α([0,1]^d)$ within a given tolerance $ε>0$ measured in $L^p$ norm $p\in[1,\infty)$, where $\mathcal{H}_μ^α([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $α\in (0,1]$ and constant $μ> 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_μ^α([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to train them with SGD.