Haizhao Yang

LG
h-index24
76papers
1,999citations
Novelty58%
AI Score60

76 Papers

OCMay 26
Global Convergence and Error Propagation in Neural Gradient Flows: A Riemannian Optimization Framework

Shixin Zheng, Yiwei Wang, Haizhao Yang

We develop a geometric convergence theory for neural-network optimization within the minimizing movement scheme (MMS) framework. Reformulating each neural MMS step as a minimization over the set of increments in a Hilbert space, we show that under a $C^2$ network with locally non-degenerate Jacobian this increment set is a boundaryless smooth embedded submanifold, on which a natural preconditioned (Gauss--Newton-type) gradient flow in parameter space induces exactly the Riemannian gradient flow. Under a strict interior-localization condition and an explicit data condition, the reached sublevel set is geodesically convex and the subproblem objective is geodesically strongly convex on it; both the continuous Riemannian gradient flow and its discrete companion via the exponential map converge linearly to the unique subproblem minimizer. Propagating finite-time inner-solver inexactness and neural-approximation error through the MMS iterations yields a uniform function-space tracking bound and an explicit trajectory budget, so the inexact neural iterates converge to an $O(δ)$-neighborhood of the global minimum. Numerical experiments on nonlinear regression and a small-scale latent-diffusion testbed indicate that the Gauss--Newton-type inner solver achieves smaller trajectory errors with substantially fewer inner iterations than first-order baselines.

NANov 13, 2018
$3D$ Crystal Image Analysis based on Fast Synchrosqueezed Transforms

Tao Zhang, Ling Li, Haizhao Yang

We propose an efficient algorithm to analyze $3D$ atomic resolution crystal images based on a fast $3D$ synchrosqueezed wave packet transform. The proposed algorithm can automatically extract microscopic information from $3D$ atomic resolution crystal images, e.g., crystal orientation, defects, and deformation, which are important information for characterizing material properties. The effectiveness of our algorithms is illustrated by experiments of synthetic datasets and real $3$D microscopic colloidal images.

NANov 16, 2016
Interpolative Butterfly Factorization

Yingzhou Li, Haizhao Yang

This paper introduces the interpolative butterfly factorization for nearly optimal implementation of several transforms in harmonic analysis, when their explicit formulas satisfy certain analytic properties and the matrix representations of these transforms satisfy a complementary low-rank property. A preliminary interpolative butterfly factorization is constructed based on interpolative low-rank approximations of the complementary low-rank matrix. A novel sweeping matrix compression technique further compresses the preliminary interpolative butterfly factorization via a sequence of structure-preserving low-rank approximations. The sweeping procedure propagates the low-rank property among neighboring matrix factors to compress dense submatrices in the preliminary butterfly factorization to obtain an optimal one in the butterfly scheme. For an $N\times N$ matrix, it takes $O(N\log N)$ operations and complexity to construct the factorization as a product of $O(\log N)$ sparse matrices, each with $O(N)$ nonzero entries. Hence, it can be applied rapidly in $O(N\log N)$ operations. Numerical results are provided to demonstrate the effectiveness of this algorithm.

AIDec 6, 2022
What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Songyang Han, Sanbao Su, Sihong He et al.

Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.

NAMar 11, 2018
Fast algorithms for Jacobi expansions via nonoscillatory phase functions

James Bremer, Haizhao Yang

We describe a suite of fast algorithms for evaluating Jacobi polynomials, applying the corresponding discrete Sturm-Liouville eigentransforms and calculating Gauss-Jacobi quadrature rules. Our approach is based on the well-known fact that Jacobi's differential equation admits a nonoscillatory phase function which can be loosely approximated via an affine function over much of its domain. Our algorithms perform better than currently available methods in most respects. We illustrate this with several numerical experiments, the source code for which is publicly available.

LGMay 19, 2022
Neural Network Architecture Beyond Width and Depth

Zuowei Shen, Haizhao Yang, Shijun Zhang

This paper proposes a new neural network architecture by introducing an additional dimension called height beyond width and depth. Neural network architectures with height, width, and depth as hyper-parameters are called three-dimensional architectures. It is shown that neural networks with three-dimensional architectures are significantly more expressive than the ones with two-dimensional architectures (those with only width and depth as hyper-parameters), e.g., standard fully connected networks. The new network architecture is constructed recursively via a nested structure, and hence we call a network with the new architecture nested network (NestNet). A NestNet of height $s$ is built with each hidden neuron activated by a NestNet of height $\le s-1$. When $s=1$, a NestNet degenerates to a standard network with a two-dimensional architecture. It is proved by construction that height-$s$ ReLU NestNets with $\mathcal{O}(n)$ parameters can approximate $1$-Lipschitz continuous functions on $[0,1]^d$ with an error $\mathcal{O}(n^{-(s+1)/d})$, while the optimal approximation error of standard ReLU networks with $\mathcal{O}(n)$ parameters is $\mathcal{O}(n^{-2/d})$. Furthermore, such a result is extended to generic continuous functions on $[0,1]^d$ with the approximation error characterized by the modulus of continuity. Finally, we use numerical experimentation to show the advantages of the super-approximation power of ReLU NestNets.

NAJun 21, 2022
Finite Expression Method for Solving High-Dimensional Partial Differential Equations

Senwei Liang, Haizhao Yang

Designing efficient and accurate numerical solvers for high-dimensional partial differential equations (PDEs) remains a challenging and important topic in computational science and engineering, mainly due to the "curse of dimensionality" in designing numerical schemes that scale in dimension. This paper introduces a new methodology that seeks an approximate PDE solution in the space of functions with finitely many analytic expressions and, hence, this methodology is named the finite expression method (FEX). It is proved in approximation theory that FEX can avoid the curse of dimensionality. As a proof of concept, a deep reinforcement learning method is proposed to implement FEX for various high-dimensional PDEs in different dimensions, achieving high and even machine accuracy with a memory complexity polynomial in dimension and an amenable time complexity. An approximate solution with finite analytic expressions also provides interpretable insights into the ground truth PDE solution, which can further help to advance the understanding of physical systems and design postprocessing techniques for a refined solution.

LGJan 28, 2023
Deep Operator Learning Lessens the Curse of Dimensionality for PDEs

Ke Chen, Chunmei Wang, Haizhao Yang

Deep neural networks (DNNs) have achieved remarkable success in numerous domains, and their application to PDE-related problems has been rapidly advancing. This paper provides an estimate for the generalization error of learning Lipschitz operators over Banach spaces using DNNs with applications to various PDE solution operators. The goal is to specify DNN width, depth, and the number of training samples needed to guarantee a certain testing error. Under mild assumptions on data distributions or operator structures, our analysis shows that deep operator learning can have a relaxed dependence on the discretization resolution of PDEs and, hence, lessen the curse of dimensionality in many PDE-related problems including elliptic equations, parabolic equations, and Burgers equations. Our results are also applied to give insights about discretization-invariance in operator learning.

LGMar 10, 2022
IAE-Net: Integral Autoencoders for Discretization-Invariant Learning

Yong Zheng Ong, Zuowei Shen, Haizhao Yang

Discretization invariant learning aims at learning in the infinite-dimensional function spaces with the capacity to process heterogeneous discrete representations of functions as inputs and/or outputs of a learning model. This paper proposes a novel deep learning framework based on integral autoencoders (IAE-Net) for discretization invariant learning. The basic building block of IAE-Net consists of an encoder and a decoder as integral transforms with data-driven kernels, and a fully connected neural network between the encoder and decoder. This basic building block is applied in parallel in a wide multi-channel structure, which are repeatedly composed to form a deep and densely connected neural network with skip connections as IAE-Net. IAE-Net is trained with randomized data augmentation that generates training data with heterogeneous structures to facilitate the performance of discretization invariant learning. The proposed IAE-Net is tested with various applications in predictive data science, solving forward and inverse problems in scientific computing, and signal/image processing. Compared with alternatives in the literature, IAE-Net achieves state-of-the-art performance in existing applications and creates a wide range of new applications.

NANov 28, 2016
Preconditioning orbital minimization method for planewave discretization

Jianfeng Lu, Haizhao Yang

We present an efficient preconditioner for the orbital minimization method when the Hamiltonian is discretized using planewaves (i.e., pseudospectral method). This novel preconditioner is based on an approximate Fermi operator projection by pole expansion, combined with the sparsifying preconditioner to efficiently evaluate the pole expansion for a wide range of Hamiltonian operators. Numerical results validate the performance of the new preconditioner for the orbital minimization method, in particular, the iteration number is reduced to $O(1)$ and often only a few iterations are enough for convergence.

OCJul 3, 2024
NewVEM: A Newton Vertex Exchange Method for a Class of Constrained Self-Concordant Minimization Problems

Ling Liang, Kim-Chuan Toh, Haizhao Yang

We propose \textbf{NewVEM}, a Newton vertex exchange method for efficiently solving self-concordant minimization problems under generalized simplex constraints. The algorithm features a two-level structure: the outer loop employs a projected Newton method, and the inner loop uses a vertex exchange approach to solve strongly convex quadratic subproblems. Both loops converge locally at linear rates under technical conditions, resulting in a ``fast $\times$ fast'' framework that demonstrates high efficiency and scalability in practice. To get a feasible initial point to execute the algorithm, we also present and analyze a highly efficient semismooth Newton method for computing the projection onto the generalized simplex. The excellent practical performance of the proposed algorithms is demonstrated by a set of numerical experiments. Our results further motivate the potential real-world applications of the considered model and the proposed algorithms.

CVJul 16, 2022
The Lottery Ticket Hypothesis for Self-attention in Convolutional Neural Network

Zhongzhan Huang, Senwei Liang, Mingfu Liang et al.

Recently many plug-and-play self-attention modules (SAMs) are proposed to enhance the model generalization by exploiting the internal information of deep convolutional neural networks (CNNs). In general, previous works ignore where to plug in the SAMs since they connect the SAMs individually with each block of the entire CNN backbone for granted, leading to incremental computational cost and the number of parameters with the growth of network depth. However, we empirically find and verify some counterintuitive phenomena that: (a) Connecting the SAMs to all the blocks may not always bring the largest performance boost, and connecting to partial blocks would be even better; (b) Adding the SAMs to a CNN may not always bring a performance boost, and instead it may even harm the performance of the original CNN backbone. Therefore, we articulate and demonstrate the Lottery Ticket Hypothesis for Self-attention Networks: a full self-attention network contains a subnetwork with sparse self-attention connections that can (1) accelerate inference, (2) reduce extra parameter increment, and (3) maintain accuracy. In addition to the empirical evidence, this hypothesis is also supported by our theoretical evidence. Furthermore, we propose a simple yet effective reinforcement-learning-based method to search the ticket, i.e., the connection scheme that satisfies the three above-mentioned conditions. Extensive experiments on widely-used benchmark datasets and popular self-attention networks show the effectiveness of our method. Besides, our experiments illustrate that our searched ticket has the capacity of transferring to some vision tasks, e.g., crowd counting and segmentation.

LGJun 8, 2022
Reinforced Inverse Scattering

Hanyang Jiang, Yuehaw Khoo, Haizhao Yang

Inverse wave scattering aims at determining the properties of an object using data on how the object scatters incoming waves. In order to collect information, sensors are put in different locations to send and receive waves from each other. The choice of sensor positions and incident wave frequencies determines the reconstruction quality of scatterer properties. This paper introduces reinforcement learning to develop precision imaging that decides sensor positions and wave frequencies adaptive to different scatterers in an intelligent way, thus obtaining a significant improvement in reconstruction quality with limited imaging resources. Extensive numerical results will be provided to demonstrate the superiority of the proposed method over existing methods.

NAJun 21, 2023
A Finite Expression Method for Solving High-Dimensional Committor Problems

Zezheng Song, Maria K. Cameron, Haizhao Yang

Transition path theory (TPT) is a mathematical framework for quantifying rare transition events between a pair of selected metastable states $A$ and $B$. Central to TPT is the committor function, which describes the probability to hit the metastable state $B$ prior to $A$ from any given starting point of the phase space. Once the committor is computed, the transition channels and the transition rate can be readily found. The committor is the solution to the backward Kolmogorov equation with appropriate boundary conditions. However, solving it is a challenging task in high dimensions due to the need to mesh a whole region of the ambient space. In this work, we explore the finite expression method (FEX, Liang and Yang (2022)) as a tool for computing the committor. FEX approximates the committor by an algebraic expression involving a fixed finite number of nonlinear functions and binary arithmetic operations. The optimal nonlinear functions, the binary operations, and the numerical coefficients in the expression template are found via reinforcement learning. The FEX-based committor solver is tested on several high-dimensional benchmark problems. It gives comparable or better results than neural network-based solvers. Most importantly, FEX is capable of correctly identifying the algebraic structure of the solution which allows one to reduce the committor problem to a low-dimensional one and find the committor with any desired accuracy.

NAFeb 5, 2023
Convergence Analysis of the Deep Galerkin Method for Weak Solutions

Yuling Jiao, Yanming Lai, Yang Wang et al.

This paper analyzes the convergence rate of a deep Galerkin method for the weak solution (DGMW) of second-order elliptic partial differential equations on $\mathbb{R}^d$ with Dirichlet, Neumann, and Robin boundary conditions, respectively. In DGMW, a deep neural network is applied to parametrize the PDE solution, and a second neural network is adopted to parametrize the test function in the traditional Galerkin formulation. By properly choosing the depth and width of these two networks in terms of the number of training samples $n$, it is shown that the convergence rate of DGMW is $\mathcal{O}(n^{-1/d})$, which is the first convergence result for weak solutions. The main idea of the proof is to divide the error of the DGMW into an approximation error and a statistical error. We derive an upper bound on the approximation error in the $H^{1}$ norm and bound the statistical error via Rademacher complexity.

LGDec 8, 2022
A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral Clustering

Qiyuan Pang, Haizhao Yang

We develop a distributed Block Chebyshev-Davidson algorithm to solve large-scale leading eigenvalue problems for spectral analysis in spectral clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on the prior knowledge of the eigenvalue spectrum, which could be expensive to estimate. This issue can be lessened by the analytic spectrum estimation of the Laplacian or normalized Laplacian matrices in spectral clustering, making the proposed algorithm very efficient for spectral clustering. Second, to make the proposed algorithm capable of analyzing big data, a distributed and parallel version has been developed with attractive scalability. The speedup by parallel computing is approximately equivalent to $\sqrt{p}$, where $p$ denotes the number of processes. {Numerical results will be provided to demonstrate its efficiency in spectral clustering and scalability advantage over existing eigensolvers used for spectral clustering in parallel computing environments.}

NAOct 14, 2017
Diffusion forecasting model with basis functions from QR decomposition

John Harlim, Haizhao Yang

The diffusion forecasting is a nonparametric approach that provably solves the Fokker-Planck PDE corresponding to Itô diffusion without knowing the underlying equation. The key idea of this method is to approximate the solution of the Fokker-Planck equation with a discrete representation of the shift (Koopman) operator on a set of basis functions generated via the diffusion maps algorithm. While the choice of these basis functions is provably optimal under appropriate conditions, computing these basis functions is quite expensive since it requires the eigendecomposition of an $N\times N$ diffusion matrix, where $N$ denotes the data size and could be very large. For large-scale forecasting problems, only a few leading eigenvectors are computationally achievable. To overcome this computational bottleneck, a new set of basis functions constructed by orthonormalizing selected columns of the diffusion matrix and its leading eigenvectors is proposed. This computation can be carried out efficiently via the unpivoted Householder QR factorization. The efficiency and effectiveness of the proposed algorithm will be shown in both deterministically chaotic and stochastic dynamical systems; in the former case, the superiority of the proposed basis functions over purely eigenvectors is significant, while in the latter case forecasting accuracy is improved relative to using a purely small number of eigenvectors. Supporting arguments will be provided on three- and six-dimensional chaotic ODEs, a three-dimensional SDE that mimics turbulent systems, and also on the two spatial modes associated with the boreal winter Madden-Julian Oscillation obtained from applying the Nonlinear Laplacian Spectral Analysis on the measured Outgoing Longwave Radiation (OLR).

CEAug 7, 2022
On Fast Simulation of Dynamical System with Neural Vector Enhanced Numerical Solver

Zhongzhan Huang, Senwei Liang, Hong Zhang et al.

The large-scale simulation of dynamical systems is critical in numerous scientific and engineering disciplines. However, traditional numerical solvers are limited by the choice of step sizes when estimating integration, resulting in a trade-off between accuracy and computational efficiency. To address this challenge, we introduce a deep learning-based corrector called Neural Vector (NeurVec), which can compensate for integration errors and enable larger time step sizes in simulations. Our extensive experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability on a continuous phase space, even when trained using limited and discrete data. NeurVec significantly accelerates traditional solvers, achieving speeds tens to hundreds of times faster while maintaining high levels of accuracy and stability. Moreover, NeurVec's simple-yet-effective design, combined with its ease of implementation, has the potential to establish a new paradigm for fast-solving differential equations based on deep learning.

NAMar 23
Quantum Circuit Encodings of Polynomial Chaos Expansions

Junaid Aftab, Christoph Schwab, Haizhao Yang et al.

This work investigates the expressive power of quantum circuits in approximating high-dimensional, real-valued functions. We focus on countably-parametric holomorphic maps $u:U\to \mathbb{R}$, where the parameter domain is $U=[-1,1]^{\mathbb{N}}$. We establish dimension-independent quantum circuit approximation rates via the best $n$-term truncations of generalized polynomial chaos (gPC) expansions of these parametric maps, demonstrating that these rates depend solely on the summability exponent of the gPC expansion coefficients. The key to our findings is based on the fact that so-called ``$(\boldsymbol{b},ε)$-holomorphic'' functions, where $\boldsymbol{b}\in (0,1]^\mathbb N \cap \ell^p(\mathbb N)$ for some $p\in(0,1)$, permit structured and sparse gPC expansions. Then, $n$-term truncated gPC expansions are known to admit approximation rates of order $n^{-1/p + 1/2}$ in the $L^2$ norm and of order $n^{-1/p + 1}$ in the $L^\infty$ norm. We show the existence of parameterized quantum circuit (PQC) encodings of these $n$-term truncated gPC expansions, and bound PQC depth and width via (i) tensorization of univariate PQCs that encode Chebyšev-polynomials in $[-1,1]$ and (ii) linear combination of unitaries (LCU) to build PQC emulations of $n$-term truncated gPC expansions. The results provide a rigorous mathematical foundation for the use of quantum algorithms in high-dimensional function approximation. As countably-parametric holomorphic maps naturally arise in parametric PDE models and uncertainty quantification (UQ), our results have implications for quantum-enhanced algorithms for a wide range of maps in applications.

LGApr 23, 2024Code
FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model

Zezheng Song, Jiaxin Yuan, Haizhao Yang

The fast simulation of dynamical systems is a key challenge in many scientific and engineering applications, such as weather forecasting, disease control, and drug discovery. With the recent success of deep learning, there is increasing interest in using neural networks to solve differential equations in a data-driven manner. However, existing methods are either limited to specific types of differential equations or require large amounts of data for training. This restricts their practicality in many real-world applications, where data is often scarce or expensive to obtain. To address this, we propose a novel multi-modal foundation model, named \textbf{FMint} (\textbf{F}oundation \textbf{M}odel based on \textbf{In}i\textbf{t}ialization), to bridge the gap between human-designed and data-driven models for the fast simulation of dynamical systems. Built on a decoder-only transformer architecture with in-context learning, FMint utilizes both numerical and textual data to learn a universal error correction scheme for dynamical systems, using prompted sequences of coarse solutions from traditional solvers. The model is pre-trained on a corpus of 40K ODEs, and we perform extensive experiments on challenging ODEs that exhibit chaotic behavior and of high dimensionality. Our results demonstrate the effectiveness of the proposed model in terms of both accuracy and efficiency compared to classical numerical solvers, highlighting FMint's potential as a general-purpose solver for dynamical systems. Our approach achieves an accuracy improvement of 1 to 2 orders of magnitude over state-of-the-art dynamical system simulators, and delivers a 5X speedup compared to traditional numerical algorithms. The code for FMint is available at \url{https://github.com/margotyjx/FMint}.

QUANT-PHMar 17
Non-Uniform Quantum Fourier Transform

Junaid Aftab, Yuehaw Khoo, Haizhao Yang

The Discrete Fourier Transform (DFT) is central to the analysis of uniformly sampled signals, yet many practical applications involve non-uniform sampling, requiring the Non-Uniform Discrete Fourier Transform (NUDFT). While quantum algorithms for the standard DFT are well established, a corresponding framework for the non-uniform case remains underdeveloped. This work introduces a quantum algorithm for the Non-Uniform Quantum Fourier Transform (NUQFT) based on a low-rank factorization of the NUDFT matrix. The factorization is translated into an explicit quantum construction using block encodings, Quantum Signal Processing, and the Linear Combination of Unitaries framework, yielding an $ε$-accurate block encoding of the NUDFT matrix with controlled approximation error from both classical truncation and quantum implementation. Under standard oracle access assumptions for non-uniform sampling points, we derive explicit, non-asymptotic gate-level resource estimates. The resulting complexity scales polylogarithmically with target precision, quadratically with the number of qubits through the quantum Fourier transform, and logarithmically with a geometry-dependent conditioning parameter induced by the non-uniform grid. This establishes a concrete and resource-efficient quantum analogue of the NUDFT and provides a foundation for quantum algorithms on irregularly sampled data.

NAApr 24
Finite Expression Method with TranNet-based Function Learning for High-Dimensional Partial Differential Equations

Phuoc-Toan Huynh, Feng Bao, Haizhao Yang et al.

In this paper, we study a machine-learning-based solver for high-dimensional partial differential equations (PDEs). Computing accurate solutions efficiently for such problems remains challenging because of the curse of dimensionality, which severely limits the scalability of classical numerical methods. Our approach builds on the recently developed finite expression method (FEX), which approximates PDE solutions in a function space generated by finitely many analytic expressions. This framework has been shown to achieve high, and in some cases machine-level, accuracy with polynomial memory complexity and favorable computational cost. We propose an extension of FEX in which the functional pool is generated by shallow neural network operators whose parameters are initialized using the transferable neural network method TransNet. Numerical experiments suggest that the proposed extension is an effective alternative for solving several high-dimensional PDEs.

LGSep 4, 2024
Optimal Neural Network Approximation for High-Dimensional Continuous Functions

Ayan Maiti, Michelle Michelle, Haizhao Yang

Recently, the authors of \cite{SYZ22} developed a neural network with width $36d(2d + 1)$ and depth $11$, which utilizes a special activation function called the elementary universal activation function, to achieve the super approximation property for functions in $C([a,b]^d)$. That is, the constructed network only requires a fixed number of neurons (and thus parameters) to approximate a $d$-variate continuous function on a $d$-dimensional hypercube with arbitrary accuracy. More specifically, only $\mathcal{O}(d^2)$ neurons or parameters are used. One natural question is whether we can reduce the number of these neurons or parameters in such a network. By leveraging a variant of the Kolmogorov Superposition Theorem, \textcolor{black}{we show that there is a composition of networks generated by the elementary universal activation function with at most $10889d + 10887$ nonzero parameters such that this super approximation property is attained. The composed network consists of repeated evaluations of two neural networks: one with width $36(2d+1)$ and the other with width 36, both having 5 layers.} Furthermore, we present a family of continuous functions that requires at least width $d$, and thus at least $d$ neurons or parameters, to achieve arbitrary accuracy in its approximation. This suggests that the number of nonzero parameters is optimal in the sense that it grows linearly with the input dimension $d$, unlike some approximation methods where parameters may grow exponentially with $d$.

LGMar 31
Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning

Bodla Krishna Vamshi, Haizhao Yang

Recent years have witnessed the widespread adoption of reinforcement learning (RL), from solving real-time games to fine-tuning large language models using human preference data significantly improving alignment with user expectations. However, as model complexity grows exponentially, the interpretability of these systems becomes increasingly challenging. While numerous explainability methods have been developed for computer vision and natural language processing to elucidate both local and global reasoning patterns, their application to RL remains limited. Direct extensions of these methods often struggle to maintain the delicate balance between interpretability and performance within RL settings. Prototype-Wrapper Networks (PW-Nets) have recently shown promise in bridging this gap by enhancing explainability in RL domains without sacrificing the efficiency of the original black-box models. However, these methods typically require manually defined reference prototypes, which often necessitate expert domain knowledge. In this work, we propose a method that removes this dependency by automatically selecting optimal prototypes from the available data. Preliminary experiments on standard Gym environments demonstrate that our approach matches the performance of existing PW-Nets, while remaining competitive with the original black-box models.

CVNov 28, 2020Code
Efficient Attention Network: Accelerate Attention by Searching Where to Plug

Zhongzhan Huang, Senwei Liang, Mingfu Liang et al.

Recently, many plug-and-play self-attention modules are proposed to enhance the model generalization by exploiting the internal information of deep convolutional neural networks (CNNs). Previous works lay an emphasis on the design of attention module for specific functionality, e.g., light-weighted or task-oriented attention. However, they ignore the importance of where to plug in the attention module since they connect the modules individually with each block of the entire CNN backbone for granted, leading to incremental computational cost and number of parameters with the growth of network depth. Thus, we propose a framework called Efficient Attention Network (EAN) to improve the efficiency for the existing attention modules. In EAN, we leverage the sharing mechanism (Huang et al. 2020) to share the attention module within the backbone and search where to connect the shared attention module via reinforcement learning. Finally, we obtain the attention network with sparse connections between the backbone and modules, while (1) maintaining accuracy (2) reducing extra parameter increment and (3) accelerating inference. Extensive experiments on widely-used benchmarks and popular attention networks show the effectiveness of EAN. Furthermore, we empirically illustrate that our EAN has the capacity of transferring to other tasks and capturing the informative features. The code is available at https://github.com/gbup-group/EAN-efficient-attention-network.

CVMay 25, 2019Code
DIANet: Dense-and-Implicit Attention Network

Zhongzhan Huang, Senwei Liang, Mingfu Liang et al.

Attention networks have successfully boosted the performance in various vision problems. Previous works lay emphasis on designing a new attention module and individually plug them into the networks. Our paper proposes a novel-and-simple framework that shares an attention module throughout different network layers to encourage the integration of layer-wise information and this parameter-sharing module is referred as Dense-and-Implicit-Attention (DIA) unit. Many choices of modules can be used in the DIA unit. Since Long Short Term Memory (LSTM) has a capacity of capturing long-distance dependency, we focus on the case when the DIA unit is the modified LSTM (refer as DIA-LSTM). Experiments on benchmark datasets show that the DIA-LSTM unit is capable of emphasizing layer-wise feature interrelation and leads to significant improvement of image classification accuracy. We further empirically show that the DIA-LSTM has a strong regularization ability on stabilizing the training of deep networks by the experiments with the removal of skip connections or Batch Normalization in the whole residual network. The code is released at https://github.com/gbup-group/DIANet.

LGNov 14, 2018Code
Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization

Senwei Liang, Yuehaw Khoo, Haizhao Yang

Overfitting frequently occurs in deep learning. In this paper, we propose a novel regularization method called Drop-Activation to reduce overfitting and improve generalization. The key idea is to drop nonlinear activation functions by setting them to be identity functions randomly during training time. During testing, we use a deterministic network with a new activation function to encode the average effect of dropping activations randomly. Our theoretical analyses support the regularization effect of Drop-Activation as implicit parameter reduction and verify its capability to be used together with Batch Normalization (Ioffe and Szegedy 2015). The experimental results on CIFAR-10, CIFAR-100, SVHN, EMNIST, and ImageNet show that Drop-Activation generally improves the performance of popular neural network architectures for the image classification task. Furthermore, as a regularizer Drop-Activation can be used in harmony with standard training and regularization techniques such as Batch Normalization and Auto Augment (Cubuk et al. 2019). The code is available at \url{https://github.com/LeungSamWai/Drop-Activation}.

OCSep 25, 2024
Accelerating Multi-Block Constrained Optimization Through Learning to Optimize

Ling Liang, Cameron Austin, Haizhao Yang

Learning to Optimize (L2O) approaches, including algorithm unrolling, plug-and-play methods, and hyperparameter learning, have garnered significant attention and have been successfully applied to the Alternating Direction Method of Multipliers (ADMM) and its variants. However, the natural extension of L2O to multi-block ADMM-type methods remains largely unexplored. Such an extension is critical, as multi-block methods leverage the separable structure of optimization problems, offering substantial reductions in per-iteration complexity. Given that classical multi-block ADMM does not guarantee convergence, the Majorized Proximal Augmented Lagrangian Method (MPALM), which shares a similar form with multi-block ADMM and ensures convergence, is more suitable in this setting. Despite its theoretical advantages, MPALM's performance is highly sensitive to the choice of penalty parameters. To address this limitation, we propose a novel L2O approach that adaptively selects this hyperparameter using supervised learning. We demonstrate the versatility and effectiveness of our method by applying it to the Lasso problem and the optimal transport problem. Our numerical results show that the proposed framework outperforms popular alternatives. Given its applicability to generic linearly constrained composite optimization problems, this work opens the door to a wide range of potential real-world applications.

MLApr 23
Beyond Expected Information Gain: Stable Bayesian Optimal Experimental Design with Integral Probability Metrics and Plug-and-Play Extensions

Di Wu, Ling Liang, Haizhao Yang

Bayesian Optimal Experimental Design (BOED) provides a rigorous framework for decision-making tasks in which data acquisition is often the critical bottleneck, especially in resource-constrained settings. Traditionally, BOED typically selects designs by maximizing expected information gain (EIG), commonly defined through the Kullback-Leibler (KL) divergence. However, classical evaluation of EIG often involves challenging nested expectations, and even advanced variational methods leave the underlying log-density-ratio objective unchanged. As a result, support mismatch, tail underestimation, and rare-event sensitivity remain intrinsic concerns for KL-based BOED. To address these fundamental bottlenecks, we introduce an IPM-based BOED framework that replaces density-based divergences with integral probability metrics (IPMs), including the Wasserstein distance, Maximum Mean Discrepancy, and Energy Distance, resulting in a highly flexible plug-and-play BOED framework. We establish theoretical guarantees showing that IPM-based utilities provide stronger geometry-aware stability under surrogate-model error and prior misspecification than classical EIG-based utilities. We also validate the proposed framework empirically, demonstrating that IPM-based designs yield highly concentrated credible sets. Furthermore, by extending the same sample-based BOED template in a plug-and-play manner to geometry-aware discrepancies beyond the IPM class, illustrated by a neural optimal transport estimator, we achieve accurate optimal designs in high-dimensional settings where conventional nested Monte Carlo estimators and advanced variational methods fail.

CLJan 20
HALT: Hallucination Assessment via Latent Testing

Rohan Bhatnagar, Youran Sun, Chi Andrew Zhang et al.

Hallucination in large language models (LLMs) can be understood as a failure of faithful readout: although internal representations may encode uncertainty about a query, decoding pressures still yield a fluent answer. We propose lightweight residual probes that read hallucination risk directly from intermediate hidden states of question tokens, motivated by the hypothesis that these layers retain epistemic signals that are attenuated in the final decoding stage. The probe is a small auxiliary network whose computation is orders of magnitude cheaper than token generation and can be evaluated fully in parallel with inference, enabling near-instantaneous hallucination risk estimation with effectively zero added latency in low-risk cases. We deploy the probe as an agentic critic for fast selective generation and routing, allowing LLMs to immediately answer confident queries while delegating uncertain ones to stronger verification pipelines. Across four QA benchmarks and multiple LLM families, the method achieves strong AUROC and AURAC, generalizes under dataset shift, and reveals interpretable structure in intermediate representations, positioning fast internal uncertainty readout as a principled foundation for reliable agentic AI.

LGJan 8
Manifold-based Sampling for In-Context Hallucination Detection in Large Language Models

Bodla Krishna Vamshi, Rohan Bhatnagar, Haizhao Yang

Large language models (LLMs) frequently generate factually incorrect or unsupported content, commonly referred to as hallucinations. Prior work has explored decoding strategies, retrieval augmentation, and supervised fine-tuning for hallucination detection, while recent studies show that in-context learning (ICL) can substantially influence factual reliability. However, existing ICL demonstration selection methods often rely on surface-level similarity heuristics and exhibit limited robustness across tasks and models. We propose MB-ICL, a manifold-based demonstration sampling framework for selecting in-context demonstrations that leverages latent representations extracted from frozen LLMs. By jointly modeling local manifold structure and class-aware prototype geometry, MB-ICL selects demonstrations based on their proximity to learned prototypes rather than lexical or embedding similarity alone. Across factual verification (FEVER) and hallucination detection (HaluEval) benchmarks, MB-ICL outperforms standard ICL selection baselines in the majority of evaluated settings, with particularly strong gains on dialogue and summarization tasks. The method remains robust under temperature perturbations and model variation, indicating improved stability compared to heuristic retrieval strategies. While lexical retrieval can remain competitive in certain question-answering regimes, our results demonstrate that manifold-based prototype selection provides a reliable and training light approach for hallucination detection without modifying LLM parameters, offering a principled direction for improved ICL demonstration selection.

CLApr 23, 2025
OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents

Raghav Thind, Youran Sun, Ling Liang et al.

Optimization plays a vital role in scientific research and practical applications. However, formulating a concrete optimization problem described in natural language into a mathematical form and selecting a suitable solver to solve the problem requires substantial domain expertise. We introduce OptimAI, a framework for solving Optimization problems described in natural language by leveraging LLM-powered AI agents, and achieve superior performance over current state-of-the-art methods. Our framework is built upon the following key roles: (1) a formulator that translates natural language problem descriptions into precise mathematical formulations; (2) a planner that constructs a high-level solution strategy prior to execution; and (3) a coder and a code critic capable of interacting with the environment and reflecting on outcomes to refine future actions. Ablation studies confirm that all roles are essential; removing the planner or code critic results in $5.8\times$ and $3.1\times$ drops in productivity, respectively. Furthermore, we introduce UCB-based debug scheduling to dynamically switch between alternative plans, yielding an additional $3.3\times$ productivity gain. Our design emphasizes multi-agent collaboration, and our experiments confirm that combining diverse models leads to performance gains. Our approach attains 88.1% accuracy on the NLP4LP dataset and 82.3% on the Optibench dataset, reducing error rates by 58% and 52%, respectively, over prior best results.

CEOct 31, 2025
FMint-SDE: A Multimodal Foundation Model for Accelerating Numerical Simulation of SDEs via Error Correction

Jiaxin Yuan, Haizhao Yang, Maria Cameron

Fast and accurate simulation of dynamical systems is a fundamental challenge across scientific and engineering domains. Traditional numerical integrators often face a trade-off between accuracy and computational efficiency, while existing neural network-based approaches typically require training a separate model for each case. To overcome these limitations, we introduce a novel multi-modal foundation model for large-scale simulations of differential equations: FMint-SDE (Foundation Model based on Initialization for stochastic differential equations). Based on a decoder-only transformer with in-context learning, FMint-SDE leverages numerical and textual modalities to learn a universal error-correction scheme. It is trained using prompted sequences of coarse solutions generated by conventional solvers, enabling broad generalization across diverse systems. We evaluate our models on a suite of challenging SDE benchmarks spanning applications in molecular dynamics, mechanical systems, finance, and biology. Experimental results show that our approach achieves a superior accuracy-efficiency tradeoff compared to classical solvers, underscoring the potential of FMint-SDE as a general-purpose simulation tool for dynamical systems.

LGMar 13, 2025
From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs

Rohan Bhatnagar, Ling Liang, Krish Patel et al.

Motivated by the remarkable success of artificial intelligence (AI) across diverse fields, the application of AI to solve scientific problems, often formulated as partial differential equations (PDEs), has garnered increasing attention. While most existing research concentrates on theoretical properties (such as well-posedness, regularity, and continuity) of the solutions, alongside direct AI-driven methods for solving PDEs, the challenge of uncovering symbolic relationships within these equations remains largely unexplored. In this paper, we propose leveraging large language models (LLMs) to learn such symbolic relationships. Our results demonstrate that LLMs can effectively predict the operators involved in PDE solutions by utilizing the symbolic information in the PDEs both theoretically and numerically. Furthermore, we show that discovering these symbolic relationships can substantially improve both the efficiency and accuracy of symbolic machine learning for finding analytical approximation of PDE solutions, delivering a fully interpretable solution pipeline. This work opens new avenues for understanding the symbolic structure of scientific problems and advancing their solution processes.

CLApr 17, 2025
LLMs Meet Finance: Fine-Tuning Foundation Models for the Open FinLLM Leaderboard

Varun Rao, Youran Sun, Mahendra Kumar et al.

This paper investigates the application of large language models (LLMs) to financial tasks. We fine-tuned foundation models using the Open FinLLM Leaderboard as a benchmark. Building on Qwen2.5 and Deepseek-R1, we employed techniques including supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) to enhance their financial capabilities. The fine-tuned models demonstrated substantial performance gains across a wide range of financial tasks. Moreover, we measured the data scaling law in the financial domain. Our work demonstrates the potential of large language models (LLMs) in financial applications.

LGJan 23, 2024
On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization

Ling Liang, Haizhao Yang

We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL). In order to solve such an optimization problem, we apply and analyze the classical stochastic proximal gradient method. In particular, the method has shown to admit an $O(ε^{-4})$ sample complexity to an $ε$-stationary point, under standard conditions. Since the variance of the classical stochastic gradient estimator is typically large, which slows down the convergence, we also apply an efficient stochastic variance-reduce proximal gradient method with an importance sampling based ProbAbilistic Gradient Estimator (PAGE). Our analysis shows that the sample complexity can be improved from $O(ε^{-4})$ to $O(ε^{-3})$ under additional conditions. Our results on the stochastic (variance-reduced) proximal gradient method match the sample complexity of their most competitive counterparts for discounted Markov decision processes under similar settings. To the best of our knowledge, the proposed methods represent a novel approach in addressing the general regularized reward optimization problem.

LGFeb 7, 2025
Curse of Dimensionality in Neural Network Optimization

Sanghoon Na, Haizhao Yang

This paper demonstrates that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ is an analog of the total number of optimization iterations. This result highlights the presence of the curse of dimensionality in the optimization computation required to achieve a desired accuracy. Instead of analyzing parameter evolution directly, the training dynamics are examined through the evolution of the parameter distribution under the 2-Wasserstein gradient flow. Furthermore, it is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed, where the Lipschitz constant in $[-x,x]$ is bounded by $O(x^δ)$ for any $x \in \mathbb{R}$. In this scenario, the population risk is shown to decay at a rate no faster than $t^{-\frac{(4+2δ)r}{d-2r}}$. Understanding how function smoothness influences the curse of dimensionality in neural network optimization theory is an important and underexplored direction that this work aims to address.

LGFeb 6, 2025
PINS: Proximal Iterations with Sparse Newton and Sinkhorn for Optimal Transport

Di Wu, Ling Liang, Haizhao Yang

Optimal transport (OT) is a critical problem in optimization and machine learning, where accuracy and efficiency are paramount. Although entropic regularization and the Sinkhorn algorithm improve scalability, they frequently encounter numerical instability and slow convergence, especially when the regularization parameter is small. In this work, we introduce Proximal Iterations with Sparse Newton and Sinkhorn methods (PINS) to efficiently compute highly accurate solutions for large-scale OT problems. A reduced computational complexity through overall sparsity and global convergence are guaranteed by rigorous theoretical analysis. Our approach offers three key advantages: it achieves accuracy comparable to exact solutions, progressively accelerates each iteration for greater efficiency, and enhances robustness by reducing sensitivity to regularization parameters. Extensive experiments confirm these advantages, demonstrating superior performance compared to related methods.

LGDec 19, 2023
Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Di Wu, Yuling Jiao, Li Shen et al.

Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing works on offline RL theory primarily emphasize a few trivial settings, such as linear MDP or general function approximation with strong assumptions and independent data, which lack guidance for practical use. The coupling of deep learning and Bellman residuals makes this problem challenging, in addition to the difficulty of data dependence. In this paper, we establish a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation with $\mathcal{C}$-mixing data regarding the structure of networks, the dimension of datasets, and the concentrability of data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. We utilize the empirical process tool for $\mathcal{C}$-mixing sequences and the neural network approximation theory for the Hölder class to achieve this. We also develop methods to bound the Bellman estimation error caused by function approximation with empirical Bellman constraint perturbations. Additionally, we present a result that lessens the curse of dimensionality using data with low intrinsic dimensionality and function classes with low complexity. Our estimation provides valuable insights into the development of deep offline RL and guidance for algorithm model design.

AIFeb 19
AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

Jianda Du, Youran Sun, Haizhao Yang

PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer from limited interpretability. We introduce \texttt{AutoNumerics}, a multi-agent framework that autonomously designs, implements, debugs, and verifies numerical solvers for general PDEs directly from natural language descriptions. Unlike black-box neural solvers, our framework generates transparent solvers grounded in classical numerical analysis. We introduce a coarse-to-fine execution strategy and a residual-based self-verification mechanism. Experiments on 24 canonical and real-world PDE problems demonstrate that \texttt{AutoNumerics} achieves competitive or superior accuracy compared to existing neural and LLM-based baselines, and correctly selects numerical schemes based on PDE structural properties, suggesting its viability as an accessible paradigm for automated PDE solving.

NAOct 26, 2025
Multi-Scale Finite Expression Method for PDEs with Oscillatory Solutions on Complex Domains

Gareth Hardwick, Haizhao Yang

Solving partial differential equations (PDEs) with highly oscillatory solutions on complex domains remains a challenging and important problem. High-frequency oscillations and intricate geometries often result in prohibitively expensive representations for traditional numerical methods and lead to difficult optimization landscapes for machine learning-based approaches. In this work, we introduce an enhanced Finite Expression Method (FEX) designed to address these challenges with improved accuracy, interpretability, and computational efficiency. The proposed framework incorporates three key innovations: a symbolic spectral composition module that enables FEX to learn and represent multiscale oscillatory behavior; a redesigned linear input layer that significantly expands the expressivity of the model; and an eigenvalue formulation that extends FEX to a new class of problems involving eigenvalue PDEs. Through extensive numerical experiments, we demonstrate that FEX accurately resolves oscillatory PDEs on domains containing multiple holes of varying shapes and sizes. Compared with existing neural network-based solvers, FEX achieves substantially higher accuracy while yielding interpretable, closed-form solutions that expose the underlying structure of the problem. These advantages, often absent in conventional finite element, finite difference, and black-box neural approaches, highlight FEX as a powerful and transparent framework for solving complex PDEs.

SESep 27, 2025
Protocode: Prototype-Driven Interpretability for Code Generation in LLMs

Krishna Vamshi Bodla, Haizhao Yang

Since the introduction of Large Language Models (LLMs), they have been widely adopted for various tasks such as text summarization, question answering, speech-to-text translation, and more. In recent times, the use of LLMs for code generation has gained significant attention, with tools such as Cursor and Windsurf demonstrating the ability to analyze massive code repositories and recommend relevant changes. Big tech companies have also acknowledged the growing reliance on LLMs for code generation within their codebases. Although these advances significantly improve developer productivity, increasing reliance on automated code generation can proportionally increase the risk of suboptimal solutions and insecure code. Our work focuses on automatically sampling In-Context Learning (ICL) demonstrations which can improve model performance and enhance the interpretability of the generated code. Using AST-based analysis on outputs from the MBPP test set, we identify regions of code most influenced by the chosen demonstrations. In our experiments, we show that high-quality ICL demonstrations not only make outputs easier to interpret but also yield a positive performance improvement on the pass@10 metric. Conversely, poorly chosen ICL demonstrations affected the LLM performance on the pass@10 metric negatively compared to the base model. Overall, our approach highlights the importance of efficient sampling strategies for ICL, which can affect the performance of the model on any given task.

LGJul 2, 2025
Efficient Kilometer-Scale Precipitation Downscaling with Conditional Wavelet Diffusion

Chugang Yi, Minghan Yu, Weikang Qian et al.

Effective hydrological modeling and extreme weather analysis demand precipitation data at a kilometer-scale resolution, which is significantly finer than the 10 km scale offered by standard global products like IMERG. To address this, we propose the Wavelet Diffusion Model (WDM), a generative framework that achieves 10x spatial super-resolution (downscaling to 1 km) and delivers a 9x inference speedup over pixel-based diffusion models. WDM is a conditional diffusion model that learns the learns the complex structure of precipitation from MRMS radar data directly in the wavelet domain. By focusing on high-frequency wavelet coefficients, it generates exceptionally realistic and detailed 1-km precipitation fields. This wavelet-based approach produces visually superior results with fewer artifacts than pixel-space models, and delivers a significant gains in sampling efficiency. Our results demonstrate that WDM provides a robust solution to the dual challenges of accuracy and speed in geoscience super-resolution, paving the way for more reliable hydrological forecasts.

LGMar 12, 2025
Parsing the Language of Expression: Enhancing Symbolic Regression with Domain-Aware Symbolic Priors

Sikai Huang, Yixin Berry Wen, Tara Adusumilli et al.

Symbolic regression is essential for deriving interpretable expressions that elucidate complex phenomena by exposing the underlying mathematical and physical relationships in data. In this paper, we present an advanced symbolic regression method that integrates symbol priors from diverse scientific domains - including physics, biology, chemistry, and engineering - into the regression process. By systematically analyzing domain-specific expressions, we derive probability distributions of symbols to guide expression generation. We propose novel tree-structured recurrent neural networks (RNNs) that leverage these symbol priors, enabling domain knowledge to steer the learning process. Additionally, we introduce a hierarchical tree structure for representing expressions, where unary and binary operators are organized to facilitate more efficient learning. To further accelerate training, we compile characteristic expression blocks from each domain and include them in the operator dictionary, providing relevant building blocks. Experimental results demonstrate that leveraging symbol priors significantly enhances the performance of symbolic regression, resulting in faster convergence and higher accuracy.

SPMay 16, 2023
Spectral Clustering via Orthogonalization-Free Methods

Qiyuan Pang, Haizhao Yang

While orthogonalization exists in current dimensionality reduction methods in spectral clustering on undirected graphs, it does not scale in parallel computing environments. We propose four orthogonalization-free methods for spectral clustering. Our methods optimize one of two objective functions with no spurious local minima. In theory, two methods converge to features isomorphic to the eigenvectors corresponding to the smallest eigenvalues of the symmetric normalized Laplacian. The other two converge to features isomorphic to weighted eigenvectors weighting by the square roots of eigenvalues. We provide numerical evidence on the synthetic graphs from the IEEE HPEC Graph Challenge to demonstrate the effectiveness of the orthogonalization-free methods. Numerical results on the streaming graphs show that the orthogonalization-free methods are competitive in the streaming graph scenario since they can take full advantage of the computed features of previous graphs and converge fast. Our methods are also more scalable in parallel computing environments because orthogonalization is unnecessary. Numerical results are provided to demonstrate the scalability of our methods. Consequently, our methods have advantages over other dimensionality reduction methods when handling spectral clustering for large streaming graphs.

LGMay 15, 2023
Nearly Optimal VC-Dimension and Pseudo-Dimension Bounds for Deep Neural Network Derivatives

Yahong Yang, Haizhao Yang, Yang Xiang

This paper addresses the problem of nearly optimal Vapnik--Chervonenkis dimension (VC-dimension) and pseudo-dimension estimations of the derivative functions of deep neural networks (DNNs). Two important applications of these estimations include: 1) Establishing a nearly tight approximation result of DNNs in the Sobolev space; 2) Characterizing the generalization error of machine learning methods with loss functions involving function derivatives. This theoretical investigation fills the gap of learning error estimations for a wide range of physics-informed machine learning models and applications including generative models, solving partial differential equations, operator learning, network compression, distillation, regularization, etc.

LGMay 15, 2023
Finite Expression Methods for Discovering Physical Laws from Data

Zhongyi Jiang, Chunmei Wang, Haizhao Yang

Nonlinear dynamics is a pervasive phenomenon observed in scientific and engineering disciplines. However, the task of deriving analytical expressions to describe nonlinear dynamics from limited data remains challenging. In this paper, we shall present a novel deep symbolic learning method called the "finite expression method" (FEX) to discover governing equations within a function space containing a finite set of analytic expressions, based on observed dynamic data. The key concept is to employ FEX to generate analytical expressions of the governing equations by learning the derivatives of partial differential equation (PDE) solutions through convolutions. Our numerical results demonstrate that our FEX surpasses other existing methods (such as PDE-Net, SINDy, GP, and SPL) in terms of numerical performance across a range of problems, including time-dependent PDE problems and nonlinear dynamical systems with time-varying coefficients. Moreover, the results highlight FEX's flexibility and expressive power in accurately approximating symbolic governing equations.

MLFeb 22, 2022
From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

Fusheng Liu, Haizhao Yang, Soufiane Hayou et al.

Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine learning models. Leveraging the Uniform-LGI, we first derive convergence rates for gradient flow algorithm, then we give generalization bounds for a large class of machine learning models. We further apply our framework to three distinct machine learning models: linear regression, kernel regression, and two-layer neural networks. Through our approach, we obtain generalization estimates that match or extend previous results.

MLJan 1, 2022
Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

Hao Liu, Haizhao Yang, Minshuo Chen et al.

Learning operators between infinitely dimensional spaces is an important learning task arising in wide applications in machine learning, imaging science, mathematical modeling and simulations, etc. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low dimensional structure, our error bounds decay as the training sample size increases, with an attractive fast rate depending on the intrinsic dimension in our estimation. Our assumptions cover most scenarios in real applications and our results give rise to fast rates by exploiting low dimensional structures of data in operator estimation. We also investigate the influence of network structures (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator and propose a general suggestion on the choice of network structures to maximize the learning efficiency quantitatively.

LGNov 15, 2021
Deep Network Approximation in Terms of Intrinsic Parameters

Zuowei Shen, Haizhao Yang, Shijun Zhang

One of the arguments to explain the success of deep learning is the powerful approximation capacity of deep neural networks. Such capacity is generally accompanied by the explosive growth of the number of parameters, which, in turn, leads to high computational costs. It is of great interest to ask whether we can achieve successful deep learning with a small number of learnable parameters adapting to the target function. From an approximation perspective, this paper shows that the number of parameters that need to be learned can be significantly smaller than people typically expect. First, we theoretically design ReLU networks with a few learnable parameters to achieve an attractive approximation. We prove by construction that, for any Lipschitz continuous function $f$ on $[0,1]^d$ with a Lipschitz constant $λ>0$, a ReLU network with $n+2$ intrinsic parameters (those depending on $f$) can approximate $f$ with an exponentially small error $5λ\sqrt{d}\,2^{-n}$. Such a result is generalized to generic continuous functions. Furthermore, we show that the idea of learning a small number of parameters to achieve a good approximation can be numerically observed. We conduct several experiments to verify that training a small part of parameters can also achieve good results for classification problems if other parameters are pre-specified or pre-trained from a related problem.