Shing-Tung Yau

LG
h-index115
16papers
429citations
Novelty44%
AI Score49

16 Papers

69.0OCMar 30
Yau's Affine Normal Descent: Algorithmic Framework and Convergence Analysis

Yi-Shuai Niu, Artan Sheshmani, Shing-Tung Yau

We propose Yau's Affine Normal Descent (YAND), a geometric framework for smooth unconstrained optimization in which search directions are defined by the equi-affine normal of level-set hypersurfaces. The resulting directions are invariant under volume-preserving affine transformations and intrinsically adapt to anisotropic curvature. Using the analytic representation of the affine normal from affine differential geometry, we establish its equivalence with the classical slice-centroid construction under convexity. For strictly convex quadratic objectives, affine-normal directions are collinear with Newton directions, implying one-step convergence under exact line search. For general smooth (possibly nonconvex) objectives, we characterize precisely when affine-normal directions yield strict descent and develop a line-search-based YAND. We establish global convergence under standard smoothness assumptions, linear convergence under strong convexity and Polyak-Lojasiewicz conditions, and quadratic local convergence near nondegenerate minimizers. We further show that affine-normal directions are robust under affine scalings, remaining insensitive to arbitrarily ill-conditioned transformations. Numerical experiments illustrate the geometric behavior of the method and its robustness under strong anisotropic scaling.

51.1PMApr 28
Yau's Affine-Normal Descent for Large-Scale Unrestricted Higher-Moment Portfolio Optimization

Ya-Juan Wang, Yi-Shuai Niu, Artan Sheshmani et al.

Unrestricted mean-variance-skewness-kurtosis portfolio optimization can capture asymmetry and tail risk, but sample-moment formulations become computationally impractical when the asset universe is large: they produce dense nonconvex quartic objectives with prohibitive coskewness and cokurtosis tensors and anisotropic, ill-conditioned level sets. We develop a structure-exploiting algorithm based on Yau's affine-normal descent that follows affine-normal directions of the current level set while working directly with the return matrix. The method avoids explicit higher-order tensors and exploits the quartic structure for exact sample oracles, derivative evaluation, and exact line search. We also provide theory for the reduced simplex formulation, including regularity and convexity conditions that separate data-map geometry from investor preference coefficients. Computational results show a clear implementation split: a direct configuration is effective on the standard small benchmark, whereas a preconditioned conjugate-gradient configuration with stall recovery becomes the preferred large-scale implementation by the upper end of the hundreds and remains competitive as the asset universe moves into the thousands. On a 5-minute A-share panel with 5,440 stocks, the method makes direct full-universe comparisons with exact mean-variance portfolios feasible and shows on the baseline split that the incremental value of higher moments is strongest at moderate return targets.

60.1MSApr 8
Polylab: A MATLAB Toolbox for Multivariate Polynomial Modeling

Yi-Shuai Niu, Shing-Tung Yau

Polylab is a MATLAB toolbox for multivariate polynomial scalars and polynomial matrices with a unified symbolic-numeric interface across CPU and GPU-oriented backends. The software exposes three aligned classes: MPOLY for CPU execution, MPOLY_GPU as a legacy GPU baseline, and MPOLY_HP as an improved GPU-oriented implementation. Across these backends, Polylab supports polynomial construction, algebraic manipulation, simplification, matrix operations, differentiation, Jacobian and Hessian construction, LaTeX export, CPU-side LaTeX reconstruction, backend conversion, and interoperability with YALMIP and SOSTOOLS. Versions 3.0 and 3.1 add two practically important extensions: explicit variable metadata through vars.id and vars.name, which makes mixed-variable expressions safe even when objects are created independently, and affine-normal direction computation via automatic differentiation, MF-logDet-Exact, and MF-logDet-Stochastic. The toolbox has already been used successfully in prior research applications, and Polylab Version 3.1 adds a new geometry-oriented computational layer on top of a mature polynomial modeling core. This paper documents the architecture and user-facing interface of the software, organizes its functionality by workflow, presents representative MATLAB sessions with actual outputs, and reports reproducible benchmarks. The results show that MPOLY is the right default for lightweight interactive workloads, whereas MPOLY-HP becomes advantageous for reduction-heavy simplification and medium-to-large affine-normal computation; the stochastic log-determinant variant becomes attractive in larger sparse regimes under approximation-oriented parameter choices.

96.5OCApr 1
Affine Normal Directions via Log-Determinant Geometry: Scalable Computation under Sparse Polynomial Structure

Yi-Shuai Niu, Artan Sheshmani, Shing-Tung Yau

Affine normal directions provide intrinsic affine-invariant descent directions derived from the geometry of level sets. Their practical use, however, has long been hindered by the need to evaluate third-order derivatives and invert tangent Hessians, which becomes computationally prohibitive in high dimensions. In this paper, we show that affine normal computation admits an exact reduction to second-order structure: the classical third-order contraction term is precisely the gradient of the log-determinant of the tangent Hessian. This identity replaces explicit third-order tensor contraction by a matrix-free formulation based on tangent linear solves, Hessian-vector products, and log-determinant gradient evaluation. Building on this reduction, we develop exact and stochastic matrix-free procedures for affine normal evaluation. For sparse polynomial objectives, the algebraic closure of derivatives further yields efficient sparse kernels for gradients, Hessian-vector products, and directional third-order contractions, leading to scalable implementations whose cost is governed by the sparsity structure of the polynomial representation. We establish end-to-end complexity bounds showing near-linear scaling with respect to the relevant sparsity scale under fixed stochastic and Krylov budgets. Numerical experiments confirm that the proposed MF-LogDet formulation reproduces the original autodifferentiation-based affine normal direction to near machine precision, delivers substantial runtime improvements in moderate and high dimensions, and exhibits empirical near-linear scaling in both dimension and sparsity. These results provide a practical computational route for affine normal evaluation and reveal a new connection between affine differential geometry, log-determinant curvature, and large-scale structured optimization.

SRMar 11, 2025
A Neural Symbolic Model for Space Physics

Jie Ying, Haowei Lin, Chao Yue et al.

In this study, we unveil a new AI model, termed PhyE2E, to discover physical formulas through symbolic regression. PhyE2E simplifies symbolic regression by decomposing it into sub-problems using the second-order derivatives of an oracle neural network, and employs a transformer model to translate data into symbolic formulas in an end-to-end manner. The resulting formulas are refined through Monte-Carlo Tree Search and Genetic Programming. We leverage a large language model to synthesize extensive symbolic expressions resembling real physics, and train the model to recover these formulas directly from data. A comprehensive evaluation reveals that PhyE2E outperforms existing state-of-the-art approaches, delivering superior symbolic accuracy, precision in data fitting, and consistency in physical units. We deployed PhyE2E to five applications in space physics, including the prediction of sunspot numbers, solar rotational angular velocity, emission line contribution functions, near-Earth plasma pressure, and lunar-tide plasma signals. The physical formulas generated by AI demonstrate a high degree of accuracy in fitting the experimental data from satellites and astronomical telescopes. We have successfully upgraded the formula proposed by NASA in 1993 regarding solar activity, and for the first time, provided the explanations for the long cycle of solar activity in an explicit form. We also found that the decay of near-Earth plasma pressure is proportional to r^2 to Earth, where subsequent mathematical derivations are consistent with satellite data from another independent study. Moreover, we found physical formulas that can describe the relationships between emission lines in the extreme ultraviolet spectrum of the Sun, temperatures, electron densities, and magnetic fields. The formula obtained is consistent with the properties that physicists had previously hypothesized it should possess.

LGNov 13, 2024
Estimating unknown parameters in differential equations with a reinforcement learning based PSO method

Wenkui Sun, Xiaoya Fan, Lijuan Jia et al.

Differential equations offer a foundational yet powerful framework for modeling interactions within complex dynamic systems and are widely applied across numerous scientific fields. One common challenge in this area is estimating the unknown parameters of these dynamic relationships. However, traditional numerical optimization methods rely on the selection of initial parameter values, making them prone to local optima. Meanwhile, deep learning and Bayesian methods require training models on specific differential equations, resulting in poor versatility. This paper reformulates the parameter estimation problem of differential equations as an optimization problem by introducing the concept of particles from the particle swarm optimization algorithm. Building on reinforcement learning-based particle swarm optimization (RLLPSO), this paper proposes a novel method, DERLPSO, for estimating unknown parameters of differential equations. We compared its performance on three typical ordinary differential equations with the state-of-the-art methods, including the RLLPSO algorithm, traditional numerical methods, deep learning approaches, and Bayesian methods. The experimental results demonstrate that our DERLPSO consistently outperforms other methods in terms of performance, achieving an average Mean Square Error of 1.13e-05, which reduces the error by approximately 4 orders of magnitude compared to other methods. Apart from ordinary differential equations, our DERLPSO also show great promise for estimating unknown parameters of partial differential equations. The DERLPSO method proposed in this paper has high accuracy, is independent of initial parameter values, and possesses strong versatility and stability. This work provides new insights into unknown parameter estimation for differential equations.

QMNov 3, 2021
PhyloTransformer: A Discriminative Model for Mutation Prediction Based on a Multi-head Self-attention Mechanism

Yingying Wu, Shusheng Xu, Shing-Tung Yau et al.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an ongoing pandemic infecting 219 million people as of 10/19/21, with a 3.6% mortality rate. Natural selection can generate favorable mutations with improved fitness advantages; however, the identified coronaviruses may be the tip of the iceberg, and potentially more fatal variants of concern (VOCs) may emerge over time. Understanding the patterns of emerging VOCs and forecasting mutations that may lead to gain of function or immune escape is urgently required. Here we developed PhyloTransformer, a Transformer-based discriminative model that engages a multi-head self-attention mechanism to model genetic mutations that may lead to viral reproductive advantage. In order to identify complex dependencies between the elements of each input sequence, PhyloTransformer utilizes advanced modeling techniques, including a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+) from Performer, and the Masked Language Model (MLM) from Bidirectional Encoder Representations from Transformers (BERT). PhyloTransformer was trained with 1,765,297 genetic sequences retrieved from the Global Initiative for Sharing All Influenza Data (GISAID) database. Firstly, we compared the prediction accuracy of novel mutations and novel combinations using extensive baseline models; we found that PhyloTransformer outperformed every baseline method with statistical significance. Secondly, we examined predictions of mutations in each nucleotide of the receptor binding motif (RBM), and we found our predictions were precise and accurate. Thirdly, we predicted modifications of N-glycosylation sites to identify mutations associated with altered glycosylation that may be favored during viral evolution. We anticipate that PhyloTransformer may guide proactive vaccine design for effective targeting of future SARS-CoV-2 variants.

COJun 30, 2020
Graph Laplacians, Riemannian Manifolds and their Machine-Learning

Yang-Hui He, Shing-Tung Yau

Graph Laplacians as well as related spectral inequalities and (co-)homology provide a foray into discrete analogues of Riemannian manifolds, providing a rich interplay between combinatorics, geometry and theoretical physics. We apply some of the latest techniques in data science such as supervised and unsupervised machine-learning and topological data analysis to the Wolfram database of some 8000 finite graphs in light of studying these correspondences. Encouragingly, we find that neural classifiers, regressors and networks can perform, with high efficiently and accuracy, a multitude of tasks ranging from recognizing graph Ricci-flatness, to predicting the spectral gap, to detecting the presence of Hamiltonian cycles, etc.

CVJan 11, 2020
AE-OT-GAN: Training GANs from data specific latent distribution

Dongsheng An, Yang Guo, Min Zhang et al.

Though generative adversarial networks (GANs) areprominent models to generate realistic and crisp images,they often encounter the mode collapse problems and arehard to train, which comes from approximating the intrinsicdiscontinuous distribution transform map with continuousDNNs. The recently proposed AE-OT model addresses thisproblem by explicitly computing the discontinuous distribu-tion transform map through solving a semi-discrete optimaltransport (OT) map in the latent space of the autoencoder.However the generated images are blurry. In this paper, wepropose the AE-OT-GAN model to utilize the advantages ofthe both models: generate high quality images and at thesame time overcome the mode collapse/mixture problems.Specifically, we first faithfully embed the low dimensionalimage manifold into the latent space by training an autoen-coder (AE). Then we compute the optimal transport (OT)map that pushes forward the uniform distribution to the la-tent distribution supported on the latent manifold. Finally,our GAN model is trained to generate high quality imagesfrom the latent distribution, the distribution transform mapfrom which to the empirical data distribution will be con-tinuous. The paired data between the latent code and thereal images gives us further constriction about the generator.Experiments on simple MNIST dataset and complex datasetslike Cifar-10 and CelebA show the efficacy and efficiency ofour proposed method.

LGFeb 8, 2019
Mode Collapse and Regularity of Optimal Transportation Maps

Na Lei, Yang Guo, Dongsheng An et al.

This work builds the connection between the regularity theory of optimal transportation map, Monge-Ampère equation and GANs, which gives a theoretic understanding of the major drawbacks of GANs: convergence difficulty and mode collapse. According to the regularity theory of Monge-Ampère equation, if the support of the target measure is disconnected or just non-convex, the optimal transportation mapping is discontinuous. General DNNs can only approximate continuous mappings. This intrinsic conflict leads to the convergence difficulty and mode collapse in GANs. We test our hypothesis that the supports of real data distribution are in general non-convex, therefore the discontinuity is unavoidable using an Autoencoder combined with discrete optimal transportation map (AE-OT framework) on the CelebA data set. The testing result is positive. Furthermore, we propose to approximate the continuous Brenier potential directly based on discrete Brenier theory to tackle mode collapse. Comparing with existing method, this method is more accurate and effective.

LGSep 16, 2018
Latent Space Optimal Transport for Generative Models

Huidong Liu, Yang Guo, Na Lei et al.

Variational Auto-Encoders enforce their learned intermediate latent-space data distribution to be a simple distribution, such as an isotropic Gaussian. However, this causes the posterior collapse problem and loses manifold structure which can be important for datasets such as facial images. A GAN can transform a simple distribution to a latent-space data distribution and thus preserve the manifold structure, but optimizing a GAN involves solving a Min-Max optimization problem, which is difficult and not well understood so far. Therefore, we propose a GAN-like method to transform a simple distribution to a data distribution in the latent space by solving only a minimization problem. This minimization problem comes from training a discriminator between a simple distribution and a latent-space data distribution. Then, we can explicitly formulate an Optimal Transport (OT) problem that computes the desired mapping between the two distributions. This means that we can transform a distribution without solving the difficult Min-Max optimization problem. Experimental results on an eight-Gaussian dataset show that the proposed OT can handle multi-cluster distributions. Results on the MNIST and the CelebA datasets validate the effectiveness of the proposed method.

LGMay 26, 2018
Geometric Understanding of Deep Learning

Na Lei, Zhongxuan Luo, Shing-Tung Yau et al.

Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, speech recognition, and so on. It has outperformed conventional methods in various fields and achieved great successes. Unfortunately, the understanding on how it works remains unclear. It has the central importance to lay down the theoretic foundation for deep learning. In this work, we give a geometric view to understand deep learning: we show that the fundamental principle attributing to the success is the manifold structure in data, namely natural high dimensional data concentrates close to a low-dimensional manifold, deep learning learns the manifold and the probability distribution on it. We further introduce the concepts of rectified linear complexity for deep neural network measuring its learning capability, rectified linear complexity of an embedding manifold describing the difficulty to be learned. Then we show for any deep neural network with fixed architecture, there exists a manifold that cannot be learned by the network. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.

LGOct 16, 2017
A Geometric View of Optimal Transportation and Generative Model

Na Lei, Kehua Su, Li Cui et al.

In this work, we show the intrinsic relations between optimal transportation and convex geometry, especially the variational approach to solve Alexandrov problem: constructing a convex polytope with prescribed face normals and volumes. This leads to a geometric interpretation to generative models, and leads to a novel framework for generative models. By using the optimal transportation view of GAN model, we show that the discriminator computes the Kantorovich potential, the generator calculates the transportation map. For a large class of transportation costs, the Kantorovich potential can give the optimal transportation map by a close-form formula. Therefore, it is sufficient to solely optimize the discriminator. This shows the adversarial competition can be avoided, and the computational architecture can be simplified. Preliminary experimental results show the geometric method outperforms WGAN for approximating probability measures with multiple clusters in low dimensional space.

NAOct 23, 2017
iSIRA: Integrated Shift-Invert Residual Arnoldi Method for Graph Laplacian Matrices from Big Data

Wei-Qiang Huang, Wen-Wei Lin, Henry Horng-Shing Lu et al.

The eigenvalue problem of a graph Laplacian matrix $L$ arising from a simple, connected and undirected graph has been given more attention due to its extensive applications, such as spectral clustering, community detection, complex network, image processing and so on. The associated graph Laplacian matrix is symmetric, positive semi-definite, and is usually large and sparse. Computing some smallest positive eigenvalues and corresponding eigenvectors is often of interest. However, the singularity of $L$ makes the classical eigensolvers inefficient since we need to factorize $L$ for the purpose of solving large and sparse linear systems exactly. The next difficulty is that it is usually time consuming or even unavailable to factorize a large and sparse matrix arising from real network problems from big data such as social media transactional databases, and sensor systems because there is in general not only local connections. In this paper, we propose an eignsolver based on the inexact residual Arnoldi method together with an implicit remedy of the singularity and an effective deflation for convergent eigenvalues. Numerical experiments reveal that the integrated eigensolver outperforms the classical Arnoldi/Lanczos method for computing some smallest positive eigeninformation provided the LU factorization is not available.

NADec 1, 2014
Time-dependent Hermite-Galerkin spectral method and its applications

Xue Luo, Shing-Tung Yau, Stephen S. -T. Yau

A time-dependent Hermite-Galerkin spectral method (THGSM) is investigated in this paper for the nonlinear convection-diffusion equations in the unbounded domains. The time-dependent scaling factor and translating factor are introduced in the definition of the generalized Hermite functions (GHF). As a consequence, the THGSM based on these GHF has many advantages, not only in theorethical proofs, but also in numerical implementations. The stability and spectral convergence of our proposed method have been established in this paper. The Korteweg-de Vries-Burgers (KdVB) equation and its special cases, including the heat equation and the Burgers' equation, as the examples, have been numerically solved by our method. The numerical results are presented, and it surpasses the existing methods in accuracy. Our theoretical proof of the spectral convergence has been supported by the numerical results.

NAOct 24, 2013
A Novel Skew-Hamiltonian Isotropic Lanczos Algorithm for Spectral Conformal Parameterizations

Wei-Qiang Huang, Xianfeng David Gu, Wen-Wei Lin et al.

Numerous methods for computing conformal mesh paramterizations has been developed due to the vast applications in the field of geometry processing. Spectral conformal parameterization (SCP) is one of these methods to computing a quality conformal parameterization based on the spectral technique. SCP focus on a generalized eigenvalue problem (GEP) $L_{C}\mathbf{f} = λB\mathbf{f}$ whose eigenvector(s) associated with the smallest positive eigenvalue(s) will provide the parameterization result. This paper devotes to study a novel eigensolver for this GEP. Based on the structures of matrix pair $(L_{C},B)$, we show that this GEP can be transformed into a small-scaled compressed deating standard eigenvalue problem with a symmetric positive definite skew-Hamiltonian operator. We then propose a skew-Hamiltonian isotropic Lanczos algorithm (SHILA) to solve the reducing problem. Numerical experiments show that our compressed deating skill remove the inuence of the kernel of $L_{C}$ and transform the original problem to a more robust system. The novel SHILA method can effective avoid the disturbance of duplicate eigenvalues. As a result, our numerical eigensolver can accurately and efficiently compute the conformal parameterization based on the spectral model of SCP.