Jing An

LG
h-index: 17
11 papers
218 citations
Novelty: 48%
AI Score: 41

11 Papers

LG · Feb 17 · Code
GLM-5: from Vibe Coding to Agentic Engineering

GLM-5 Team, Aohan Zeng, Xin Lv et al. · Tsinghua

We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. To advance model alignment and autonomy, we implement a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Furthermore, we propose novel asynchronous agent RL algorithms that further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. Through these innovations, GLM-5 achieves state-of-the-art performance on major open benchmarks. Most critically, GLM-5 demonstrates unprecedented capability in real-world coding tasks, surpassing previous baselines in handling end-to-end software engineering challenges. Code, models, and more information are available at https://github.com/zai-org/GLM-5.
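Decoupling generation from training can be illustrated with a small producer/consumer sketch. This is not GLM-5's infrastructure. It assumes rollout workers that tag each trajectory with the policy version they used, a bounded queue between the workers and the trainer, and a hypothetical staleness cap `max_staleness` beyond which samples are discarded. All names (`PolicyStore`, `generator`, `trainer`, `run`) are illustrative, and the trajectories and the gradient step are placeholders.

```python
import queue
import threading


class PolicyStore:
    """Latest policy version: the trainer publishes, rollout workers read."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0

    def get(self):
        with self._lock:
            return self._version

    def publish(self):
        with self._lock:
            self._version += 1


def generator(store, buffer, stop):
    """Rollout worker: keeps producing trajectories with whatever policy is current."""
    while not stop.is_set():
        rollout = {"policy_version": store.get()}  # placeholder for a real trajectory
        try:
            buffer.put(rollout, timeout=0.01)
        except queue.Full:
            pass  # trainer is busy; retry with a freshly read policy version


def trainer(store, buffer, num_updates, batch_size, max_staleness):
    """Consumes rollouts, skipping any more than max_staleness versions old."""
    staleness_used, dropped = [], 0
    for _ in range(num_updates):
        batch = []
        while len(batch) < batch_size:
            sample = buffer.get()
            staleness = store.get() - sample["policy_version"]
            if staleness > max_staleness:
                dropped += 1  # too off-policy: discard
                continue
            batch.append(sample)
            staleness_used.append(staleness)
        # A real trainer would take an RL gradient step on `batch` here.
        store.publish()
    return staleness_used, dropped


def run(num_workers=2, num_updates=20, batch_size=4, max_staleness=2):
    store, buffer, stop = PolicyStore(), queue.Queue(maxsize=16), threading.Event()
    workers = [threading.Thread(target=generator, args=(store, buffer, stop), daemon=True)
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    staleness_used, dropped = trainer(store, buffer, num_updates, batch_size, max_staleness)
    stop.set()
    for w in workers:
        w.join()
    return store.get(), staleness_used, dropped


version, staleness_used, dropped = run()
```

Because the workers never wait for the trainer, generation and training overlap. The cost is that some training data come from slightly older policies, which is why an asynchronous setup has to decide how to treat off-policy samples (here, by simply dropping the stalest ones).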

NA · Jan 29, 2018
Fast algorithms for integral formulations of steady-state radiative transfer equation

Yuwei Fan, Jing An, Lexing Ying

We investigate integral formulations and fast algorithms for the steady-state radiative transfer equation with isotropic and anisotropic scattering. When the scattering term is a smooth convolution on the unit sphere, a model reduction step in the angular domain, using the Fourier transform in 2D and the spherical harmonic transform in 3D, significantly reduces the number of degrees of freedom. The resulting Fourier or spherical harmonic coefficients satisfy a Fredholm integral equation of the second kind. We study the uniqueness of its solution and prove an a priori estimate. For a homogeneous medium, the integral equation can be solved efficiently using the FFT and iterative methods. For an inhomogeneous medium, the recursive skeletonization factorization method is applied instead. Numerical simulations demonstrate the efficiency of the proposed algorithms in both homogeneous and inhomogeneous media and in both the transport and diffusion regimes.
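For a homogeneous medium, the FFT enters through fast application of a translation-invariant integral operator inside an iteration. The sketch below is a 1D toy, not the radiative transfer kernel. It assumes a Gaussian kernel `k(r)` scaled so that the operator norm is roughly 0.5, applies (Ku)_i = h Σ_j k(x_i - x_j) u_j in O(n log n) by embedding the Toeplitz matrix in a zero-padded circulant, and solves the second-kind equation u - Ku = f by fixed-point (source) iteration, checked against a dense solve. The kernel, grid, and tolerance are assumptions; a Krylov method such as GMRES could replace the fixed-point loop when ||K|| is close to 1.

```python
import numpy as np


def make_fft_matvec(kernel, n, h):
    """Return u -> K u with (K u)_i = h * sum_j kernel((i - j) * h) * u_j,
    evaluated in O(n log n) by embedding the Toeplitz matrix in a 2n circulant."""
    offsets = np.arange(n) * h
    # First circulant column: k(0), ..., k((n-1)h), 0, k(-(n-1)h), ..., k(-h)
    col = np.concatenate([kernel(offsets), [0.0], kernel(-offsets[:0:-1])])
    col_hat = np.fft.fft(col)

    def matvec(u):
        padded = np.concatenate([u, np.zeros(n)])
        return h * np.real(np.fft.ifft(col_hat * np.fft.fft(padded)))[:n]

    return matvec


def source_iteration(matvec, f, tol=1e-12, max_iter=500):
    """Solve u - K u = f with u <- f + K u; converges when ||K|| < 1."""
    u = f.copy()
    for _ in range(max_iter):
        u_new = f + matvec(u)
        if np.linalg.norm(u_new - u) <= tol * np.linalg.norm(f):
            return u_new
        u = u_new
    return u


n, ell = 256, 0.05
h = 1.0 / n
sigma = 0.5 / (np.sqrt(2.0 * np.pi) * ell)  # kernel integrates to about 0.5


def kernel(r):
    return sigma * np.exp(-r**2 / (2.0 * ell**2))


x = np.arange(n) * h
f = 1.0 + np.sin(2.0 * np.pi * x)
u = source_iteration(make_fft_matvec(kernel, n, h), f)

# Dense O(n^2) reference solve, used only to validate the FFT-based result.
K_dense = h * kernel(x[:, None] - x[None, :])
u_ref = np.linalg.solve(np.eye(n) - K_dense, f)
```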

NA · Apr 11, 2017
An efficient spectral-Galerkin approximation and error analysis for Maxwell transmission eigenvalue problems in spherical geometries

Jing An, Zhimin Zhang

We propose and analyze an efficient spectral-Galerkin approximation for the Maxwell transmission eigenvalue problem in spherical geometry. Using a vector spherical harmonic expansion, we reduce the problem to a sequence of equivalent one-dimensional TE and TM modes that can be solved individually in parallel. For the TE mode, we derive the associated generalized eigenvalue problems and corresponding pole conditions. We then introduce weighted Sobolev spaces based on the pole condition and prove error estimates for the generalized eigenvalue problem. The TM mode is a coupled system with four unknown functions, which is challenging to solve numerically. To handle it, we design an effective algorithm using Legendre-type vector basis functions. Finally, we provide numerical experiments to validate our theoretical results and demonstrate the efficiency of the algorithms.
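The step from a one-dimensional spectral-Galerkin discretization to a generalized eigenvalue problem can be sketched on a model problem. This is not the TE or TM mode system. It assumes the 1D Dirichlet problem -u'' = λu on (-1, 1) and Shen's Legendre basis φ_k = L_k - L_{k+2}, for which the stiffness matrix is diagonal with entries 4k + 6 and the mass matrix is nonzero only on the main diagonal and at offset ±2. The exact eigenvalues (mπ/2)^2 serve as a check.

```python
import numpy as np


def legendre_galerkin_dirichlet_eigs(N):
    """Eigenvalues of -u'' = lam * u on (-1, 1), u(-1) = u(1) = 0, using the
    basis phi_k = L_k - L_{k+2}, k = 0..N-1 (each phi_k vanishes at x = +-1)."""
    k = np.arange(N)
    A = np.diag(4.0 * k + 6.0)                          # (phi_j', phi_k') = (4k + 6) delta_jk
    B = np.diag(2.0 / (2 * k + 1) + 2.0 / (2 * k + 5))  # (phi_k, phi_k)
    off = -2.0 / (2 * k[:-2] + 5)                       # (phi_k, phi_{k+2}) = -2 / (2k + 5)
    B += np.diag(off, 2) + np.diag(off, -2)
    # Reduce A x = lam B x to a standard symmetric problem via B = L L^T.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)       # ascending order


lam = legendre_galerkin_dirichlet_eigs(20)
exact = (np.arange(1, 4) * np.pi / 2) ** 2
```

With 20 basis functions the lowest eigenvalues agree with the exact values to near machine precision, which is the spectral accuracy that motivates this kind of discretization.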

ML · Mar 6, 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss

Pierre Bréchet, Katerina Papagiannouli, Jing An et al.

We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. The Hessian of this loss at low-rank matrices can theoretically blow up, which makes it challenging to analyze the convergence of gradient-based optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss, as well as convergence results for finite-step-size gradient descent under certain assumptions on the initial weights.
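The loss has a closed form for covariance matrices, BW^2(Σ1, Σ2) = tr Σ1 + tr Σ2 - 2 tr (Σ1^{1/2} Σ2 Σ1^{1/2})^{1/2}, which can be evaluated with symmetric eigendecompositions. The sketch assumes a deep linear generator whose end-to-end matrix W = W_N ... W_1 maps standard Gaussian noise to a distribution with covariance W W^T, of rank at most the narrowest width; the dimensions, widths, and target covariance are illustrative. The square root of an eigenvalue has an unbounded derivative at zero, which gives one intuition for why derivatives of this loss can blow up at low-rank matrices.

```python
import numpy as np


def psd_sqrt(M):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def bures_wasserstein_sq(S1, S2):
    """BW^2(S1, S2) = tr S1 + tr S2 - 2 tr (S1^{1/2} S2 S1^{1/2})^{1/2}."""
    R = psd_sqrt(S1)
    cross = psd_sqrt(R @ S2 @ R)
    return np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross)


def end_to_end(factors):
    """Product W = W_N ... W_1 of a deep linear network (factors listed W_1 first)."""
    W = factors[0]
    for Wi in factors[1:]:
        W = Wi @ W
    return W


rng = np.random.default_rng(0)
d, r = 4, 2  # ambient dimension and bottleneck width (assumed)
factors = [rng.standard_normal((r, d)), rng.standard_normal((r, r)), rng.standard_normal((d, r))]
W = end_to_end(factors)
Sigma_model = W @ W.T  # covariance of W z for z ~ N(0, I); rank <= r
Sigma_target = np.diag([4.0, 3.0, 2.0, 1.0])
loss = bures_wasserstein_sq(Sigma_target, Sigma_model)
```

For commuting (for example, diagonal) covariances, the formula reduces to Σ_i (sqrt(a_i) - sqrt(b_i))^2, which is a convenient sanity check.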

LG · Apr 18, 2023
Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

Jing An, Jianfeng Lu

We study the convergence of stochastic gradient descent (SGD) for non-convex objective functions. We establish local convergence with positive probability under the local Łojasiewicz condition introduced by Chatterjee (2022) and an additional local structural assumption on the loss landscape. A key component of our proof is showing that the entire SGD trajectory stays inside the local region with positive probability. We also provide examples of finite-width neural networks for which our assumptions hold.
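The argument concerns SGD trajectories that remain in a region where a Łojasiewicz-type inequality holds. The toy below assumes the inequality has the form ||∇L(w)||^2 ≥ α L(w) on a ball around the initialization, and uses a scalar two-layer linear model u·v·x fitted to data with y = x, so that every per-sample gradient vanishes at the minimizers. It runs single-sample SGD, records the smallest observed ratio ||∇L||^2 / L along the path, and checks whether the path leaves a ball of assumed radius 2. The ratio degenerates near u = v = 0, where the gradient vanishes but the loss does not, which is why such conditions are stated locally. The model, data, and step size are illustrative, not the paper's finite-width networks.

```python
import numpy as np


def loss_and_grad(w, x, y):
    """Full loss L(u, v) = mean((u v x - y)^2) / 2 and its gradient."""
    u, v = w
    r = u * v * x - y
    return 0.5 * np.mean(r**2), np.array([np.mean(r * v * x), np.mean(r * u * x)])


def sgd_with_monitor(w0, x, y, lr=0.05, steps=2000, radius=2.0, seed=0):
    rng = np.random.default_rng(seed)
    w0 = np.asarray(w0, dtype=float)
    w = w0.copy()
    stayed, min_ratio = True, np.inf
    for _ in range(steps):
        loss, g = loss_and_grad(w, x, y)
        if loss > 1e-14:  # track ||grad L||^2 / L along the trajectory
            min_ratio = min(min_ratio, float(g @ g / loss))
        i = rng.integers(len(x))  # one randomly chosen sample
        r = w[0] * w[1] * x[i] - y[i]
        w = w - lr * r * x[i] * np.array([w[1], w[0]])  # per-sample gradient step
        stayed = stayed and bool(np.linalg.norm(w - w0) <= radius)
    return w, loss_and_grad(w, x, y)[0], stayed, min_ratio


data_rng = np.random.default_rng(1)
x = data_rng.uniform(0.5, 1.5, size=20)
y = x.copy()  # interpolation: any (u, v) with u v = 1 fits exactly
w, final_loss, stayed, min_ratio = sgd_with_monitor([1.5, 0.5], x, y)
```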

NA · Oct 27, 2016
Spectral-Galerkin Approximation and Optimal Error Estimate for Stokes Eigenvalue Problems in Polar Geometries

Jing An, Huiyuan Li, Zhimin Zhang

In this paper, we propose and analyze spectral-Galerkin methods for the Stokes eigenvalue problem based on the stream function formulation in polar geometries. We first analyze the fourth-order equation given by the stream function formulation in polar coordinates; we then derive the pole condition and reduce the problem on a circular disk to a sequence of equivalent one-dimensional eigenvalue problems that can be solved in parallel. The novelty of our approach lies in the construction of suitably weighted Sobolev spaces according to the pole conditions, from which we obtain the optimal error estimate for the approximate eigenvalues of each one-dimensional problem. Further, we extend our method to the non-separable Stokes eigenvalue problem in an elliptic domain and establish optimal error bounds. Finally, we provide numerical experiments to validate our theoretical results and algorithms.
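The reduction of a disk problem to independent one-dimensional problems can be sketched with a simpler operator. Assume the second-order Dirichlet Laplacian instead of the fourth-order stream-function operator, and finite volumes instead of spectral-Galerkin: after a Fourier expansion in the angle, each mode m gives the radial problem -(1/r)(r u')' + (m^2/r^2) u = λu on (0, 1) with u(1) = 0. On a cell-centred grid the flux through r = 0 vanishes automatically, which plays a role similar to a pole condition here. The exact lowest eigenvalue of mode m is j_{m,1}^2, the square of the first zero of the Bessel function J_m, and each call is independent, so the modes could be solved in parallel. The function name and grid are hypothetical.

```python
import numpy as np


def radial_mode_eigs(m, N, k=1):
    """Smallest k eigenvalues of -(1/r)(r u')' + (m^2 / r^2) u = lam * u on (0, 1)
    with u(1) = 0, on a cell-centred finite-volume grid r_i = (i - 1/2) h."""
    h = 1.0 / N
    r = (np.arange(1, N + 1) - 0.5) * h       # cell centres
    r_face = np.arange(0, N + 1) * h          # cell faces; r_face[0] = 0 at the pole
    # Equation multiplied by r_i gives a symmetric tridiagonal matrix A.
    main = (r_face[:-1] + r_face[1:]) / h**2 + m**2 / r
    main[-1] += r_face[-1] / h**2             # u(1) = 0 via ghost value u_{N+1} = -u_N
    off = -r_face[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    # Generalized problem A u = lam diag(r) u, symmetrized with diag(r)^(-1/2).
    s = 1.0 / np.sqrt(r)
    C = s[:, None] * A * s[None, :]
    return np.linalg.eigvalsh(C)[:k]


lam_m0 = radial_mode_eigs(0, 400)[0]  # compare with j_{0,1}^2
lam_m1 = radial_mode_eigs(1, 400)[0]  # compare with j_{1,1}^2
```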

OC · Jan 28, 2025
Convergence of two-timescale gradient descent ascent dynamics: finite-dimensional and mean-field perspectives

Jing An, Jianfeng Lu

The two-timescale gradient descent-ascent (GDA) is a canonical gradient algorithm designed to find Nash equilibria in min-max games. We analyze the two-timescale GDA by investigating the effects of learning rate ratios on convergence behavior in both finite-dimensional and mean-field settings. In particular, for finite-dimensional quadratic min-max games, we obtain long-time convergence in near quasi-static regimes through the hypocoercivity method. For mean-field GDA dynamics, we investigate convergence under a finite-scale ratio using a mixed synchronous-reflection coupling technique.
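The role of the learning-rate ratio can be explored on a discrete analogue of the finite-dimensional quadratic setting. This sketch is not the paper's continuous-time analysis and does not cover the mean-field case. It assumes simultaneous GDA on f(x, y) = a x^2/2 + b x y - c y^2/2, where x descends with step η and y ascends with step rη, and the Nash equilibrium is (0, 0). The function name and parameter values are illustrative.

```python
import numpy as np


def two_timescale_gda(a, b, c, lr_x, ratio, steps, z0=(1.0, 1.0)):
    """Simultaneous GDA on f(x, y) = a/2 x^2 + b x y - c/2 y^2.
    x descends with step lr_x; y ascends with step ratio * lr_x.
    Returns the distance of the final iterate from the equilibrium (0, 0)."""
    x, y = z0
    lr_y = ratio * lr_x
    for _ in range(steps):
        gx = a * x + b * y  # df/dx
        gy = b * x - c * y  # df/dy
        x, y = x - lr_x * gx, y + lr_y * gy
    return float(np.hypot(x, y))


strongly_convex_concave = two_timescale_gda(1.0, 1.0, 1.0, lr_x=0.1, ratio=2.0, steps=500)
bilinear = two_timescale_gda(0.0, 1.0, 0.0, lr_x=0.1, ratio=2.0, steps=500)
```

With a = c = 1, η = 0.1 and r = 2, the iteration matrix has complex eigenvalues of modulus sqrt(0.74) ≈ 0.86, so the iterates converge. In the purely bilinear case a = c = 0, its determinant is 1 + rη^2 b^2 > 1 and simultaneous GDA spirals outward.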

LG · May 31, 2021
Combining resampling and reweighting for faithful stochastic optimization

Jing An, Lexing Ying

Many machine learning and data science tasks require solving non-convex optimization problems. When the loss function is a sum of multiple terms, a popular method is stochastic gradient descent. Viewed as a process for sampling the loss function landscape, stochastic gradient descent is known to prefer flat minima. Though this is desirable for certain optimization problems, such as those in deep learning, it causes issues when the goal is to find the global minimum, especially if the global minimum resides in a sharp valley. Using a simple motivating example, we show that the fundamental reason is that differences in the Lipschitz constants of the terms in the loss function cause stochastic gradient descent to experience different variances at different minima. To mitigate this effect and perform faithful optimization, we propose a combined resampling-reweighting scheme that balances the variance at local minima, and we extend it to general loss functions. We explain, from a numerical stability perspective, why the proposed scheme is more likely to select the true global minimum and, from a local convergence analysis perspective, why it converges to a minimum faster than vanilla stochastic gradient descent. Experiments from robust statistics and computational chemistry are provided to demonstrate the theoretical findings.
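A generic way to combine resampling and reweighting is to draw term i with probability p_i and scale its gradient by 1/(n p_i), which keeps the stochastic gradient unbiased for (1/n) Σ_i f_i. The sketch below is not the paper's scheme. It assumes quadratic terms f_i = h_i θ^2/2 with a common minimizer and compares the effective curvature h_i/(n p_i) seen by SGD under uniform sampling and under the assumed choice p_i ∝ h_i. The mean is identical, but in this toy case the proportional choice removes the variance entirely. The function name and curvature values are hypothetical.

```python
import numpy as np


def scaled_curvature_stats(h, p):
    """For f = (1/n) sum_i h_i theta^2 / 2, SGD draws i ~ p and uses grad f_i / (n p_i).
    Returns mean and variance of the effective curvature c_i = h_i / (n p_i) under i ~ p."""
    h, p = np.asarray(h, dtype=float), np.asarray(p, dtype=float)
    n = len(h)
    c = h / (n * p)
    mean = np.sum(p * c)  # equals mean(h): the scheme is unbiased
    var = np.sum(p * (c - mean) ** 2)
    return mean, var


h = np.array([1.0, 10.0, 100.0])  # assumed per-term curvatures (gradient Lipschitz constants)
uniform = scaled_curvature_stats(h, np.full(3, 1.0 / 3.0))
proportional = scaled_curvature_stats(h, h / h.sum())
```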

LG · May 11, 2021
Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments

Xiaolong Wei, LiFang Yang, Xianglin Huang et al.

Attention mechanisms are now widely used in deep learning models. Attention-based models can not only capture relationships between feature positions but also measure the importance of different features through their weights. By dynamically weighting relevant and irrelevant features, they strengthen key information and suppress irrelevant information, which can significantly improve the efficiency of deep learning algorithms. Although transformers have performed well in many fields, including reinforcement learning, many problems and applications in this area remain open to transformer-based approaches. Multi-agent reinforcement learning (MARL) can be viewed as a set of independent agents, each adapting and learning its own way toward a goal. To emphasize the relationships between MDP decisions within a given time period, we apply a hierarchical encoding method and validate its effectiveness. This paper proposes an RNN-based hierarchical transformer MADDPG, which we call Hierarchical RNNs-Based Transformers MADDPG (HRTMADDPG). It consists of a lower-level RNN encoder that encodes multiple time steps within each time sequence, and an upper, sequence-level transformer encoder that learns correlations across sequences, so that the model captures causal relationships between sub-sequences and HRTMADDPG becomes more efficient.
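The two-level encoder can be sketched in NumPy. This is not the HRTMADDPG implementation. It assumes a vanilla tanh RNN as the lower-level encoder (one final hidden state per sub-sequence) and a single-head scaled dot-product self-attention layer standing in for the upper-level transformer, with random weights, no positional encoding, and no MADDPG actor or critic. All names and sizes are hypothetical.

```python
import numpy as np


def rnn_encode(chunk, W_in, W_h):
    """Vanilla tanh RNN over one sub-sequence; returns the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x_t in chunk:
        h = np.tanh(W_in @ x_t + W_h @ h)
    return h


def self_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over chunk embeddings H (K x d)."""
    Q, Kmat, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ Kmat.T / np.sqrt(Kmat.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-wise softmax
    return A @ V, A


def hierarchical_encode(traj, chunk_len, params):
    """Lower level: RNN per sub-sequence. Upper level: attention across sub-sequences."""
    chunks = [traj[i:i + chunk_len] for i in range(0, traj.shape[0], chunk_len)]
    H = np.stack([rnn_encode(c, params["W_in"], params["W_h"]) for c in chunks])
    return self_attention(H, params["Wq"], params["Wk"], params["Wv"])


rng = np.random.default_rng(0)
obs_dim, hidden, T, chunk_len = 4, 8, 12, 3
params = {
    "W_in": 0.3 * rng.standard_normal((hidden, obs_dim)),
    "W_h": 0.3 * rng.standard_normal((hidden, hidden)),
    "Wq": rng.standard_normal((hidden, hidden)),
    "Wk": rng.standard_normal((hidden, hidden)),
    "Wv": rng.standard_normal((hidden, hidden)),
}
traj = rng.standard_normal((T, obs_dim))  # one agent's observation sequence
out, attn = hierarchical_encode(traj, chunk_len, params)
```

Each row of `attn` shows how strongly one sub-sequence embedding attends to the others; in a full model, `out` would feed the policy and value networks.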

LG · Sep 28, 2020
Why resampling outperforms reweighting for correcting sampling bias with stochastic gradients

Jing An, Lexing Ying, Yuhua Zhu

A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets requires correction techniques to compensate for the bias. We consider two commonly used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function. Although the two are statistically equivalent, resampling has been observed to outperform reweighting when combined with stochastic gradient algorithms. By analyzing illustrative examples, we explain the reason behind this phenomenon using tools from dynamical stability and stochastic asymptotics. We also present experiments from regression, classification, and off-policy prediction to demonstrate that this is a general phenomenon. We argue that it is imperative to consider the objective function design and the optimization algorithm together when addressing sampling bias.
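The dynamical-stability argument can be illustrated with a toy computation. Assume two subgroups whose per-sample losses are quadratics h_i θ^2/2 sharing the minimizer θ = 0. One SGD step then multiplies θ by (1 - η w_i h_i), so E[θ^2] is multiplied by S = Σ_i P(i)(1 - η w_i h_i)^2 per step, and S > 1 means the minimizer is mean-square unstable. Resampling draws i from the target proportions a with unit weights; reweighting draws from the biased proportions f with weights a_i/f_i. Both give the same expected step, but the rare, heavily weighted subgroup inflates S under reweighting. The proportions, curvatures, and step size are assumptions, not the paper's examples.

```python
import numpy as np


def mean_square_factor(weights, probs, h, lr):
    """E[(1 - lr * w_i * h_i)^2] for i ~ probs: per-step growth factor of E[theta^2]
    for SGD on quadratic terms h_i theta^2 / 2 sharing the minimizer theta = 0."""
    w, p, h = (np.asarray(v, dtype=float) for v in (weights, probs, h))
    return float(np.sum(p * (1.0 - lr * w * h) ** 2))


a = np.array([0.5, 0.5])  # target subgroup proportions (assumed)
f = np.array([0.9, 0.1])  # biased sampling proportions (assumed)
h = np.array([1.0, 1.0])  # per-subgroup curvatures (assumed)
lr = 1.0

resampling = mean_square_factor(np.ones(2), a, h, lr)  # draw i ~ a, unit weights
reweighting = mean_square_factor(a / f, f, h, lr)      # draw i ~ f, weight a_i / f_i
```

Here resampling gives S = 0, while reweighting gives S ≈ 1.78, so the same step size that is stable for resampling makes reweighted SGD diverge from the minimizer.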

ML · May 21, 2018
Stochastic modified equations for the asynchronous stochastic gradient descent

Jing An, Jianfeng Lu, Lexing Ying

We propose a stochastic modified equation (SME) for modeling asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME, of Langevin type, extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous-time limit, as well as the SME's precise prediction of the trajectories of ASGD with various forcing terms. As an application of the SME, we propose an optimal mini-batching strategy for ASGD by solving the optimal control problem of the associated SME.
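The dynamics that the SME models can be made concrete with a stale-gradient iteration. The sketch below does not reproduce the SME itself. It assumes the simplest ASGD model, x_{k+1} = x_k - η g(x_{k-τ}) with a fixed delay τ, on the quadratic f(x) = h x^2/2 with optional Gaussian gradient noise, and shows that staleness alone can turn a stable step size into an unstable one. The function name and parameters are hypothetical.

```python
import numpy as np


def async_sgd_quadratic(h, lr, delay, steps, x0=1.0, noise=0.0, seed=0):
    """x_{k+1} = x_k - lr * (h * x_{k - delay} + noise * xi_k), xi_k ~ N(0, 1).
    Returns the trajectory x_0, ..., x_steps."""
    rng = np.random.default_rng(seed)
    history = [x0] * (delay + 1)  # pre-fill so early steps use x0 as the stale value
    for _ in range(steps):
        stale = history[-(delay + 1)]  # iterate from `delay` steps ago
        grad = h * stale + noise * rng.standard_normal()
        history.append(history[-1] - lr * grad)
    return np.array(history[delay:])


no_delay = async_sgd_quadratic(h=1.0, lr=1.5, delay=0, steps=200)
one_step_delay = async_sgd_quadratic(h=1.0, lr=1.5, delay=1, steps=200)
```

With ηh = 1.5, the undelayed iteration contracts by |1 - 1.5| = 0.5 per step, while a delay of one step gives the characteristic polynomial z^2 - z + 1.5, whose roots have modulus sqrt(1.5) > 1.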