Jiang Yang

CV
h-index: 9
14 papers
76 citations
Novelty: 43%
AI Score: 45

14 Papers

CV · Apr 11, 2023
Life Regression based Patch Slimming for Vision Transformers

Jiawei Chen, Lin Chen, Jiang Yang et al.

Vision transformers have achieved remarkable success in computer vision tasks by using multi-head self-attention modules to capture long-range dependencies within images. However, the high inference computation cost poses a new challenge. Several methods have been proposed to address this problem, mainly by slimming patches. In the inference stage, these methods classify patches into two classes, one to keep and the other to discard in multiple layers. This approach results in additional computation at every layer where patches are discarded, which hinders inference acceleration. In this study, we tackle the patch slimming problem from a different perspective by proposing a life regression module that determines the lifespan of each image patch in one go. During inference, the patch is discarded once the current layer index exceeds its life. Our proposed method avoids additional computation and parameters in multiple layers to enhance inference speed while maintaining competitive performance. Additionally, our approach requires fewer training epochs than other patch slimming methods.
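The inference rule described in this abstract (a patch is dropped once the layer index exceeds its predicted life) can be sketched as follows. This is only a minimal illustration: the function name `active_patches_per_layer`, the 1-based layer indexing, and the hand-written lifespans are assumptions. In the paper the lifespans come from a learned life regression module, which is not reproduced here.

```python
def active_patches_per_layer(lifespans, num_layers):
    """Return, for each layer, the indices of patches that are still alive.

    lifespans[i] is the (hypothetical) predicted life of patch i, measured
    in layers. A patch is kept at layer l while l <= lifespans[i] and is
    discarded once the layer index exceeds its life.
    """
    schedule = []
    for layer in range(1, num_layers + 1):  # layers are numbered from 1 (assumption)
        kept = [i for i, life in enumerate(lifespans) if layer <= life]
        schedule.append(kept)
    return schedule


# Four patches with hand-picked lifespans, three transformer layers.
schedule = active_patches_per_layer([3, 1, 2, 3], num_layers=3)
for layer, kept in enumerate(schedule, start=1):
    print(f"layer {layer}: keep patches {kept}")
```

Because the lifespans are fixed once, the per-layer decision reduces to a comparison against the layer index, with no extra classifier evaluated at each layer.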

LG · Oct 5, 2024
Sinc Kolmogorov-Arnold Network and Its Applications on Physics-informed Neural Networks

Tianchi Yu, Jingwei Qiu, Jiang Yang et al.

In this paper, we propose to use Sinc interpolation in the context of Kolmogorov-Arnold Networks, neural networks with learnable activation functions, which recently gained attention as alternatives to multilayer perceptrons. Many different function representations have already been tried, but we show that Sinc interpolation offers a viable alternative, since it is known in numerical analysis to represent well both smooth functions and functions with singularities. This is important not only for function approximation but also for the solutions of partial differential equations with physics-informed neural networks. Through a series of experiments, we show that SincKANs provide better results in almost all of the examples we have considered.
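For readers unfamiliar with Sinc interpolation, the classical truncated cardinal series from numerical analysis can be sketched as below. This is not the SincKAN architecture itself: the step size `h`, the truncation `n_terms`, and the Gaussian test function are illustrative assumptions, and no learnable parameters are involved.

```python
import math


def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def sinc_interpolate(f, h, n_terms, x):
    """Truncated cardinal series: sum over k = -N..N of f(k h) * sinc((x - k h) / h)."""
    return sum(
        f(k * h) * sinc((x - k * h) / h)
        for k in range(-n_terms, n_terms + 1)
    )


def gaussian(x):
    """Smooth, rapidly decaying test function (illustrative choice)."""
    return math.exp(-x * x)


# Evaluate the interpolant away from the grid points and compare with the truth.
approx = sinc_interpolate(gaussian, h=0.5, n_terms=20, x=0.3)
error = abs(approx - gaussian(0.3))
print(f"interpolated: {approx:.6f}, exact: {gaussian(0.3):.6f}, error: {error:.2e}")
```

The interpolant reproduces the samples at the grid points `k * h` up to rounding, and for smooth, decaying functions the error between grid points shrinks rapidly as `h` decreases.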

NA · May 19
A second-order product-type implicit-explicit Runge-Kutta method preserving unit length and energy dissipation structures for gradient flows of vector fields

Jianan Li, Shuang Liu, Tao Tang et al.

Gradient flows of unit vector fields arise in a wide range of physical models such as harmonic map heat flows, nematic liquid crystals, and magnetization dynamics. Designing numerical schemes that simultaneously preserve the unit length constraint and dissipate energy is essential for reliable simulations of such systems. Although projection methods can effectively enforce the unit length constraint, ensuring energy dissipation under projection, especially in high-order schemes, remains challenging. Unlike traditional implicit-explicit Runge-Kutta (IMEX-RK) methods, in this work we propose a general methodology for constructing product-type IMEX-RK schemes that offers greater adaptability to various models with the goal of designing structure-preserving numerical schemes. For gradient flows of unit vector fields with Dirichlet energy, we design a linear and second-order numerical scheme that simultaneously preserves energy dissipation and the unit length constraint by using product-type IMEX-RK methods and projection techniques. Numerical experiments verify the accuracy, stability, and structure-preserving properties of the scheme. To the best of our knowledge, this is the first second-order linear scheme that can preserve both the unit length and the original Dirichlet energy for harmonic map heat flows.

NA · Jul 21, 2024
Computational and analytical studies of a new nonlocal phase-field crystal model in two dimensions

Qiang Du, Kai Wang, Jiang Yang

A nonlocal phase-field crystal (NPFC) model is presented as a nonlocal counterpart of the local phase-field crystal (LPFC) model and a special case of the structural PFC (XPFC) derived from classical field theory for crystal growth and phase transition. The NPFC incorporates a finite range of spatial nonlocal interactions that can account for both repulsive and attractive effects. The specific form is data-driven and determined by a fitting to the materials structure factor, which can be much more accurate than the LPFC and previously proposed fractional variant. In particular, it is able to match the experimental data of the structure factor up to the second peak, an achievement not possible with other PFC variants studied in the literature. Both LPFC and fractional PFC (FPFC) are also shown to be distinct scaling limits of the NPFC, which reflects the generality. The advantage of NPFC in retaining material properties suggests that it may be more suitable for characterizing liquid-solid transition systems. Moreover, we study numerical discretizations using Fourier spectral methods, which are shown to be convergent and asymptotically compatible, making them robust numerical discretizations across different parameter ranges. Numerical experiments are given in the two-dimensional case to demonstrate the effectiveness of the NPFC in simulating crystal structures and grain boundaries.

NA · Mar 14
Energy Dissipation Preserving Feature-based DNN Galerkin Methods for Gradient Flows

Tao Tang, Jiang Yang, Yuxiang Zhao et al.

In recent years, deep learning methods, exemplified by Physics-Informed Neural Networks (PINNs), have been widely applied to the numerical solution of differential equations. However, these methods may suffer from limited accuracy, high training costs, and lack of robustness, particularly their inability to preserve the intrinsic physical structures of continuous PDE models, such as the energy dissipation property in gradient flow systems. To address these challenges, we propose a feature-based Deep Neural Network Galerkin (DNN-G) framework designed for structure-preserving simulations of gradient flows. Instead of treating neural networks merely as optimization-driven solvers, we employ them as adaptive feature generators that define nonlinear trial spaces within a Galerkin projection formulation. This formulation guarantees semi-discrete energy dissipation and can be naturally combined with energy stable time integration schemes. Several strategies for constructing neural basis functions are investigated, including random features, structured initialization, and problem-informed pre-training. Numerical experiments demonstrate that the proposed method preserves robust energy stability in high-dimensional settings and accurately captures complex topological transitions. With equivalent degrees of freedom, the DNN-G framework achieves higher accuracy than classical spectral methods, highlighting the effectiveness of neural feature representations for the numerical solution of partial differential equations.

LG · Dec 6, 2024
$ε$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics

Jiang Yang, Yuxiang Zhao, Quanhui Zhu

Understanding the training dynamics of deep neural networks (DNNs), particularly how they evolve low-dimensional features from high-dimensional data, remains a central challenge in deep learning theory. In this work, we introduce the concept of $ε$-rank, a novel metric quantifying the effective feature of neuron functions in the terminal hidden layer. Through extensive experiments across diverse tasks, we observe a universal staircase phenomenon: during training with standard stochastic gradient descent methods, the decline of the loss function is accompanied by an increase in the $ε$-rank and exhibits a staircase pattern. Theoretically, we rigorously prove a negative correlation between the loss lower bound and $ε$-rank, demonstrating that a high $ε$-rank is essential for significant loss reduction. Moreover, numerical evidence shows that within the same deep neural network, the $ε$-rank of the subsequent hidden layer is higher than that of the previous hidden layer. Based on these observations, to eliminate the staircase phenomenon, we propose a novel pre-training strategy on the initial hidden layer that elevates the $ε$-rank of the terminal hidden layer. Numerical experiments validate its effectiveness in reducing training time and improving accuracy across various tasks. Therefore, the newly introduced concept of $ε$-rank is a computable quantity that serves as an intrinsic effective metric characteristic for deep neural networks, providing a novel perspective for understanding the training dynamics of neural networks and offering a theoretical foundation for designing efficient training strategies in practical applications.

AI · Sep 22, 2025
Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models

Jun Ling, Yao Qi, Tao Huang et al.

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately handling complex tables -- those with large sizes, deeply nested structures, and semantically rich or irregular cell content -- where existing methods often fail. We begin with a comprehensive analysis, identifying key challenges and highlighting the limitations of current evaluation protocols. To overcome these issues, we propose a reinforced multimodal large language model (MLLM) framework, where a pre-trained MLLM is fine-tuned on a large-scale table-to-LaTeX dataset. To further improve generation quality, we introduce a dual-reward reinforcement learning strategy based on Group Relative Policy Optimization (GRPO). Unlike standard approaches that optimize purely over text outputs, our method incorporates both a structure-level reward on LaTeX code and a visual fidelity reward computed from rendered outputs, enabling direct optimization of the visual output quality. We adopt a hybrid evaluation protocol combining TEDS-Structure and CW-SSIM, and show that our method achieves state-of-the-art performance, particularly on structurally complex tables, demonstrating the effectiveness and robustness of our approach.

CV · May 25, 2023
Action Sensitivity Learning for Temporal Action Localization

Jiayi Shao, Xiaohan Wang, Ruijie Quan et al.

Temporal action localization (TAL), which involves recognizing and locating action instances, is a challenging task in video understanding. Most existing approaches directly predict action classes and regress offsets to boundaries, while overlooking the discrepant importance of each frame. In this paper, we propose an Action Sensitivity Learning framework (ASL) to tackle this task, which aims to assess the value of each frame and then leverage the generated action sensitivity to recalibrate the training procedure. We first introduce a lightweight Action Sensitivity Evaluator to learn the action sensitivity at the class level and instance level, respectively. The outputs of the two branches are combined to reweight the gradient of the two sub-tasks. Moreover, based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where the action-aware frames are sampled as positive pairs to push away the action-irrelevant frames. The extensive studies on various action localization benchmarks (i.e., MultiThumos, Charades, Ego4D-Moment Queries v1.0, Epic-Kitchens 100, Thumos14 and ActivityNet1.3) show that ASL surpasses the state-of-the-art in terms of average-mAP under multiple types of scenarios, e.g., single-labeled, densely-labeled and egocentric.

LG · Jul 6, 2021
Generalization Error Analysis of Neural networks with Gradient Based Regularization

Lingfeng Li, Xue-Cheng Tai, Jiang Yang

We study gradient-based regularization methods for neural networks. We mainly focus on two regularization methods: the total variation and the Tikhonov regularization. Applying these methods is equivalent to using neural networks to solve some partial differential equations, mostly in high dimensions in practical applications. In this work, we introduce a general framework to analyze the generalization error of regularized networks. The error estimate relies on two assumptions on the approximation error and the quadrature error. Moreover, we conduct some experiments on image classification tasks to show that gradient-based methods can significantly improve the generalization ability and adversarial robustness of neural networks. A graphical extension of the gradient-based methods is also considered in the experiments.
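To make the two penalties named in this abstract concrete, here is a one-dimensional finite-difference sketch. It assumes the usual forms of the two regularizers (total variation as the integral of |f'| and Tikhonov as the integral of |f'|^2); the paper applies them to neural networks in high dimensions, and the function name `gradient_penalties` and the uniform grid are illustrative assumptions rather than the authors' implementation.

```python
def gradient_penalties(f, a, b, n):
    """Approximate the TV and Tikhonov penalties of f on [a, b].

    Uses forward differences on a uniform grid with n cells:
      TV       ~ sum |slope| * h      (integral of |f'|)
      Tikhonov ~ sum slope**2 * h     (integral of |f'|^2)
    """
    h = (b - a) / n
    tv = 0.0
    tikhonov = 0.0
    for i in range(n):
        slope = (f(a + (i + 1) * h) - f(a + i * h)) / h
        tv += abs(slope) * h
        tikhonov += slope * slope * h
    return tv, tikhonov


def line(x):
    """Linear test function with constant slope 2 (illustrative choice)."""
    return 2.0 * x


tv, tikhonov = gradient_penalties(line, 0.0, 1.0, 100)
print(f"TV penalty: {tv:.6f}, Tikhonov penalty: {tikhonov:.6f}")
```

For the linear test function the slope is constant, so the two sums approximate 2 and 4 respectively, which makes the different scaling of the two penalties easy to see.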

CV · Jun 18, 2021
Multi-Granularity Network with Modal Attention for Dense Affective Understanding

Baoming Yan, Lin Wang, Ke Gao et al.

Video affective understanding, which aims to predict the expressions evoked by the video content, is desired for video creation and recommendation. In the recent EEV challenge, a dense affective understanding task is proposed and requires frame-level affective prediction. In this paper, we propose a multi-granularity network with modal attention (MGN-MA), which employs multi-granularity features for better description of the target frame. Specifically, the multi-granularity features can be divided into frame-level, clip-level and video-level features, which correspond to visual-salient content, semantic context and video theme information. Then the modal attention fusion module is designed to fuse the multi-granularity features and emphasize more affection-relevant modalities. Finally, the fused feature is fed into a Mixture of Experts (MOE) classifier to predict the expressions. Further employing model-ensemble post-processing, the proposed method achieves a correlation score of 0.02292 in the EEV challenge.

CV · Jul 15, 2020
Augmented Bi-path Network for Few-shot Learning

Baoming Yan, Chen Zhou, Bo Zhao et al.

Few-shot Learning (FSL), which aims to learn from few labeled training data, is becoming a popular research topic, due to the expensive labeling cost in many real-world applications. One kind of successful FSL method learns to compare the testing (query) image and training (support) image by simply concatenating the features of two images and feeding them into the neural network. However, with few labeled data in each class, the neural network has difficulty in learning or comparing the local features of two images. Such simple image-level comparison may cause serious mis-classification. To solve this problem, we propose Augmented Bi-path Network (ABNet) for learning to compare both global and local features on multi-scales. Specifically, the salient patches are extracted and embedded as the local features for every image. Then, the model learns to augment the features for better robustness. Finally, the model learns to compare global and local features separately, i.e., in two paths, before merging the similarities. Extensive experiments show that the proposed ABNet outperforms the state-of-the-art methods. Both quantitative and visual ablation studies are provided to verify that the proposed modules lead to more precise comparison results.

CV · Mar 21, 2020
A level set representation method for N-dimensional convex shape and applications

Lingfeng Li, Shousheng Luo, Xue-Cheng Tai et al.

In this work, we present a new efficient method for convex shape representation, which is independent of the dimension of the concerned objects, using level-set approaches. Convexity prior is very useful for object completion in computer vision. It is a very challenging task to design an efficient method for representing high-dimensional convex objects. In this paper, we prove that the convexity of the considered object is equivalent to the convexity of the associated signed distance function. Then, the second-order condition of convex functions is used to characterize the shape convexity equivalently. We apply this new method to two applications: object segmentation with convexity prior and the convex hull problem (especially with outliers). For both applications, the involved problems can be written as a general optimization problem with three constraints. An efficient algorithm based on the alternating direction method of multipliers is presented for the optimization problem. Numerical experiments are conducted to verify the effectiveness and efficiency of the proposed representation method and algorithm.

CV · Aug 9, 2019
Convex hull algorithms based on some variational models

Lingfeng Li, Shousheng Luo, Xue-Cheng Tai et al.

Seeking the convex hull of an object is a very fundamental problem arising from various tasks. In this work, we propose two variational convex hull models using level set representation for 2-dimensional data. The first one is an exact model, which can get the convex hull of one or multiple objects. In this model, the convex hull is characterized by the zero sublevel-set of a convex level set function, which is non-positive at every given point. By minimizing the area of the zero sublevel-set, we can find the desired convex hull. The second one is intended to get the convex hull of objects with outliers. Instead of requiring that all the given points be included, this model penalizes the distance from each given point to the zero sublevel-set. Existing methods in the literature are not able to handle outliers. For the solution of these models, we develop efficient numerical schemes using the alternating direction method of multipliers. Numerical examples are given to demonstrate the advantages of the proposed methods.

NA · Oct 3, 2017
A new class of efficient and robust energy stable schemes for gradient flows

Jie Shen, Jie Xu, Jiang Yang

We propose a new numerical technique to deal with nonlinear terms in gradient flows. By introducing a scalar auxiliary variable (SAV), we construct efficient and robust energy stable schemes for a large class of gradient flows. The SAV approach is not restricted to specific forms of the nonlinear part of the free energy, and only requires solving decoupled linear equations with constant coefficients. We use this technique to deal with several challenging applications which cannot be easily handled by existing approaches, and present convincing numerical results to show that our schemes are not only much more efficient and easy to implement, but can also better capture the physical properties in these models. Based on this SAV approach, we can construct unconditionally second-order energy stable schemes; and we can easily construct even third or fourth order BDF schemes, although not unconditionally stable, which are very robust in practice. In particular, when coupled with an adaptive time stepping strategy, the SAV approach can be extremely efficient and accurate.
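The SAV idea can be illustrated on a scalar toy problem. The sketch below applies a first-order SAV-type step to the ODE gradient flow u' = -E'(u) with E(u) = (KAPPA/2) u^2 + E1(u), using the auxiliary variable r = sqrt(E1(u) + C0). The double-well E1, the constants `KAPPA` and `C0`, and all function names are assumptions chosen for illustration; the paper treats PDE gradient flows and higher-order variants, which are not reproduced here.

```python
import math

KAPPA = 0.5  # coefficient of the quadratic part of the energy (illustrative)
C0 = 1.0     # shift keeping E1(u) + C0 strictly positive (illustrative)


def e1(u):
    """Nonlinear part of the energy: a double-well potential (assumed)."""
    return 0.25 * (u * u - 1.0) ** 2


def de1(u):
    """Derivative of the nonlinear part, E1'(u)."""
    return u ** 3 - u


def sav_step(u, r, dt):
    """One first-order SAV step; only a linear equation in u_next is solved."""
    b = de1(u) / math.sqrt(e1(u) + C0)  # explicit coefficient from the old state
    # Solve u_next = u - dt * (KAPPA * u_next + b * r_next)
    # with  r_next = r + 0.5 * b * (u_next - u)  substituted in.
    u_next = (u + dt * (0.5 * b * b * u - b * r)) / (1.0 + dt * KAPPA + 0.5 * dt * b * b)
    r_next = r + 0.5 * b * (u_next - u)
    return u_next, r_next


def modified_energy(u, r):
    """Modified energy (KAPPA/2) u^2 + r^2 that the scheme dissipates."""
    return 0.5 * KAPPA * u * u + r * r


def run(u0, dt, steps):
    """Iterate the SAV step and record the modified energy at every step."""
    u, r = u0, math.sqrt(e1(u0) + C0)
    energies = [modified_energy(u, r)]
    for _ in range(steps):
        u, r = sav_step(u, r, dt)
        energies.append(modified_energy(u, r))
    return u, energies


# A deliberately large time step: the modified energy should still not increase.
u_final, energies = run(u0=2.0, dt=1.0, steps=50)
print(f"initial modified energy: {energies[0]:.4f}, final: {energies[-1]:.4f}")
```

In this scalar setting the linear solve collapses to a single division. The point of the sketch is that the nonlinear term enters only through the explicit coefficient `b`, while the modified energy is non-increasing for any positive time step.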