Zenan Ling

LG
h-index7
18papers
211citations
Novelty51%
AI Score52

18 Papers

LGMar 26, 2022
A Roadmap for Big Model

Sha Yuan, Hanyu Zhao, Shuai Zhao et al. · bytedance, pku

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.

LGMay 27, 2022
Global Convergence of Over-parameterized Deep Equilibrium Models

Zenan Ling, Xingyu Xie, Qiuhao Wang et al.

A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. The training dynamics of over-parameterized DEQs are investigated in this study. By supposing a condition on the initial equilibrium point, we show that the unique equilibrium point always exists during the training process, and the gradient descent is proved to converge to a globally optimal solution at a linear convergence rate for the quadratic loss function. In order to show that the required initial condition is satisfied via mild over-parameterization, we perform a fine-grained analysis on random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.

LGOct 16, 2023Code
Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification

Tianjun Ke, Haoqun Cao, Zenan Ling et al.

Meta-learning has demonstrated promising results in few-shot classification (FSC) by learning to solve new problems using prior knowledge. Bayesian methods are effective at characterizing uncertainty in FSC, which is crucial in high-risk fields. In this context, the logistic-softmax likelihood is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification due to its conditional conjugacy property. However, the theoretical property of logistic-softmax is not clear and previous research indicated that the inherent uncertainty of logistic-softmax leads to suboptimal performance. To mitigate these issues, we revisit and redesign the logistic-softmax likelihood, which enables control of the \textit{a priori} confidence level through a temperature parameter. Furthermore, we theoretically and empirically show that softmax can be viewed as a special case of logistic-softmax and logistic-softmax induces a larger family of data distribution than softmax. Utilizing modified logistic-softmax, we integrate the data augmentation technique into the deep kernel based Gaussian process meta-learning framework, and derive an analytical mean-field approximation for task-specific updates. Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets. Code is publicly available at \url{https://github.com/keanson/revisit-logistic-softmax}.

LGAug 31, 2023
On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

Zenan Ling, Zhenyu Liao, Robert C. Qiu

Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.

LGOct 11, 2024Code
IGNN-Solver: A Graph Neural Solver for Implicit Graph Neural Networks

Junchao Lin, Zenan Ling, Zhanbo Feng et al.

Implicit graph neural networks (IGNNs), which exhibit strong expressive power with a single layer, have recently demonstrated remarkable performance in capturing long-range dependencies (LRD) in underlying graphs while effectively mitigating the over-smoothing problem. However, IGNNs rely on computationally expensive fixed-point iterations, which lead to significant speed and scalability limitations, hindering their application to large-scale graphs. To achieve fast fixed-point solving for IGNNs, we propose a novel graph neural solver, IGNN-Solver, which leverages the generalized Anderson Acceleration method, parameterized by a tiny GNN, and learns iterative updates as a graph-dependent temporal process. To improve effectiveness on large-scale graph tasks, we further integrate sparsification and storage compression methods, specifically tailored for the IGNN-Solver, into its design. Extensive experiments demonstrate that the IGNN-Solver significantly accelerates inference on both small- and large-scale tasks, achieving a $1.5\times$ to $8\times$ speedup without sacrificing accuracy. This advantage becomes more pronounced as the graph scale grows, facilitating its large-scale deployment in real-world applications. The code to reproduce our results is available at https://github.com/landrarwolf/IGNN-Solver.

LGFeb 1Code
Diving into Kronecker Adapters: Component Design Matters

Jiayu Bai, Danchen Yu, Zhenyu Liao et al.

Kronecker adapters have emerged as a promising approach for fine-tuning large-scale models, enabling high-rank updates through tunable component structures. However, existing work largely treats the component structure as a fixed or heuristic design choice, leaving the dimensions and number of Kronecker components underexplored. In this paper, we identify component structure as a key factor governing the capacity of Kronecker adapters. We perform a fine-grained analysis of both the dimensions and number of Kronecker components. In particular, we show that the alignment between Kronecker adapters and full fine-tuning depends on component configurations. Guided by these insights, we propose Component Designed Kronecker Adapters (CDKA). We further provide parameter-budget-aware configuration guidelines and a tailored training stabilization strategy for practical deployment. Experiments across various natural language processing tasks demonstrate the effectiveness of CDKA. Code is available at https://github.com/rainstonee/CDKA.

LGOct 20, 2025Code
Adaptive Discretization for Consistency Models

Jiayu Bai, Zhanbo Feng, Zhijie Deng et al.

Consistency Models (CMs) have shown promise for efficient one-step generation. However, most existing CMs rely on manually designed discretization schemes, which can cause repeated adjustments for different noise schedules and datasets. To address this, we propose a unified framework for the automatic and adaptive discretization of CMs, formulating it as an optimization problem with respect to the discretization step. Concretely, during the consistency training process, we propose using local consistency as the optimization objective to ensure trainability by avoiding excessive discretization, and taking global consistency as a constraint to ensure stability by controlling the denoising error in the training target. We establish the trade-off between local and global consistency with a Lagrange multiplier. Building on this framework, we achieve adaptive discretization for CMs using the Gauss-Newton method. We refer to our approach as ADCMs. Experiments demonstrate that ADCMs significantly improve the training efficiency of CMs, achieving superior generative performance with minimal training overhead on both CIFAR-10 and ImageNet. Moreover, ADCMs exhibit strong adaptability to more advanced DM variants. Code is available at https://github.com/rainstonee/ADCM.

CVAug 30, 2023
Textual and Visual Prompt Fusion for Image Editing via Step-Wise Alignment

Zhanbo Feng, Zenan Ling, Xinyu Lu et al.

The use of denoising diffusion models is becoming increasingly popular in the field of image editing. However, current approaches often rely on either image-guided methods, which provide a visual reference but lack control over semantic consistency, or text-guided methods, which ensure alignment with the text guidance but compromise visual quality. To resolve this issue, we propose a framework that integrates a fusion of generated visual references and text guidance into the semantic latent space of a \textit{frozen} pre-trained diffusion model. Using only a tiny neural network, our framework provides control over diverse content and attributes, driven intuitively by the text prompt. Compared to state-of-the-art methods, the framework generates images of higher quality while providing realistic editing effects across various benchmark datasets.

LGFeb 3
Consistency Deep Equilibrium Models

Junchao Lin, Zenan Ling, Jingwen Xu et al.

Deep Equilibrium Models (DEQs) have emerged as a powerful paradigm in deep learning, offering the ability to model infinite-depth networks with constant memory usage. However, DEQs incur significant inference latency due to the iterative nature of fixed-point solvers. In this work, we introduce the Consistency Deep Equilibrium Model (C-DEQ), a novel framework that leverages consistency distillation to accelerate DEQ inference. We cast the DEQ iterative inference process as evolution along a fixed ODE trajectory toward the equilibrium. Along this trajectory, we train C-DEQs to consistently map intermediate states directly to the fixed point, enabling few-step inference while preserving the performance of the teacher DEQ. At the same time, it facilitates multi-step evaluation to flexibly trade computation for performance gains. Extensive experiments across various domain tasks demonstrate that C-DEQs achieves consistent 2-20$\times$ accuracy improvements over implicit DEQs under the same few-step inference budget.

LGDec 14, 2023
Mitigating Label Bias in Machine Learning: Fairness through Confident Learning

Yixuan Zhang, Boyu Li, Zenan Ling et al.

Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias, resulting in biased datasets that unfairly harm specific groups and cause classifiers to inherit these biases. In this paper, we demonstrate that despite only having access to the biased labels, it is possible to eliminate bias by filtering the fairest instances within the framework of confident learning. In the context of confident learning, low self-confidence usually indicates potential label errors; however, this is not always the case. Instances, particularly those from underrepresented groups, might exhibit low confidence scores for reasons other than labeling errors. To address this limitation, our approach employs truncation of the confidence score and extends the confidence interval of the probabilistic threshold. Additionally, we incorporate with co-teaching paradigm for providing a more robust and reliable selection of fair instances and effectively mitigating the adverse effects of biased labels. Through extensive experimentation and evaluation of various datasets, we demonstrate the efficacy of our approach in promoting fairness and reducing the impact of label bias in machine learning models.

LGFeb 5, 2024
Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

Zenan Ling, Longbo Li, Zhanbo Feng et al.

Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs, when the input data are drawn from a high-dimensional Gaussian mixture. We prove, in this setting, that the spectral behavior of these Implicit-CKs and NTKs depend on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Despite derived here for Gaussian mixture data, empirical results show the proposed theory and design principle also apply to popular real-world datasets.

LGNov 7, 2024
Series-to-Series Diffusion Bridge Model

Hao Yang, Zhanbo Feng, Feng Zhou et al.

Diffusion models have risen to prominence in time series forecasting, showcasing their robust capability to model complex data distributions. However, their effectiveness in deterministic predictions is often constrained by instability arising from their inherent stochasticity. In this paper, we revisit time series diffusion models and present a comprehensive framework that encompasses most existing diffusion-based methods. Building on this theoretical foundation, we propose a novel diffusion-based time series forecasting model, the Series-to-Series Diffusion Bridge Model ($\mathrm{S^2DBM}$), which leverages the Brownian Bridge process to reduce randomness in reverse estimations and improves accuracy by incorporating informative priors and conditions derived from historical time series data. Experimental results demonstrate that $\mathrm{S^2DBM}$ delivers superior performance in point-to-point forecasting and competes effectively with other diffusion-based models in probabilistic forecasting.

MLJun 23, 2025
A Random Matrix Analysis of In-context Memorization for Nonlinear Attention

Zhenyu Liao, Jiaqing Liu, TianQi Hou et al.

Attention mechanisms have revolutionized machine learning (ML) by enabling efficient modeling of global dependencies across inputs. Their inherently parallelizable structures allow for efficient scaling with the exponentially increasing size of both pretrained data and model parameters. Yet, despite their central role as the computational backbone of modern large language models (LLMs), the theoretical understanding of Attentions, especially in the nonlinear setting, remains limited. In this paper, we provide a precise characterization of the \emph{in-context memorization error} of \emph{nonlinear Attention}, in the high-dimensional proportional regime where the number of input tokens $n$ and their embedding dimension $p$ are both large and comparable. Leveraging recent advances in the theory of large kernel random matrices, we show that nonlinear Attention typically incurs higher memorization error than linear ridge regression on random inputs. However, this gap vanishes, and can even be reversed, when the input exhibits statistical structure, particularly when the Attention weights align with the input signal direction. Our results reveal how nonlinearity and input structure interact with each other to govern the memorization performance of nonlinear Attention. The theoretical insights are supported by numerical experiments.

LGMay 27, 2021
Optimization Induced Equilibrium Networks

Xingyu Xie, Qiuhao Wang, Zenan Ling et al.

Implicit equilibrium models, i.e., deep neural networks (DNNs) defined by implicit equations, have been becoming more and more attractive recently. In this paper, we investigate an emerging question: can an implicit equilibrium model's equilibrium point be regarded as the solution of an optimization problem? To this end, we first decompose DNNs into a new class of unit layer that is the proximal operator of an implicit convex function while keeping its output unchanged. Then, the equilibrium model of the unit layer can be derived, named Optimization Induced Equilibrium Networks (OptEq), which can be easily extended to deep layers. The equilibrium point of OptEq can be theoretically connected to the solution of its corresponding convex optimization problem with explicit objectives. Based on this, we can flexibly introduce prior properties to the equilibrium points: 1) modifying the underlying convex problems explicitly so as to change the architectures of OptEq; and 2) merging the information into the fixed point iteration, which guarantees to choose the desired equilibrium point when the fixed point set is non-singleton. We show that deep OptEq outperforms previous implicit models even with fewer parameters. This work establishes the first step towards the optimization-guided design of deep models.

CVJan 8, 2019
Explaining AlphaGo: Interpreting Contextual Effects in Neural Networks

Zenan Ling, Haotian Ma, Yu Yang et al.

In this paper, we propose to disentangle and interpret contextual effects that are encoded in a pre-trained deep neural network. We use our method to explain the gaming strategy of the alphaGo Zero model. Unlike previous studies that visualized image appearances corresponding to the network output or a neural activation only from a global perspective, our research aims to clarify how a certain input unit (dimension) collaborates with other units (dimensions) to constitute inference patterns of the neural network and thus contribute to the network output. The analysis of local contextual effects w.r.t. certain input units is of special values in real applications. Explaining the logic of the alphaGo Zero model is a typical application. In experiments, our method successfully disentangled the rationale of each move during the Go game.

LGJul 31, 2018
Spectrum concentration in deep residual learning: a free probability approach

Zenan Ling, Xing He, Robert C. Qiu

We revisit the initialization of deep residual networks (ResNets) by introducing a novel analytical tool in free probability to the community of deep learning. This tool deals with non-Hermitian random matrices, rather than their conventional Hermitian counterparts in the literature. As a consequence, this new tool enables us to evaluate the singular value spectrum of the input-output Jacobian of a fully-connected deep ResNet for both linear and nonlinear cases. With the powerful tool of free probability, we conduct an asymptotic analysis of the spectrum on the single-layer case, and then extend this analysis to the multi-layer case of an arbitrary number of layers. In particular, we propose to rescale the classical random initialization by the number of residual units, so that the spectrum has the order of $O(1)$, when compared with the large width and depth of the network. We empirically demonstrate that the proposed initialization scheme learns at a speed of orders of magnitudes faster than the classical ones, and thus attests a strong practical relevance of this investigation.

CVJan 16, 2018
An Accurate and Real-time Self-blast Glass Insulator Location Method Based On Faster R-CNN and U-net with Aerial Images

Zenan Ling, Robert C. Qiu, Zhijian Jin et al.

The location of broken insulators in aerial images is a challenging task. This paper, focusing on the self-blast glass insulator, proposes a deep learning solution. We address the broken insulators location problem as a low signal-noise-ratio image location framework with two modules: 1) object detection based on Fast R-CNN, and 2) classification of pixels based on U-net. A diverse aerial image set of some grid in China is tested to validated the proposed approach. Furthermore, a comparison is made among different methods and the result shows that our approach is accurate and real-time.

CVJun 20, 2017
Individual Recognition in Schizophrenia using Deep Learning Methods with Random Forest and Voting Classifiers: Insights from Resting State EEG Streams

Lei Chu, Robert Qiu, Haichun Liu et al.

Recently, there has been a growing interest in monitoring brain activity for individual recognition system. So far these works are mainly focussing on single channel data or fragment data collected by some advanced brain monitoring modalities. In this study we propose new individual recognition schemes based on spatio-temporal resting state Electroencephalography (EEG) data. Besides, instead of using features derived from artificially-designed procedures, modified deep learning architectures which aim to automatically extract an individual's unique features are developed to conduct classification. Our designed deep learning frameworks are proved of a small but consistent advantage of replacing the $softmax$ layer with Random Forest. Additionally, a voting layer is added at the top of designed neural networks in order to tackle the classification problem arisen from EEG streams. Lastly, various experiments are implemented to evaluate the performance of the designed deep learning architectures; Results indicate that the proposed EEG-based individual recognition scheme yields a high degree of classification accuracy: $81.6\%$ for characteristics in high risk (CHR) individuals, $96.7\%$ for clinically stable first episode patients with schizophrenia (FES) and $99.2\%$ for healthy controls (HC).