LG · Aug 12, 2022 · Code
USB: A Unified Semi-supervised Learning Benchmark for Classification
Yidong Wang, Hao Chen, Yue Fan et al. · cmu, pku
Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, currently, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address the above issues, we construct a Unified SSL Benchmark (USB) for classification by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing (Audio), on which we systematically evaluate the dominant SSL methods, and also open-source a modular and extensible codebase for fair evaluation of these SSL methods. We further provide the pre-trained versions of the state-of-the-art neural models for CV tasks to make the cost affordable for further tuning. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains but with less cost. Specifically, on a single NVIDIA V100, only 39 GPU days are required to evaluate FixMatch on 15 tasks in USB while 335 GPU days (279 GPU days on 4 CV datasets except for ImageNet) are needed on 5 CV tasks with TorchSSL.
NA · Jun 2
Linear Convergence of Parareal Algorithm for Semilinear Parabolic Equations
Guanglian Li, Qingle Lin, Shu-lin Wu et al.
Long-time simulations of evolution equations present substantial computational challenges due to the inherently sequential nature of conventional time-stepping schemes. The parareal method, a leading parallel-in-time (PinT) algorithm, offers a promising approach to overcome the challenge by introducing concurrency in the time domain. While its convergence theory is well-established for linear problems, extending the theory to nonlinear problems, particularly when the problem data have only limited regularity, remains a significant challenge. In this work, we provide the convergence analysis of the parareal algorithm for solving semilinear parabolic equations with $H^2$ initial data. We employ stable rational approximations and first-order linearization as coarse propagators, establish the linear convergence of the parareal algorithm and provide a sharp estimate for the convergence factor. The analysis combines the error-splitting technique from the superlinear convergence analysis of the parareal method, a refined linear convergence theory for linear parabolic equations, and a priori error estimates that are optimal with respect to the regularity of the problem data. The analysis shows the close connection between the convergence behavior of nonlinear models and their linear counterparts. Numerical experiments fully support the theoretical findings.
NA · May 27
Convergence analysis of a parareal algorithm with multistep fine propagator
Georgios Akrivis, Qingle Lin, Zhi Zhou
The parareal algorithm is a powerful parallel-in-time integration method that accelerates the numerical solution of evolution equations by iteratively combining a fine propagator and a coarse propagator. Although the convergence of the parareal algorithm has been extensively studied, most existing analyses assume that the fine propagator is either an exact solver or a single-step method. In this paper, we construct and analyze a parareal algorithm for solving parabolic equations, where the fine propagator is based on the two-step backward differentiation formula (BDF2), while the coarse propagator remains a single-step method. We propose a novel approach to design an effective correction for the initialization steps and establish linear convergence of the iteration. Numerical results fully support the theoretical findings, show clear improvements over existing multistep parareal strategies, and indicate that the proposed approach extends effectively to higher-order BDF methods and to nonlinear problems.
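The interplay of a fine and a coarse propagator can be illustrated with a minimal sketch of the classical parareal iteration. This is not the BDF2-based algorithm of the paper: both propagators here are single-step backward Euler, and the scalar test problem u' = -lam * u, the function names, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np

def coarse(u, dt, lam):
    """Coarse propagator G: one backward Euler step for u' = -lam * u."""
    return u / (1.0 + lam * dt)

def fine(u, dt, lam, substeps=20):
    """Fine propagator F: `substeps` backward Euler steps over one time slice."""
    h = dt / substeps
    for _ in range(substeps):
        u = u / (1.0 + lam * h)
    return u

def parareal(u0, T, n_slices, lam, n_iters):
    """Return the list of parareal iterates (one array of slice values per iteration)."""
    dt = T / n_slices
    U = np.empty(n_slices + 1)
    U[0] = u0
    # Initial guess: a purely sequential coarse sweep.
    for n in range(n_slices):
        U[n + 1] = coarse(U[n], dt, lam)
    history = [U.copy()]
    for _ in range(n_iters):
        # These fine solves are independent across slices, so they could run in parallel.
        F_old = np.array([fine(U[n], dt, lam) for n in range(n_slices)])
        G_old = np.array([coarse(U[n], dt, lam) for n in range(n_slices)])
        U_new = np.empty_like(U)
        U_new[0] = u0
        # Sequential correction sweep: U_new[n+1] = G(U_new[n]) + F(U[n]) - G(U[n]).
        for n in range(n_slices):
            U_new[n + 1] = coarse(U_new[n], dt, lam) + F_old[n] - G_old[n]
        U = U_new
        history.append(U.copy())
    return history

def serial_fine(u0, T, n_slices, lam):
    """Reference solution: the fine propagator applied sequentially."""
    dt = T / n_slices
    U = np.empty(n_slices + 1)
    U[0] = u0
    for n in range(n_slices):
        U[n + 1] = fine(U[n], dt, lam)
    return U

if __name__ == "__main__":
    ref = serial_fine(1.0, 2.0, 10, 1.0)
    for k, U in enumerate(parareal(1.0, 2.0, 10, 1.0, 10)):
        print(k, float(np.max(np.abs(U - ref))))
```

The printed values are the maximum deviations from the serial fine solution per iteration. With as many iterations as time slices, the iterate agrees with the serial fine solution up to rounding, which is the well-known finite-termination property of parareal; the practical interest lies in how quickly the error contracts in the first few iterations.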
CV · Sep 18, 2023 · Code
Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts
Jiang-Xin Shi, Tong Wei, Zhi Zhou et al.
The fine-tuning paradigm in addressing long-tail learning tasks has sparked significant interest since the emergence of foundation models. Nonetheless, how fine-tuning impacts performance in long-tail learning was not explicitly quantified. In this paper, we disclose that heavy fine-tuning may even lead to non-negligible performance deterioration on tail classes, and lightweight fine-tuning is more effective. The reason is attributed to inconsistent class conditions caused by heavy fine-tuning. With the observation above, we develop a low-complexity and accurate long-tail learning algorithm, LIFT, with the goal of facilitating fast prediction and compact models by adaptive lightweight fine-tuning. Experiments clearly verify that both the training time and the learned parameters are significantly reduced with more accurate predictive performance compared with state-of-the-art approaches. The implementation code is available at https://github.com/shijxcs/LIFT.
LG · May 27
On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective
Zhi Zhou, Ming Yang, Shi-Yu Tian et al.
Test-time adaptation (TTA) aims to adapt models to maintain reliable performance on non-stationary test streams without requiring labeled data. Despite its empirical success, the learnability of TTA under non-stationary streams remains unexplored. A key challenge is the lack of a principled theoretical framework that simultaneously aligns with the TTA objective and captures both continuously evolving distribution shifts and intrinsic information constraints. To address this gap, we propose the first theoretical framework for studying the learnability of TTA and introduce $(ε,δ)$-Recovery Complexity and $(ε,ρ)$-TTA Learnability. Recovery complexity measures the post-shift time needed to maintain excess risk below a target level with high probability, and is further extended to TTA learnability, which measures the long-term reliability of TTA. Within this framework, we introduce a novel discrete surrogate for non-stationary test streams, enabling a unified and tractable analysis of both gradual and abrupt shifts. We derive order-wise matching lower and upper bounds on recovery complexity, revealing fundamental limits of TTA and an intrinsic adaptivity-information trade-off. These results provide unified learnability guarantees for TTA that complement regret-based analyses.
LG · Aug 9, 2022 · Code
LAMDA-SSL: Semi-Supervised Learning in Python
Lin-Han Jia, Lan-Zhe Guo, Zhi Zhou et al.
LAMDA-SSL is open-sourced on GitHub and its detailed usage documentation is available at https://ygzwqzd.github.io/LAMDA-SSL/. This documentation introduces LAMDA-SSL in detail from various aspects and can be divided into four parts. The first part introduces the design idea, features and functions of LAMDA-SSL. The second part shows the usage of LAMDA-SSL in detail through abundant examples. The third part introduces all algorithms implemented by LAMDA-SSL to help users quickly understand and choose SSL algorithms. The fourth part shows the APIs of LAMDA-SSL. This detailed documentation greatly reduces the cost of familiarizing users with the LAMDA-SSL toolkit and SSL algorithms.
NA · Apr 17, 2012
Error estimates for a semidiscrete finite element method for fractional order parabolic equations
Bangti Jin, Raytcho Lazarov, Zhi Zhou
We consider the initial boundary value problem for the homogeneous time-fractional diffusion equation $\partial^α_t u - \Delta u =0$ ($0< α< 1$) with initial condition $u(x,0)=v(x)$ and a homogeneous Dirichlet boundary condition in a bounded polygonal domain $Ω$. We shall study two semidiscrete approximation schemes, i.e., Galerkin FEM and lumped mass Galerkin FEM, by using piecewise linear functions. We establish error estimates that are optimal with respect to the regularity of the solution, including the case of nonsmooth initial data, i.e., $v \in L_2(Ω)$.
NA · Mar 26, 2017
Correction of high-order BDF convolution quadrature for fractional evolution equations
Bangti Jin, Buyang Li, Zhi Zhou
We develop proper correction formulas at the starting $k-1$ steps to restore the desired $k^{\rm th}$-order convergence rate of the $k$-step BDF convolution quadrature for discretizing evolution equations involving a fractional-order derivative in time. The desired $k^{\rm th}$-order convergence rate can be achieved even if the source term is not compatible with the initial data, which is allowed to be nonsmooth. We provide complete error estimates for the subdiffusion case $α\in (0,1)$, and sketch the proof for the diffusion-wave case $α\in(1,2)$. Extensive numerical examples are provided to illustrate the effectiveness of the proposed scheme.
NA · Dec 2, 2017
Numerical analysis of nonlinear subdiffusion equations
Bangti Jin, Buyang Li, Zhi Zhou
We present a general framework for the rigorous numerical analysis of time-fractional nonlinear parabolic partial differential equations, with a fractional derivative of order $α\in(0,1)$ in time. The framework relies on three technical tools: a fractional version of the discrete Grönwall-type inequality, discrete maximal regularity, and regularity theory of nonlinear equations. We establish a general criterion for showing the fractional discrete Grönwall inequality, and verify it for the L1 scheme and convolution quadrature generated by BDFs. Further, we provide a complete solution theory, e.g., existence, uniqueness and regularity, for a time-fractional diffusion equation with a Lipschitz nonlinear source term. Together with the known results of discrete maximal regularity, we derive pointwise $L^2(Ω)$ norm error estimates for semidiscrete Galerkin finite element solutions and fully discrete solutions, which are of order $O(h^2)$ (up to a logarithmic factor) and $O(τ^α)$, respectively, without any extra regularity assumption on the solution or compatibility condition on the problem data. The sharpness of the convergence rates is supported by the numerical experiments.
NA · May 29, 2018
Numerical methods for time-fractional evolution equations with nonsmooth data: a concise overview
Bangti Jin, Raytcho Lazarov, Zhi Zhou
Over the past few decades, there has been substantial interest in evolution equations involving a fractional-order derivative of order $α\in(0,1)$ in time, due to their many successful applications in engineering, physics, biology and finance. Thus, it is of paramount importance to develop and to analyze efficient and accurate numerical methods for reliably simulating such models, and the literature on the topic is vast and fast growing. The present paper gives a concise overview on numerical schemes for the subdiffusion model with nonsmooth problem data, which are important for the numerical analysis of many problems arising in optimal control, inverse problems and stochastic analysis. We focus on the following aspects of the subdiffusion model: regularity theory, Galerkin finite element discretization in space, time-stepping schemes (including convolution quadrature and L1 type schemes), and space-time variational formulations, and compare the results with those for standard parabolic problems. Further, these aspects are showcased with illustrative numerical experiments and complemented with perspectives and pointers to relevant literature.
NA · Feb 27, 2017
An Analysis of the Crank-Nicolson Method for Subdiffusion
Bangti Jin, Buyang Li, Zhi Zhou
In this work, we analyze a Crank-Nicolson type time stepping scheme for the subdiffusion equation, which involves a Caputo fractional derivative of order $α\in (0,1)$ in time. It hybridizes the backward Euler convolution quadrature with a $θ$-type method, with the parameter $θ$ dependent on the fractional order $α$ by $θ=α/2$, and naturally generalizes the classical Crank-Nicolson method. We develop essential initial corrections at the starting two steps for the Crank-Nicolson scheme, and together with the Galerkin finite element method in space, obtain a fully discrete scheme. The overall scheme is easy to implement, and robust with respect to data regularity. A complete error analysis of the fully discrete scheme is provided, and second-order accuracy in time is established for both smooth and nonsmooth problem data. Extensive numerical experiments are provided to illustrate its accuracy, efficiency and robustness, and a comparative study also indicates that it is competitive with existing schemes.
NA · Mar 29, 2017
Discrete maximal regularity of time-stepping schemes for fractional evolution equations
Bangti Jin, Buyang Li, Zhi Zhou
In this work, we establish the maximal $\ell^p$-regularity for several time stepping schemes for a fractional evolution model, which involves a fractional derivative of order $α\in(0,2)$, $α\neq 1$, in time. These schemes include convolution quadratures generated by backward Euler method and second-order backward difference formula, the L1 scheme, explicit Euler method and a fractional variant of the Crank-Nicolson method. The main tools for the analysis include operator-valued Fourier multiplier theorem due to Weis [48] and its discrete analogue due to Blunck [10]. These results generalize the corresponding results for parabolic problems.
NA · Dec 17, 2015
A Petrov-Galerkin Finite Element Method for Fractional Convection-Diffusion Equations
Bangti Jin, Raytcho Lazarov, Zhi Zhou
In this work, we develop variational formulations of Petrov-Galerkin type for one-dimensional fractional boundary value problems involving either a Riemann-Liouville or Caputo derivative of order $α\in(3/2, 2)$ in the leading term and both convection and potential terms. They arise in the mathematical modeling of asymmetric super-diffusion processes in heterogeneous media. The well-posedness of the formulations and sharp regularity pickup of the variational solutions are established. A novel finite element method is developed, which employs continuous piecewise linear finite elements and "shifted" fractional powers for the trial and test space, respectively. The new approach has a number of distinct features: It allows deriving optimal error estimates in both $L^2(D)$ and $H^1(D)$ norms; and on a uniform mesh, the stiffness matrix of the leading term is diagonal and the resulting linear system is well conditioned. Further, in the Riemann-Liouville case, an enriched FEM is proposed to improve the convergence. Extensive numerical results are presented to verify the theoretical analysis and robustness of the numerical scheme.
NI · Jan 16, 2023
HiFlash: Communication-Efficient Hierarchical Federated Learning with Adaptive Staleness Control and Heterogeneity-aware Client-Edge Association
Qiong Wu, Xu Chen, Tao Ouyang et al.
Federated learning (FL) is a promising paradigm that enables collaboratively learning a shared model across massive clients while keeping the training data local. However, for many existing FL systems, clients need to frequently exchange model parameters of large data size with the remote cloud server directly via wide-area networks (WAN), leading to significant communication overhead and long transmission time. To mitigate the communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation and asynchronous edge-cloud model aggregation to greatly reduce the traffic volumes of WAN transmissions. Specifically, we first analyze the convergence bound of HiFL theoretically and identify the key controllable factors for model performance improvement. We then advocate an enhanced design of HiFlash by innovatively integrating deep reinforcement learning based adaptive staleness control and heterogeneity-aware client-edge association strategy to boost the system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
DC · Oct 31, 2022
GNN at the Edge: Cost-Efficient Graph Neural Network Processing over Distributed Edge Servers
Liekang Zeng, Chongyu Yang, Peng Huang et al.
Edge intelligence has arisen as a promising computing paradigm for supporting miscellaneous smart applications that rely on machine learning techniques. While the community has extensively investigated multi-tier edge deployment for traditional deep learning models (e.g. CNNs, RNNs), the emerging Graph Neural Networks (GNNs) are still under exploration, in stark contrast to their broad adoption at the edge in applications such as traffic flow forecasting and location-based social recommendation. To bridge this gap, this paper formally studies the cost optimization for distributed GNN processing over a multi-tier heterogeneous edge network. We build a comprehensive modeling framework that can capture a variety of different cost factors, based on which we formulate a cost-efficient graph layout optimization problem that is proved to be NP-hard. Instead of trivially applying traditional data placement wisdom, we theoretically reveal the structural property of quadratic submodularity implicated in GNN's unique computing pattern, which motivates our design of an efficient iterative solution exploiting graph cuts. Rigorous analysis shows that it provides a parameterized constant approximation ratio, guaranteed convergence, and exact feasibility. To tackle potential graph topological evolution in GNN processing, we further devise an incremental update strategy and an adaptive scheduling algorithm for lightweight dynamic layout optimization. Evaluations with real-world datasets and various GNN benchmarks demonstrate that our approach achieves superior performance over de facto baselines with more than 95.8% cost reduction at a fast convergence speed.
NA · Dec 21, 2017
Pointwise-in-time error estimates for an optimal control problem with subdiffusion constraint
Bangti Jin, Buyang Li, Zhi Zhou
In this work, we present numerical analysis for a distributed optimal control problem, with box constraint on the control, governed by a subdiffusion equation which involves a fractional derivative of order $α\in(0,1)$ in time. The fully discrete scheme is obtained by applying the conforming linear Galerkin finite element method in space, L1 scheme/backward Euler convolution quadrature in time, and the control variable by a variational type discretization. With a space mesh size $h$ and time stepsize $τ$, we establish the following order of convergence for the numerical solutions of the optimal control problem: $O(τ^{\min({1}/{2}+α-ε,1)}+h^2)$ in the discrete $L^2(0,T;L^2(Ω))$ norm and $O(τ^{α-ε}+\ell_h^2h^2)$ in the discrete $L^\infty(0,T;L^2(Ω))$ norm, with any small $ε>0$ and $\ell_h=\ln(2+1/h)$. The analysis relies essentially on the maximal $L^p$-regularity and its discrete analogue for the subdiffusion problem. Numerical experiments are provided to support the theoretical results.
LG · Apr 22, 2023
Towards Carbon-Neutral Edge Computing: Greening Edge AI by Harnessing Spot and Future Carbon Markets
Huirong Ma, Zhi Zhou, Xiaoxi Zhang et al.
Provisioning dynamic machine learning (ML) inference as a service for artificial intelligence (AI) applications of edge devices faces many challenges, including the trade-off among accuracy loss, carbon emission, and unknown future costs. Besides, many governments are launching carbon emission rights (CER) for operators to reduce carbon emissions further to reverse climate change. Facing these challenges, to achieve carbon-aware ML task offloading under limited carbon emission rights, and thus green edge AI, we establish a joint ML task offloading and CER purchasing problem, intending to minimize the accuracy loss under the long-term time-averaged cost budget of purchasing the required CER. However, considering the uncertainty of the resource prices, the CER purchasing prices, the carbon intensity of sites, and ML tasks' arrivals, it is hard to decide the optimal policy online over a long-running period of time. To overcome this difficulty, we leverage the two-timescale Lyapunov optimization technique, of which the $T$-slot drift-plus-penalty methodology inspires us to propose an online algorithm that purchases CER in multiple timescales (reserved in the carbon future market and on demand in the carbon spot market) and makes decisions about where to offload ML tasks. Considering the NP-hardness of the $T$-slot problems, we further propose the resource-restricted randomized dependent rounding algorithm to obtain a near-optimal solution without any future information. Our theoretical analysis and extensive simulation results driven by the real carbon intensity trace show the superior performance of the proposed algorithms.
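As a rough illustration of the drift-plus-penalty idea mentioned in this abstract, the following sketch uses the textbook single-timescale form with a virtual queue that tracks budget violation. It is not the paper's two-timescale $T$-slot algorithm and does not model CER purchasing or rounding; the function name, the fixed two-action menu, the budget, and the weight V are all hypothetical choices.

```python
def drift_plus_penalty(actions, budget, V, n_slots):
    """Basic drift-plus-penalty loop (illustrative, single timescale).

    actions: list of (loss, cost) pairs available in every slot (assumed fixed here).
    budget:  target time-averaged cost.
    V:       weight trading off loss minimization against budget violation.
    Returns (average loss, average cost, final virtual queue length).
    """
    Q = 0.0  # virtual queue: accumulated cost in excess of the budget
    total_loss = 0.0
    total_cost = 0.0
    for _ in range(n_slots):
        # Greedy per-slot decision: minimize V * penalty + Q * cost.
        loss, cost = min(actions, key=lambda a: V * a[0] + Q * a[1])
        total_loss += loss
        total_cost += cost
        # Queue update: grows when the slot overspends, drains otherwise.
        Q = max(Q + cost - budget, 0.0)
    return total_loss / n_slots, total_cost / n_slots, Q

if __name__ == "__main__":
    # Hypothetical menu: a free but lossy action and a costly but lossless one.
    menu = [(1.0, 0.0), (0.0, 1.0)]
    print(drift_plus_penalty(menu, budget=0.3, V=10.0, n_slots=10_000))
```

Because the queue update implies that the total overspend never exceeds the final queue length, the time-averaged cost exceeds the budget by at most Q / n_slots, so keeping the queue bounded enforces the long-term budget. A larger V favors lower loss at the price of a longer queue.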
NA · Feb 1, 2016
An Analysis of Galerkin Proper Orthogonal Decomposition for Subdiffusion
Bangti Jin, Zhi Zhou
In this work, we develop a novel Galerkin-L1-POD scheme for the subdiffusion model with a Caputo fractional derivative of order $α\in (0,1)$ in time, which is often used to describe anomalous diffusion processes in heterogeneous media. The nonlocality of the fractional derivative requires storing all the solutions from time zero. The proposed scheme is based on continuous piecewise linear finite elements, L1 time stepping, and proper orthogonal decomposition (POD). By constructing an effective reduced-order scheme using problem-adapted basis functions, it can significantly reduce the computational complexity and storage requirement. We shall provide a complete error analysis of the scheme under realistic regularity assumptions by means of a novel energy argument. Extensive numerical experiments are presented to verify the convergence analysis and the efficiency of the proposed scheme.
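The storage issue caused by the nonlocal fractional derivative can be seen in a minimal sketch of L1 time stepping. The sketch below applies the standard L1 approximation of the Caputo derivative to the scalar model problem $\partial_t^α u + λu = 0$, $u(0)=u_0$, rather than to a finite element or POD system; the function name and parameter values are assumptions for illustration only.

```python
import math
import numpy as np

def l1_scheme(alpha, lam, T, n_steps, u0=1.0):
    """L1 time stepping for the scalar problem D_t^alpha u + lam * u = 0, u(0) = u0."""
    tau = T / n_steps
    c = tau ** (-alpha) / math.gamma(2.0 - alpha)
    j = np.arange(n_steps)
    # L1 weights b_j = (j+1)^(1-alpha) - j^(1-alpha); note b_0 = 1.
    b = (j + 1.0) ** (1.0 - alpha) - j ** (1.0 - alpha)
    u = np.empty(n_steps + 1)
    u[0] = u0
    d = np.zeros(n_steps)  # d[k] = u[k+1] - u[k]: the full history must be kept
    for n in range(1, n_steps + 1):
        # History term sum_{j=1}^{n-1} b_j * (u[n-j] - u[n-j-1]).
        hist = np.dot(b[1:n], d[:n - 1][::-1])
        # Implicit step: c * (u[n] - u[n-1] + hist) + lam * u[n] = 0.
        u[n] = c * (u[n - 1] - hist) / (c + lam)
        d[n - 1] = u[n] - u[n - 1]
    return u

if __name__ == "__main__":
    u = l1_scheme(alpha=0.5, lam=1.0, T=1.0, n_steps=400)
    # For alpha = 1/2 the exact solution is exp(lam^2 t) * erfc(lam * sqrt(t)).
    exact = math.exp(1.0) * math.erfc(1.0)
    print(abs(u[-1] - exact))
```

Each step needs a weighted sum over all previous increments, so both the work and the memory grow with the number of steps. This is the cost that a reduced-order basis aims to shrink when u is a large finite element vector instead of a scalar.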
NA · Apr 18, 2023
Electrical Impedance Tomography with Deep Calderón Method
Siyu Cen, Bangti Jin, Kwancheol Shin et al.
Electrical impedance tomography (EIT) is a noninvasive medical imaging modality utilizing the current-density/voltage data measured on the surface of the subject. Calderón's method is a relatively recent EIT imaging algorithm that is non-iterative, fast, and capable of reconstructing complex-valued electric impedances. However, due to the regularization via low-pass filtering and linearization, the reconstructed images suffer from severe blurring and under-estimation of the exact conductivity values. In this work, we develop an enhanced version of Calderón's method, using deep convolution neural networks (i.e., U-net) as an effective targeted post-processing step, and term the resulting method the deep Calderón method. Specifically, we learn a U-net to postprocess the EIT images generated by Calderón's method so as to have better resolutions and more accurate estimates of conductivity values. We simulate chest configurations with which we generate the current-density/voltage boundary measurements and the corresponding reconstructed images by Calderón's method. With the paired training data, we learn the deep neural network and evaluate its performance on real tank measurement data. The experimental results indicate that the proposed approach indeed provides a fast and direct (complex-valued) impedance tomography imaging technique, and substantially improves the capability of the standard Calderón's method.
NA · Mar 12, 2013
Galerkin FEM for fractional order parabolic equations with initial data in $H^{-s},~0 < s \le 1$
Bangti Jin, Raytcho Lazarov, Joseph Pasciak et al.
We investigate semi-discrete numerical schemes based on the standard Galerkin and lumped mass Galerkin finite element methods for an initial-boundary value problem for homogeneous fractional diffusion problems with non-smooth initial data. We assume that $Ω\subset \mathbb{R}^d$, $d=1,2,3$ is a convex polygonal (polyhedral) domain. We theoretically justify optimal order error estimates in $L_2$- and $H^1$-norms for initial data in $H^{-s}(Ω),~0\le s \le 1$. We confirm our theoretical findings with a number of numerical tests that include initial data $v$ being a Dirac $δ$-function supported on a $(d-1)$-dimensional manifold.
DC · Jul 4, 2023
Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services
Liekang Zeng, Xu Chen, Peng Huang et al.
Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures. To render GNN-based service for IoT-driven smart applications, traditional model serving paradigms usually resort to the cloud by fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential in applying the emerging fog computing. To maximize the architectural benefits brought by fog computing, in this paper, we present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to well accommodate the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.
LG · Oct 20, 2023
DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset
Weijie Liu, Xiaoxi Zhang, Jingpu Duan et al.
Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data. While prior works have focused on analyzing FL convergence with respect to hyperparameters like batch size and aggregation frequency, the joint effects of adjusting these parameters on model performance, training time, and resource consumption have been overlooked, especially when facing dynamic data streams and network characteristics. This paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training. We establish a new convergence bound for training error considering heterogeneous datasets across devices and derive closed-form solutions for co-optimized batch size and aggregation frequency that are consistent across all devices. Additionally, we design an efficient algorithm for assigning different batch configurations across devices, improving model accuracy and addressing the heterogeneity of both data and system characteristics. Further, we propose an adaptive control algorithm that dynamically estimates network states, efficiently samples appropriate data batches, and effectively adjusts batch sizes and aggregation frequency on the fly. Extensive experiments demonstrate the superiority of our offline optimal solutions and online adaptive algorithm.
OC · Aug 23, 2023
Solving Elliptic Optimal Control Problems via Neural Networks and Optimality System
Yongcheng Dai, Bangti Jin, Ramesh Sau et al.
In this work, we investigate a neural network based solver for optimal control problems (without / with box constraint) for linear and semilinear second-order elliptic problems. It utilizes a coupled system derived from the first-order optimality system of the optimal control problem, and employs deep neural networks to represent the solutions to the reduced system. We present an error analysis of the scheme, and provide $L^2(Ω)$ error bounds on the state, control and adjoint in terms of neural network parameters (e.g., depth, width, and parameter bounds) and the numbers of sampling points. The main tools in the analysis include offset Rademacher complexity and boundedness and Lipschitz continuity of neural network functions. We present several numerical examples to illustrate the method and compare it with two existing ones.
NA · Sep 7, 2022
Solving Elliptic Problems with Singular Sources using Singularity Splitting Deep Ritz Method
Tianhao Hu, Bangti Jin, Zhi Zhou
In this work, we develop an efficient solver based on neural networks for second-order elliptic equations with variable coefficients and singular sources. This class of problems covers general point sources, line sources and the combination of point-line sources, and has a broad range of practical applications. The proposed approach is based on decomposing the true solution into a singular part that is known analytically using the fundamental solution of the Laplace equation and a regular part that satisfies a suitable modified elliptic PDE with a smoother source, and then solving for the regular part using the deep Ritz method. A path-following strategy is suggested to select the penalty parameter for enforcing the Dirichlet boundary condition. Extensive numerical experiments in two- and multi-dimensional spaces with point sources, line sources or their combinations are presented to illustrate the efficiency of the proposed approach, and a comparative study with several existing approaches based on neural networks is also given, which shows clearly its competitiveness for the specific class of problems. In addition, we briefly discuss the error analysis of the approach.
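The singularity splitting described in this abstract can be written out for the simplest case. The block below is an illustrative assumption, not the paper's general setting: it takes the Poisson equation with constant coefficient and a single unit point source at $x_0$ in a bounded two-dimensional domain $Ω$ with Dirichlet data $g$, whereas the paper treats variable coefficients and line sources, for which the regular part satisfies a modified equation with a smoother, generally nonzero, source.

```latex
% Model problem (illustrative assumption): unit point source at x_0 in \Omega \subset \mathbb{R}^2
-\Delta u = \delta_{x_0} \quad \text{in } \Omega, \qquad u = g \quad \text{on } \partial\Omega.

% Singular part: fundamental solution of the Laplace equation in two dimensions
\Phi(x) = -\frac{1}{2\pi} \ln |x - x_0|, \qquad -\Delta \Phi = \delta_{x_0}.

% Splitting u = \Phi + w: the regular part w solves a problem with smooth data
-\Delta w = 0 \quad \text{in } \Omega, \qquad w = g - \Phi \quad \text{on } \partial\Omega.
```

Only the regular part $w$ then needs to be approximated numerically, for instance by minimizing a Ritz energy with a boundary penalty, since the singular behavior is carried entirely by the known function $\Phi$.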
NA · Mar 29, 2023
Conductivity Imaging from Internal Measurements with Mixed Least-Squares Deep Neural Networks
Bangti Jin, Xiyao Li, Qimeng Quan et al.
In this work we develop a novel approach using deep neural networks to reconstruct the conductivity distribution in elliptic problems from one measurement of the solution over the whole domain. The approach is based on a mixed reformulation of the governing equation and utilizes the standard least-squares objective, with deep neural networks as ansatz functions to approximate the conductivity and flux simultaneously. We provide a thorough analysis of the deep neural network approximations of the conductivity for both continuous and empirical losses, including rigorous error estimates that are explicit in terms of the noise level, various penalty parameters and neural network architectural parameters (depth, width and parameter bound). We also provide multiple numerical experiments in two- and multi-dimensions to illustrate distinct features of the approach, e.g., excellent stability with respect to data noise and capability of solving high-dimensional problems.
NA · May 27
Dual Variational Neural Network for the $p$-Laplace Problem
Tianhao Hu, Guanglian Li, Fengru Wang et al.
The reliable and accurate numerical approximation of the $p$-Laplacian is particularly challenging in the extreme regimes $p \to 1^{+}$ and $p \gg 1$, where the operator becomes either highly singular or strongly degenerate, often causing severe instability in standard numerical methods. To address these difficulties, we propose a novel deep learning based framework, termed the dual variational neural network, for $p$-Laplace problems. The approach is based on a mixed formulation and an $L^q$-based Helmholtz decomposition, which decouples the original problem into two convex subproblems: a linear Poisson problem for the irrotational component and an unconstrained minimization problem over divergence-free fields for the solenoidal component. Following the decomposition, we employ two neural networks using a gradient--curl representation to approximate the flux, and further establish an error analysis of the neural approximation. The analysis relies on fundamental vector inequalities together with tools from statistical learning theory. Numerical experiments demonstrate robust convergence of the proposed method in challenging settings, including the extreme cases $p \to 1^{+}$ and $p \gg 1$, as well as the $p(x)$-Laplace equation.
CLMar 25Code
Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic ReasoningKun-Yang Yu, Zhi Zhou, Shi-Yu Tian et al.
Multimodal Large Language Models (MLLMs) have demonstrated remarkable reasoning capabilities across modalities such as images and text. However, tabular data, despite being a critical real-world modality, remains relatively underexplored in multimodal learning. In this paper, we focus on the task of Tabular-Vision Multi-Modal Understanding (TVMU) and identify three core challenges: (1) high structural variability and data incompleteness in tables, (2) implicit and complex feature dependencies, and (3) significant heterogeneity in problem-solving pipelines across downstream tasks. To address these issues, we propose Thinking with Tables (TWT). TWT employs a program-aided code-based neuro-symbolic reasoning mechanism that facilitates key operations, such as information extraction and element modeling, by interacting with external environments. We evaluate TWT on eight representative datasets. Experimental results demonstrate that TWT consistently outperforms existing baselines by an average of 10\% in accuracy, achieving performance comparable to, or even surpassing, proprietary commercial SOTA LLMs on TVMU tasks. Models and code are available at https://github.com/kunyang-YU/Thinking-with-Tables
NAAug 17, 2024
Point Source Identification Using Singularity Enriched Neural NetworksTianhao Hu, Bangti Jin, Zhi Zhou
The inverse problem of recovering point sources represents an important class of applied inverse problems. However, there is still a lack of neural network-based methods for point source identification, mainly due to the inherent solution singularity. In this work, we develop a novel algorithm to identify point sources, utilizing a neural network combined with a singularity enrichment technique. We employ the fundamental solution and neural networks to represent the singular and regular parts, respectively, and then minimize an empirical loss involving the intensities and locations of the unknown point sources, as well as the parameters of the neural network. Moreover, by combining the conditional stability argument of the inverse problem with the generalization error of the empirical loss, we conduct a rigorous error analysis of the algorithm. We demonstrate the effectiveness of the method with several challenging experiments.
CVMay 3Code
VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal LearningZi-Yi Jia, Zi-Jian Cheng, Xin-Yue Zhang et al.
Multi-modal learning has attracted great attention in visual-text tasks. However, visual-tabular data, which plays a pivotal role in high-stakes domains like healthcare and industry, remains underexplored. In this paper, we introduce \textit{VT-Bench}, the first unified benchmark for standardizing vision-tabular discriminative prediction and generative reasoning tasks. VT-Bench aggregates 14 datasets across 9 domains (medical-centric, while covering pets, media, and transportation) with over 756K samples. We evaluate 23 representative models, including unimodal experts, specialized visual-tabular models, general-purpose vision-language models (VLMs), and tool-augmented methods, highlighting substantial challenges of visual-tabular learning. We believe VT-Bench will stimulate the community to build more powerful multi-modal vision-tabular foundation models. Benchmark: https://github.com/Ziyi-Jia990/VT-Bench
AIAug 21, 2024
Enabling Small Models for Zero-Shot Selection and Reuse through Model Label LearningJia Zhang, Zhi Zhou, Lan-Zhe Guo et al.
Vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot ability in image classification tasks by aligning text and images but suffer from inferior performance compared with task-specific expert models. In contrast, expert models excel in their specialized domains but lack zero-shot ability for new tasks. How to obtain both the high performance of expert models and zero-shot ability is an important research direction. In this paper, we attempt to demonstrate that by constructing a model hub and aligning models with their functionalities using model labels, new tasks can be solved in a zero-shot manner by effectively selecting and reusing models in the hub. We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities through a Semantic Directed Acyclic Graph (SDAG) and leverages an algorithm, Classification Head Combination Optimization (CHCO), to select capable models for new tasks. Compared with the foundation model paradigm, it is less costly and more scalable, i.e., the zero-shot ability grows with the size of the model hub. Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL, demonstrating that expert models can be effectively reused for zero-shot tasks. Our code will be released publicly.
NAMay 7
Numerical Analysis of Space-Time Dependent Source Identification in Subdiffusion EquationsSiyu Cen, Bangti Jin, Yavar Kian et al.
In this work, we propose an easy-to-implement fixed-point algorithm for reconstructing a space-time dependent source in a subdiffusion model from lateral boundary measurements. The numerical scheme combines a Galerkin finite element method for spatial discretization with a finite difference method for temporal discretization. We establish the linear convergence of the fixed-point iteration and derive an error bound that depends explicitly on the discretization parameters and the noise level. The error analysis relies on stability properties of the continuous inverse problem and technical estimates for the associated direct problem with limited-regularity data. Numerical experiments are presented to support and complement the theoretical analysis.
GRApr 14
Neural Dynamic GI: Random-Access Neural Compression for Temporal Lightmaps in Dynamic Lighting EnvironmentsJianhui Wu, Jian Zhou, Zhi Zhou et al.
High-quality global illumination (GI) in real-time rendering is commonly achieved using precomputed lighting techniques, with lightmap as the standard choice. To support GI for static objects in dynamic lighting environments, multiple lightmaps at different lighting conditions need to be precomputed, which incurs substantial storage and memory overhead. To overcome this limitation, we propose Neural Dynamic GI (NDGI), a novel compression technique specifically designed for temporal lightmap sets. Our method utilizes multi-dimensional feature maps and lightweight neural networks to integrate the temporal information instead of storing multiple sets explicitly, which significantly reduces the storage size of lightmaps. Additionally, we introduce a block compression (BC) simulation strategy during the training process, which enables BC compression on the final generated feature maps and further improves the compression ratio. To enable efficient real-time decompression, we also integrate a virtual texturing (VT) system with our neural representation. Compared with prior methods, our approach achieves high-quality dynamic GI while maintaining remarkably low storage and memory requirements, with only modest real-time decompression overhead. To facilitate further research in this direction, we will release our temporal lightmap dataset precomputed in multiple scenes featuring diverse temporal variations.
CVApr 8
LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language ModelsShi-Yu Tian, Zhi Zhou, Kun-Yang Yu et al.
Spatial reasoning is a cornerstone capability for intelligent systems to perceive and interact with the physical world. However, multimodal large language models (MLLMs) frequently suffer from hallucinations and imprecision when parsing complex geometric layouts. As data-driven scaling struggles to internalize structured geometric priors and spatial constraints, integrating mature, specialized vision models presents a compelling alternative. Despite its promise, applying this paradigm to spatial reasoning is hindered by two key challenges: The difficulty of invoking heterogeneous, parameter-rich tools, as well as the challenge of understanding and effectively leveraging their diverse low-level outputs (e.g., segmentation masks, depth maps) in high-level reasoning. To address these challenges, we propose LAST, a unified framework for tool-augmented spatial reasoning. LAST features an extensible interactive sandbox, termed LAST-Box, which abstracts heterogeneous tool invocations into atomic instructions and reusable spatial skills, returning multimodal hints (e.g., annotated images and textual descriptions) that can be directly consumed by LLMs. We further design a three-stage progressive training strategy that guides models from understanding tool outputs to proficient and adaptive tool invocation. Experiments on four datasets show that LAST-7B achieves around 20\% performance gains over its backbone and outperforms strong proprietary closed-source LLMs, substantially enhancing reasoning on complex spatial tasks.
CLFeb 10, 2025Code
LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLMZhi Zhou, Kun-Yang Yu, Shi-Yu Tian et al.
Large language models (LLMs), both proprietary and open-source, have demonstrated remarkable capabilities across various natural language processing tasks. However, they face significant limitations in legal reasoning tasks. Proprietary models introduce data privacy risks and high inference costs, while open-source models underperform due to insufficient legal domain training data. To address these limitations, we study data generation for legal reasoning to improve the legal reasoning performance of open-source LLMs with the help of proprietary LLMs. This is challenging due to the lack of legal knowledge in proprietary LLMs and the difficulty in verifying the generated data. We propose KgDG, a knowledge-guided data generation framework for legal reasoning. Our framework enables leveraging legal knowledge to enhance generation diversity and introduces a refinement and verification process to ensure the quality of generated data. Moreover, we expand the generated dataset to further enhance the LLM reasoning capabilities. Using KgDG, we create a synthetic legal reasoning dataset containing 50K high-quality examples. Our trained model LawGPT outperforms existing legal-specific LLMs and achieves performance comparable to proprietary LLMs, demonstrating the effectiveness of KgDG and LawGPT. Our code and resources are publicly available at https://github.com/LAMDASZ-ML/Knowledge-Guide-Data-Generation.
AIMar 17
NeSy-Route: A Neuro-Symbolic Benchmark for Constrained Route Planning in Remote SensingMing Yang, Zhi Zhou, Shi-Yu Tian et al.
Remote sensing underpins crucial applications such as disaster relief and ecological field surveys, where systems must understand complex scenes and constraints and make reliable decisions. Current remote-sensing benchmarks mainly focus on evaluating perception and reasoning capabilities of multimodal large language models (MLLMs). They fail to assess planning capability, stemming either from the difficulty of curating and validating planning tasks at scale or from evaluation protocols that are inaccurate and inadequate. To address these limitations, we introduce NeSy-Route, a large-scale neuro-symbolic benchmark for constrained route planning in remote sensing. Within this benchmark, we introduce an automated data-generation framework that integrates high-fidelity semantic masks with heuristic search to produce diverse route-planning tasks with provably optimal solutions. This allows NeSy-Route to comprehensively evaluate planning across 10,821 route-planning samples, nearly 10 times larger than the largest prior benchmark. Furthermore, a three-level hierarchical neuro-symbolic evaluation protocol is developed to enable accurate assessment and support fine-grained analysis on perception, reasoning, and planning simultaneously. Our comprehensive evaluation of various state-of-the-art MLLMs demonstrates that existing MLLMs show significant deficiencies in perception and planning capabilities. We hope NeSy-Route can support further research and development of more powerful MLLMs for remote sensing.
AIAug 19, 2025Code
Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language ModelsXiao-Wen Yang, Jie-Jing Shao, Lan-Zhe Guo et al.
Large Language Models (LLMs) have shown promising results across various tasks, yet their reasoning capabilities remain a fundamental challenge. Developing AI systems with strong reasoning capabilities is regarded as a crucial milestone in the pursuit of Artificial General Intelligence (AGI) and has garnered considerable attention from both academia and industry. Various techniques have been explored to enhance the reasoning capabilities of LLMs, with neuro-symbolic approaches being a particularly promising way. This paper comprehensively reviews recent developments in neuro-symbolic approaches for enhancing LLM reasoning. We first present a formalization of reasoning tasks and give a brief introduction to the neurosymbolic learning paradigm. Then, we discuss neuro-symbolic methods for improving the reasoning capabilities of LLMs from three perspectives: Symbolic->LLM, LLM->Symbolic, and LLM+Symbolic. Finally, we discuss several key challenges and promising future directions. We have also released a GitHub repository including papers and resources related to this survey: https://github.com/LAMDASZ-ML/Awesome-LLM-Reasoning-with-NeSy.
LGJan 31, 2025Code
TabFSBench: Tabular Benchmark for Feature Shifts in Open EnvironmentsZi-Jian Cheng, Zi-Yi Jia, Zhi Zhou et al.
Tabular data is widely utilized in various machine learning tasks. Current tabular learning research predominantly focuses on closed environments, while in real-world applications, open environments are often encountered, where distribution and feature shifts occur, leading to significant degradation in model performance. Previous research has primarily concentrated on mitigating distribution shifts, whereas feature shifts, a distinctive and unexplored challenge of tabular data, have garnered limited attention. To this end, this paper conducts the first comprehensive study on feature shifts in tabular data and introduces the first tabular feature-shift benchmark (TabFSBench). TabFSBench evaluates the impacts of four distinct feature-shift scenarios on four tabular model categories across various datasets and assesses the performance of large language models (LLMs) and tabular LLMs in the tabular benchmark for the first time. Our study demonstrates three main observations: (1) most tabular models have limited applicability in feature-shift scenarios; (2) the shifted feature set importance has a linear relationship with model performance degradation; (3) model performance in closed environments correlates with feature-shift performance. Future research directions are also explored for each observation. Benchmark: https://github.com/LAMDASZ-ML/TabFSBench.
LGJan 30, 2025Code
Vision-Language Model Selection and Reuse for Downstream AdaptationHao-Zhe Tan, Zhi Zhou, Yu-Feng Li et al.
Pre-trained Vision-Language Models (VLMs) are becoming increasingly popular across various visual tasks, and several open-sourced VLM variants have been released. However, selecting the best-performing pre-trained VLM for a specific downstream task is challenging since no single VLM can achieve promising performance on all downstream tasks, and evaluating all available VLMs is impossible due to time and data limitations. To address this problem, this paper proposes a novel paradigm to select and reuse VLMs for downstream tasks, called Model Label Learning (MLL). The proposal contains three key modules: \emph{model labeling}, which assigns labels to each VLM to describe its specialty and utility; \emph{model selection}, which matches the requirements of the target task with model labels; and \emph{model reuse}, which applies selected VLMs to the target task in an ensemble manner. The proposal is highly computationally efficient and growable since the model labeling process is completely independent of the target task and the ability can grow with the number of candidate VLMs. We also introduce a new benchmark for evaluating VLM selection methods, including 49 VLMs and 17 target task datasets. Experimental results clearly demonstrate the effectiveness of the proposed method for selecting and reusing VLMs.
NAMay 12
Optimized Two-Step Coarse Propagators in Parareal AlgorithmsGuanglian Li, Qingle Lin, Kai Zhang et al.
In this work, we propose a novel framework for accelerating the parareal algorithm, in which the coarse propagator is formulated as a two-step method and optimized with respect to the convergence factor. We derive a rigorous error estimate for the proposed two-step parareal algorithm, yielding an explicit bound on the linear convergence factor. This estimate is not only of theoretical interest: it provides a quantitative guideline for selecting and designing coarse propagators. Guided by this estimate, we consider the linear parabolic equation as an illustrative example and construct an optimized two-step coarse propagator (O2CP) that delivers very fast convergence in practice. The resulting method attains an optimized convergence factor of approximately $0.0064$, substantially smaller than that of commonly used practical coarse propagators in the classical parareal setting, while keeping the computational cost moderate. Numerical experiments on linear and nonlinear parabolic equations fully support the theoretical analysis and demonstrate rapid convergence of the two-step parareal algorithm equipped with the O2CP.
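For readers unfamiliar with the classical parareal setting that the abstract above compares against, the following is a minimal sketch of the standard one-step parareal iteration on the scalar test problem $u' = -\lambda u$. It is not the two-step O2CP scheme from the paper: the backward-Euler coarse and fine propagators, the test equation, and all function names are assumptions chosen for illustration.

```python
def fine(u, t0, t1, lam, substeps=100):
    """Fine propagator: many small backward-Euler steps for u' = -lam * u."""
    dt = (t1 - t0) / substeps
    for _ in range(substeps):
        u = u / (1.0 + lam * dt)
    return u


def coarse(u, t0, t1, lam):
    """Coarse propagator: a single backward-Euler step over the whole slice."""
    return u / (1.0 + lam * (t1 - t0))


def parareal(u0, T, n_slices, lam, n_iters):
    """Classical parareal: U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n])."""
    ts = [T * n / n_slices for n in range(n_slices + 1)]
    # Initial guess: one sequential sweep with the cheap coarse propagator.
    U = [u0]
    for n in range(n_slices):
        U.append(coarse(U[n], ts[n], ts[n + 1], lam))
    for _ in range(n_iters):
        # These evaluations are independent across slices; in a real
        # implementation this is the part that runs in parallel.
        F_old = [fine(U[n], ts[n], ts[n + 1], lam) for n in range(n_slices)]
        G_old = [coarse(U[n], ts[n], ts[n + 1], lam) for n in range(n_slices)]
        # Sequential correction sweep using only the cheap coarse propagator.
        U_new = [u0]
        for n in range(n_slices):
            G_new = coarse(U_new[n], ts[n], ts[n + 1], lam)
            U_new.append(G_new + F_old[n] - G_old[n])
        U = U_new
    return U


def serial_fine(u0, T, n_slices, lam):
    """Reference solution: the fine propagator applied slice after slice."""
    ts = [T * n / n_slices for n in range(n_slices + 1)]
    U = [u0]
    for n in range(n_slices):
        U.append(fine(U[n], ts[n], ts[n + 1], lam))
    return U


if __name__ == "__main__":
    ref = serial_fine(1.0, 1.0, 10, 1.0)[-1]
    for k in range(4):
        err = abs(parareal(1.0, 1.0, 10, 1.0, k)[-1] - ref)
        print(f"iteration {k}: error at final time = {err:.3e}")
```

The error at the final time shrinks from one iteration to the next, and the rate of that shrinkage is the convergence factor that the choice of coarse propagator controls.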
SYAug 20, 2025
Smart Charging Impact Analysis using Clustering Methods and Real-world Distribution FeedersRavi Raj Shrestha, Zhi Zhou, Limon Barua et al.
The anticipated widespread adoption of electric vehicles (EVs) necessitates a critical evaluation of existing power distribution infrastructures, as EV integration imposes additional stress on distribution networks that can lead to component overloading and power quality degradation. Implementing smart charging mechanisms can mitigate these adverse effects and defer or even avoid upgrades. This study assesses the performance of two smart charging strategies, Time of Use (TOU) pricing and Load Balancing (LB), on seven representative real-world feeders identified using k-means clustering. A time series-based steady-state load flow analysis was conducted on these feeders to simulate the impact of EV charging under both strategies across four different EV enrollment scenarios and three representative days to capture seasonal load characteristics. A grid upgrade strategy is proposed to strengthen the power grid to support EV integration with minimal cost. Results demonstrate that both TOU and LB strategies effectively manage the additional EV load with reduced upgrade requirements and cost to existing infrastructure compared to the case without smart charging strategies, and that LB outperforms TOU when customer enrollment levels are high. These findings support the viability of smart charging in facilitating EV integration while maintaining distribution network reliability and reducing investment cost.
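The step of picking representative feeders with k-means can be sketched as follows. This is only an illustration of the general idea: the abstract does not say which feeder features were clustered or how a representative was chosen within each cluster, so the two-dimensional feature vectors, the first-k initialization, and the "closest to centroid" selection rule are all assumptions.

```python
import math


def kmeans(points, k, n_iters=50):
    """Plain Lloyd's algorithm. Initialization uses the first k points,
    an assumption made here only to keep the sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, labels


def representatives(points, centroids, labels):
    """For each non-empty cluster, return the index of the point closest
    to the centroid (a hypothetical selection rule)."""
    reps = []
    for c, centroid in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            reps.append(min(idx, key=lambda i: math.dist(points[i], centroid)))
    return reps


if __name__ == "__main__":
    # Hypothetical feeder features, e.g. (peak load, total line length),
    # already scaled to comparable ranges.
    feeders = [(1.0, 2.0), (8.0, 9.0), (1.2, 1.8),
               (0.8, 2.2), (8.5, 9.5), (7.5, 8.5)]
    centroids, labels = kmeans(feeders, k=2)
    print("cluster labels:", labels)
    print("representative feeder indices:", representatives(feeders, centroids, labels))
```

Running the detailed load flow analysis only on the returned representatives, rather than on every feeder, is what makes a clustering step like this attractive.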
LGFeb 9
Kirin: Improving ANN efficiency with SNN HybridizationChenyu Wang, Zhanglu Yan, Zhi Zhou et al.
Artificial neural networks (ANNs), particularly large language models (LLMs), demonstrate powerful inference capabilities but consume substantial energy. Conversely, spiking neural networks (SNNs) exhibit exceptional energy efficiency due to their binary and event-driven characteristics, thus motivating the study of ANN-to-SNN conversion. In this process, quantization plays a pivotal role, mapping LLMs' floating-point parameters to discrete SNN parameters via the temporal dimension of the time window. However, several challenges remain in the conversion process: (i) converting high bit-width quantization values into binary spikes requires longer time windows, increasing system latency; and (ii) there is an inherent trade-off between the information loss of single-spike schemes and the energy costs of multi-spike ones in SNNs. To address these challenges, we propose Kirin, an integer and spike hybrid SNN that achieves accuracy-lossless ANN-to-SNN conversion with time and energy efficiency. Specifically, we first propose a Spike Matrix Hybridization strategy that encodes low bit-width parameters, which lead to small time window sizes, into binary spikes while preserving the rest in integer format, thereby reducing the overall latency of SNN execution. Second, we introduce a silence threshold mechanism to regulate the timing of single-spike firing, ensuring the output is mathematically equivalent to the LLM's output and preserves accuracy. Experimental results demonstrate that Kirin, under a W4A4\&8 quantization setting, achieves near-FP16 accuracy while reducing energy consumption by up to 84.66\% and shortening time steps by 93.75\%.
CLJun 7, 2024Code
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language ModelZhi Zhou, Jiang-Xin Shi, Peng-Xiao Song et al.
Large language models (LLMs), including both proprietary and open-source models, have showcased remarkable capabilities in addressing a wide range of downstream tasks. Nonetheless, when it comes to practical Chinese legal tasks, these models fail to meet the actual requirements. Proprietary models do not ensure data privacy for sensitive legal cases, while open-source models demonstrate unsatisfactory performance due to their lack of legal knowledge. To address this problem, we introduce LawGPT, the first open-source model specifically designed for Chinese legal applications. LawGPT comprises two key components: legal-oriented pre-training and legal supervised fine-tuning. Specifically, we employ large-scale Chinese legal documents for legal-oriented pre-training to incorporate legal domain knowledge. To further improve the model's performance on downstream legal tasks, we create a knowledge-driven instruction dataset for legal supervised fine-tuning. Our experimental results demonstrate that LawGPT outperforms the open-source LLaMA 7B model. Our code and resources are publicly available at https://github.com/pengxiao-song/LaWGPT and have received 5.7K stars on GitHub.
DCDec 16, 2023Code
Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUsAodong Chen, Fei Xu, Li Han et al.
GPUs have become the \emph{de facto} hardware devices for accelerating Deep Neural Network (DNN) inference workloads. However, the conventional \emph{sequential execution mode of DNN operators} in mainstream deep learning frameworks cannot fully utilize GPU resources, even with operator fusion enabled, due to the increasing complexity of model structures and a greater diversity of operators. Moreover, the \emph{inadequate operator launch order} in parallelized execution scenarios can lead to GPU resource wastage and unexpected performance interference among operators. In this paper, we propose \emph{Opara}, a resource- and interference-aware DNN \underline{Op}erator \underline{para}llel scheduling framework to accelerate DNN inference on GPUs. Specifically, \emph{Opara} first employs \texttt{CUDA Streams} and \texttt{CUDA Graph} to \emph{parallelize} the execution of multiple operators automatically. To further expedite DNN inference, \emph{Opara} leverages the resource demands of operators to judiciously adjust the operator launch order on GPUs, overlapping the execution of compute-intensive and memory-intensive operators. We implement and open source a prototype of \emph{Opara} based on PyTorch in a \emph{non-intrusive} manner. Extensive prototype experiments with representative DNN and Transformer-based models demonstrate that \emph{Opara} outperforms the default sequential \texttt{CUDA Graph} in PyTorch and the state-of-the-art operator parallelism systems by up to $1.68\times$ and $1.29\times$, respectively, yet with acceptable runtime overhead.
AIDec 6, 2024
Neuro-Symbolic Data Generation for Math ReasoningZenan Li, Zhi Zhou, Yuan Yao et al. · microsoft-research
A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring both diversity and validity of the newly generated problems. This is achieved by a neuro-symbolic data generation framework combining the intuitive informalization strengths of LLMs, and the precise symbolic reasoning of math solvers along with projected Markov chain Monte Carlo sampling in the highly-irregular symbolic space. Empirical experiments demonstrate the high quality of data generated by the proposed method, and that the LLMs, specifically LLaMA-2 and Mistral, when realigned with the generated data, surpass their state-of-the-art counterparts.
CLFeb 6, 2025
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language ModelsXiao-Wen Yang, Xuan-Yi Zhu, Wen-Da Wei et al.
The integration of slow-thinking mechanisms into large language models (LLMs) offers a promising way toward achieving Level 2 AGI Reasoners, as exemplified by systems like OpenAI's o1. However, several significant challenges remain, including inefficient overthinking and an overreliance on auxiliary reward models. We point out that these limitations stem from LLMs' inability to internalize the search process, a key component of effective reasoning. A critical step toward addressing this issue is enabling LLMs to autonomously determine when and where to backtrack, a fundamental operation in traditional search algorithms. To this end, we propose a self-backtracking mechanism that equips LLMs with the ability to backtrack during both training and inference. This mechanism not only enhances reasoning ability but also efficiency by transforming slow-thinking processes into fast-thinking through self-improvement. Empirical evaluations demonstrate that our proposal significantly enhances the reasoning capabilities of LLMs, achieving a performance gain of over 40 percent compared to the optimal-path supervised fine-tuning method. We believe this study introduces a novel and promising pathway for developing more advanced and robust Reasoners.
DCMar 26, 2025
Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention DisaggregationYunkai Liang, Zhangyu Chen, Pengfei Zuo et al.
In large language model (LLM) serving systems, executing each request consists of two phases: the compute-intensive prefill phase and the memory-intensive decoding phase. To prevent performance interference between the two phases, current LLM serving systems typically adopt prefill-decoding disaggregation, where the two phases are split across separate machines. However, we observe that this approach leads to significant resource underutilization. Specifically, prefill instances that are compute-intensive suffer from low memory utilization, while decoding instances that are memory-intensive experience low compute utilization. To address this problem, this paper proposes Adrenaline, an attention disaggregation and offloading mechanism designed to enhance resource utilization and performance in LLM serving systems. Adrenaline's key innovation lies in disaggregating part of the attention computation in the decoding phase and offloading it to prefill instances. The memory-bound nature of decoding-phase attention computation inherently enables an effective offloading strategy, yielding two complementary advantages: 1) improved memory capacity and bandwidth utilization in prefill instances, and 2) increased decoding batch sizes that enhance compute utilization in decoding instances, collectively boosting overall system performance. Adrenaline achieves these gains through three key techniques: low-latency decoding synchronization, resource-efficient prefill colocation, and load-aware offloading scheduling. Experimental results show that Adrenaline achieves 2.28x higher memory capacity and 2.07x better memory bandwidth utilization in prefill instances, up to 1.67x improvements in compute utilization for decoding instances, and 1.68x higher overall inference throughput compared to state-of-the-art systems.
LGFeb 1, 2025
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM ReasoningZhi Zhou, Tan Yuhao, Zenan Li et al.
Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities. However, single-shot inference often yields unreliable results for complex reasoning tasks, leading researchers to explore multiple reasoning paths through methods such as perplexity and self-consistency. In this paper, we present the first theoretical error decomposition analysis of these techniques, breaking down their error into estimation error and model error. Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function, while self-consistency exhibits high estimation error due to a slow error convergence rate. To overcome these limitations, we propose Reasoning-Pruning Perplexity Consistency (RPC). This approach combines Perplexity Consistency, which seamlessly integrates LLM perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths to effectively prevent the degeneration of estimation error reduction. Theoretical analysis demonstrates that RPC not only accelerates the convergence rate of estimation error to an exponential level but also holds strong potential for further reducing model error. Extensive empirical evaluations on seven benchmark datasets confirm that RPC can significantly improve reasoning performance, sample efficiency, and confidence reliability.
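To make the two ingredients discussed in the abstract above concrete, here is a minimal sketch contrasting plain self-consistency (majority vote over sampled answers) with a vote weighted by each path's sequence probability. It is not the RPC method from the paper; the function names, the input format of `(answer, log_probability)` pairs, and the example numbers are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def self_consistency(samples):
    """Majority vote: every sampled reasoning path counts equally."""
    counts = Counter(answer for answer, _ in samples)
    return counts.most_common(1)[0][0]


def probability_weighted_vote(samples):
    """Each path votes with its sequence probability exp(logprob), so a few
    confident paths can outweigh many unlikely ones."""
    weights = defaultdict(float)
    for answer, logprob in samples:
        weights[answer] += math.exp(logprob)
    return max(weights, key=weights.get)


if __name__ == "__main__":
    # Hypothetical sampled paths: (final answer, total log-probability).
    samples = [
        ("42", -0.2),
        ("41", -2.5),
        ("41", -2.7),
        ("42", -0.3),
        ("41", -2.6),
    ]
    print("self-consistency picks:", self_consistency(samples))
    print("probability-weighted vote picks:", probability_weighted_vote(samples))
```

On this toy input the two rules disagree: the majority vote follows the three low-probability paths, while the weighted vote follows the two high-probability ones, which is the kind of tension between the two families of methods that the abstract's error decomposition addresses.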
LGDec 14, 2024
Fully Test-time Adaptation for Tabular DataZhi Zhou, Kun-Yang Yu, Lan-Zhe Guo et al.
Tabular data plays a vital role in various real-world scenarios and finds extensive applications. Although recent deep tabular models have shown remarkable success, they still struggle to handle data distribution shifts, leading to performance degradation when testing distributions change. To remedy this, a robust tabular model must adapt to generalize to unknown distributions during testing. In this paper, we investigate the problem of fully test-time adaptation (FTTA) for tabular data, where the model is adapted using only the testing data. We identify three key challenges: the existence of label and covariate distribution shifts, the lack of effective data augmentation, and the sensitivity of adaptation, which render existing FTTA methods ineffective for tabular data. To this end, we propose the Fully Test-time Adaptation for Tabular data, namely FTAT, which enables FTTA methods to robustly optimize the label distribution of predictions, adapt to shifted covariate distributions, and suit a variety of tasks and models effectively. We conduct comprehensive experiments on six benchmark datasets, which are evaluated using three metrics. The experimental results demonstrate that FTAT outperforms state-of-the-art methods by a margin.
LGFeb 18, 2025
A Smooth Transition Between Induction and Deduction: Fast Abductive Learning Based on Probabilistic Symbol PerceptionLin-Han Jia, Si-Yu Han, Lan-Zhe Guo et al.
Abductive learning (ABL), which integrates the strengths of machine learning and logical reasoning to improve generalization, has recently been shown to be effective. However, its efficiency is affected by the transition between numerical induction and symbolic deduction, leading to high computational costs in the worst-case scenario. Efforts on this issue remain limited. In this paper, we identify three reasons why previous optimization algorithms for ABL were not effective: insufficient utilization of prediction, symbol relationships, and accumulated experience in successful abductive processes, resulting in redundant calculations to the knowledge base. To address these challenges, we introduce an optimization algorithm named Probabilistic Symbol Perception (PSP), which makes a smooth transition between induction and deduction and keeps the correctness of ABL unchanged. We leverage probability as a bridge and present an efficient data structure, achieving the transfer from a continuous probability sequence to discrete Boolean sequences with low computational complexity. Experiments demonstrate promising results.
CLSep 26, 2025
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning TheoryXiao-Wen Yang, Zihao Zhang, Jianuo Cao et al.
Large language models (LLMs) have recently demonstrated remarkable progress in formal theorem proving. Yet their ability to serve as practical assistants for mathematicians, filling in missing steps within complex proofs, remains underexplored. We identify this challenge as the task of subgoal completion, where an LLM must discharge short but nontrivial proof obligations left unresolved in a human-provided sketch. To study this problem, we introduce FormalML, a Lean 4 benchmark built from foundational theories of machine learning. Using a translation tactic that converts procedural proofs into declarative form, we extract 4937 problems spanning optimization and probability inequalities, with varying levels of difficulty. FormalML is the first subgoal completion benchmark to combine premise retrieval and complex research-level contexts. Evaluation of state-of-the-art provers highlights persistent limitations in accuracy and efficiency, underscoring the need for more capable LLM-based theorem provers for effective subgoal completion.