LG · Feb 13, 2023
Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action
Zhiye Guo, Jian Liu, Yanli Wang et al.
Denoising diffusion models have emerged as one of the most powerful classes of generative models in recent years. They have achieved remarkable success in many fields, such as computer vision, natural language processing (NLP), and bioinformatics. Several excellent reviews cover diffusion models and their applications in computer vision and NLP, but no comparable overview exists for their applications in bioinformatics. This review aims to provide a thorough overview of the applications of diffusion models in bioinformatics to aid their further development in bioinformatics and computational biology. We start with an introduction to the key concepts and theoretical foundations of three cornerstone diffusion modeling frameworks: denoising diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations. This is followed by a comprehensive description of diffusion models employed in different domains of bioinformatics, including cryo-EM data enhancement, single-cell data analysis, protein design and generation, drug and small-molecule design, and protein-ligand interaction. The review concludes with a summary of potential new developments and applications of diffusion models in bioinformatics.
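The following minimal sketch illustrates the DDPM framework named above. It implements the closed-form forward (noising) step $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, with $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$. The linear $\beta$ schedule, running from $10^{-4}$ to $0.02$ over 1000 steps, is an assumption borrowed from common DDPM practice and is not a choice made in this review. The function names are illustrative.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a) x_0, (1 - a) I) in one shot, a = alpha_bar_t."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x0 = [1.0, -0.5, 0.25]  # toy "data" vector
xt, eps = q_sample(x0, 999, alpha_bars, random.Random(0))
# At t = 999, alpha_bar is tiny, so x_t is essentially pure Gaussian noise.
```

Because the noise is returned alongside $x_t$, the pair can serve as a training example for a denoiser that learns to predict $\epsilon$ from $(x_t, t)$.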
SP · Feb 24, 2019
Numerical Simulation of Microflows using Hermite Spectral Methods
Zhicheng Hu, Zhenning Cai, Yanli Wang
We propose a Hermite spectral method for the spatially inhomogeneous Boltzmann equation. For the inverse-power-law model, we generalize an approximate quadratic collision operator defined in the normalized and dimensionless setting to an operator for arbitrary distribution functions. An efficient algorithm with a fast transform is introduced to discretize this new collision operator. The method is tested for one-dimensional benchmark microflow problems.
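The sketch below makes the Hermite expansion concrete. It uses the one-dimensional probabilists' Hermite polynomials $He_n$ with weight $\omega(v) = e^{-v^2/2}/\sqrt{2\pi}$, assuming unit temperature and zero mean; this normalization is an assumption and may not match the paper's. The sketch checks that a shifted Maxwellian $\omega(v-u)$ has expansion coefficients $u^n/n!$, which follows from the generating function $e^{uv - u^2/2} = \sum_n He_n(v)\,u^n/n!$. The quadratic collision operator itself is not reproduced.

```python
import math


def hermite_he(n_max, v):
    """Probabilists' Hermite polynomials He_0..He_{n_max} at v.

    Uses the three-term recurrence He_{n+1}(v) = v He_n(v) - n He_{n-1}(v).
    """
    he = [1.0, v]
    for n in range(1, n_max):
        he.append(v * he[n] - n * he[n - 1])
    return he[: n_max + 1]


def weight(v):
    """Standard Gaussian weight omega(v) (unit-temperature Maxwellian in 1D)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def shifted_maxwellian_series(u, v, n_max):
    """Truncated series omega(v) * sum_{n <= n_max} (u^n / n!) He_n(v), approximating omega(v - u)."""
    he = hermite_he(n_max, v)
    return weight(v) * sum(u**n / math.factorial(n) * he[n] for n in range(n_max + 1))


# Twenty modes reproduce a Maxwellian shifted by u = 0.5 to near machine precision.
for v in (-2.0, 0.0, 1.5, 3.0):
    print(v, shifted_maxwellian_series(0.5, v, 20), weight(v - 0.5))
```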
NA · Nov 19, 2018
Filtered Hyperbolic Moment Method for the Vlasov Equation
Yana Di, Yuwei Fan, Zhenzhong Kou et al.
In this paper, we investigate the effect of filtering on the hyperbolic moment equations (HME) [15] of the Vlasov-Poisson equations and propose a novel quasi time-consistent filter to suppress the numerical recurrence effect. Because it takes the properties of HME into account, the filter preserves many physical properties of HME, including Galilean invariance and the conservation of mass, momentum, and energy. We dissect the filter from two viewpoints, a collisional viewpoint and a dissipative viewpoint, and show that the filtered hyperbolic moment method can be treated as a solver of the Vlasov equation. Numerical simulations of linear Landau damping and the two-stream instability demonstrate the effectiveness of the filter in restraining recurrence arising from particle streaming. Both the analysis and the numerical results indicate that the filtered HME can capture the evolution of the Vlasov equation, even when phase mixing and filamentation are dominant.
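A simple modal filter illustrates the conservation property. Assume a one-dimensional normalized Hermite basis: then $f = \omega(v)\sum_n f_n He_n(v)$ has mass $f_0$, momentum $f_1$, and $\int v^2 f\,dv = f_0 + 2f_2$. Any filter that leaves $f_0, f_1, f_2$ untouched therefore conserves mass, momentum, and energy. The exponential filter and its parameters below are generic illustrative choices, not the quasi time-consistent filter proposed in the paper.

```python
import math


def exponential_filter(coeffs, alpha=36.0, order=8, n_keep=3):
    """Damp high-order Hermite coefficients with sigma(n) = exp(-alpha * (n / N)**order).

    The first n_keep coefficients (mass, momentum, energy for n_keep = 3) are left
    exactly unchanged; alpha and order are illustrative defaults.
    """
    N = len(coeffs) - 1
    return [
        c if n < n_keep else c * math.exp(-alpha * (n / N) ** order)
        for n, c in enumerate(coeffs)
    ]


def moments(coeffs):
    """Mass, momentum and int v^2 f dv for f = omega * sum_n f_n He_n (1D, unit temperature)."""
    return coeffs[0], coeffs[1], coeffs[0] + 2.0 * coeffs[2]


f = [1.0, 0.2, 0.1] + [0.5] * 13  # toy coefficients with a heavy high-order tail
g = exponential_filter(f)
print(moments(f) == moments(g))  # conserved quantities are untouched
print(g[-1])                     # highest mode damped by a factor exp(-36)
```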
NA · Sep 24, 2017
Suppression of Recurrence in the Hermite-Spectral Method for Transport Equations
Zhenning Cai, Yanli Wang
We study the unphysical recurrence phenomenon arising in the numerical simulation of transport equations using the Hermite-spectral method. From a mathematical point of view, we analyze how filters suppress this numerical artifact for two types of transport equations. We rigorously prove that all non-constant modes are damped exponentially by the filters in both models, and formally show that the filter does not affect the damping rate of the electric energy in the linear Landau damping problem. Numerical tests are performed to show the effect of the filters.
NA · Feb 3, 2019
A Robust Riemann Solver for Multiple Hydro-Elastoplastic Solid Mediums
Ruo Li, Yanli Wang, Chengbao Yao
We propose a robust approximate solver for hydro-elastoplastic solid materials, a general constitutive law extensively applied in explosion and high-speed impact dynamics, and provide a natural transformation between fluid and solid in the case of phase transitions. The hydrostatic component of the solid is described by a family of general Mie-Grüneisen equations of state (EOS), while the deviatoric component includes the elastic phase, the linearly hardened plastic phase, and the fluid phase. The approximate solver provides the interface stress and normal velocity by an iterative method. The well-posedness and convergence of our solver are proved under mild assumptions on the equations of state. The proposed solver is applied to compute the numerical flux at the phase interface in our compressible multi-medium flow simulations on Eulerian grids. Several numerical examples, including Riemann problems, shock-bubble interactions, implosions, and high-speed impact applications, are presented to validate the approximate solver.
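The Mie-Grüneisen family is commonly written as $p(\rho, e) = p_{\mathrm{ref}}(\rho) + \Gamma(\rho)\,\rho\,\big(e - e_{\mathrm{ref}}(\rho)\big)$. The sketch below evaluates this form and recovers two familiar special cases, the ideal gas and the stiffened gas. The water parameters ($\gamma = 4.4$, $p_\infty = 6\times 10^8$ Pa) are commonly used textbook values assumed here for illustration, not values taken from the paper. The iterative interface solver is not reproduced.

```python
from typing import Callable, Dict

EOS = Dict[str, Callable[[float], float]]


def mie_gruneisen_pressure(rho: float, e: float, eos: EOS) -> float:
    """p(rho, e) = p_ref(rho) + Gamma(rho) * rho * (e - e_ref(rho)); e is specific internal energy."""
    return eos["p_ref"](rho) + eos["Gamma"](rho) * rho * (e - eos["e_ref"](rho))


def ideal_gas(gamma: float) -> EOS:
    # Gamma = gamma - 1 and zero reference curves give p = (gamma - 1) rho e.
    return {"Gamma": lambda rho: gamma - 1.0, "p_ref": lambda rho: 0.0, "e_ref": lambda rho: 0.0}


def stiffened_gas(gamma: float, p_inf: float) -> EOS:
    # A constant reference pressure -gamma * p_inf gives p = (gamma - 1) rho e - gamma p_inf.
    return {"Gamma": lambda rho: gamma - 1.0, "p_ref": lambda rho: -gamma * p_inf, "e_ref": lambda rho: 0.0}


air = ideal_gas(1.4)
water = stiffened_gas(4.4, 6.0e8)
print(mie_gruneisen_pressure(1.0, 2.5, air))  # approximately 1.0
print(mie_gruneisen_pressure(1000.0, 1.0e6, water))
```

Passing the EOS as a bundle of functions of $\rho$ mirrors the "family of EOS" framing: a different material only changes the three curves, not the pressure routine.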
NA · Dec 18, 2018
Approximation to Singular Quadratic Collision Model in Fokker-Planck-Landau Equation
Ruo Li, Yanli Wang, Yixuan Wang
We propose a Hermite-Galerkin spectral method to numerically solve the spatially homogeneous Fokker-Planck-Landau equation with a singular quadratic collision model. To evaluate the collision model, we adopt a novel approximation formed by combining a simple linear term with a quadratic term that is very expensive to evaluate. Using the Hermite expansion, the quadratic term is evaluated exactly by computing its spectral coefficients. To deal with singularities, we use Burnett polynomials, so that even highly singular collision models can be handled smoothly. Numerical examples demonstrate that our method captures low-order moments with satisfactory accuracy and performance.
NA · Jan 26, 2016
Preserving Hyperbolicity in Stochastic Galerkin Method for Uncertainty Quantification
Zhenning Cai, Ruo Li, Yanli Wang
We first investigate the structure of the systems derived from the stochastic Galerkin method based on generalized polynomial chaos (gPC) for nonlinear hyperbolic systems with random inputs. This method adopts gPC approximations in the stochastic Galerkin framework, but such approximations of nonlinear hyperbolic systems do not necessarily yield hyperbolic systems \cite{Lucor2013}. Thus, based on the work in \cite{framework}, we propose a framework to carry out model reduction for general nonlinear hyperbolic systems and derive a final global system. Within this framework, the nonlinear hyperbolic system in one space dimension and the symmetric hyperbolic system in multiple space dimensions are reduced to a symmetric hyperbolic system based on the stochastic Galerkin method. We note that the basis functions in the expansion are not restricted to the random-dependent polynomials used in the gPC method, nor is there any restriction on the dimension of the random variables.
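Here is a minimal worked example of a stochastic Galerkin system, under assumptions chosen for simplicity. Take the scalar Burgers equation $u_t + (u^2/2)_x = 0$ with $u \approx u_0 + u_1\xi$, $\xi \sim U(-1,1)$, and degree-1 Legendre gPC. Projecting the flux gives $f_0 = u_0^2/2 + u_1^2/6$ and $f_1 = u_0u_1$, so the Jacobian has eigenvalues $u_0 \pm u_1/\sqrt{3}$, which are always real. Scalar equations like this one therefore stay hyperbolic. The loss of hyperbolicity discussed in the abstract concerns nonlinear systems, where the same projection can produce complex eigenvalues.

```python
import math


def sg_burgers_jacobian(u0, u1):
    """Jacobian of the degree-1 stochastic Galerkin system for Burgers' equation.

    With u ~ u0 + u1 * xi, xi ~ Uniform(-1, 1) and E[xi^2] = 1/3, the projected fluxes are
    f0 = E[u^2 / 2] = u0^2 / 2 + u1^2 / 6 and f1 = E[xi u^2 / 2] / E[xi^2] = u0 * u1.
    """
    return [[u0, u1 / 3.0], [u1, u0]]


def real_eigenvalues_2x2(a):
    """Eigenvalues of a real 2x2 matrix, or None if they are complex (not hyperbolic)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4.0 * det
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (tr - root) / 2.0, (tr + root) / 2.0


print(real_eigenvalues_2x2(sg_burgers_jacobian(1.0, 0.6)))  # u0 -/+ u1 / sqrt(3)
print(real_eigenvalues_2x2([[0.0, 1.0], [-1.0, 0.0]]))       # None: complex eigenvalues
```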
IT · May 22
MDS and NMDS Codes from the Extended Twisted Generalized Reed-Solomon Codes
Yanli Wang, Yanxin Chen, Tongjiang Yan
This paper studies the maximum distance separable (MDS) and near MDS (NMDS) properties of extended twisted generalized Reed-Solomon (TGRS) codes. First, a family of extended TGRS (ETGRS) codes is constructed by appending three columns to the generator matrix of the original TGRS codes. Second, necessary and sufficient conditions for these codes to be MDS or almost MDS (AMDS) codes are derived. Then, by analyzing the AMDS properties of their dual codes, necessary and sufficient conditions for them to be NMDS codes are established. Furthermore, examples are given to verify the main results. Finally, we determine their non-generalized Reed-Solomon (non-GRS) properties via the Schur product method.
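MDS and AMDS are defined relative to the Singleton bound $d \le n - k + 1$. A code is MDS when equality holds and AMDS when $d = n - k$. An NMDS code is one for which both the code and its dual are AMDS. The brute-force sketch below checks this on a plain Reed-Solomon code over the prime field GF(7), which is a GRS code with all column multipliers equal to 1. The twisted and extended constructions of the paper are not reproduced, and brute force is only feasible for toy parameters.

```python
from itertools import product


def rs_codewords(q, points, k):
    """Yield (f(a_1), ..., f(a_n)) for every polynomial f of degree < k over GF(q), q prime."""
    for coeffs in product(range(q), repeat=k):
        yield tuple(sum(c * pow(a, i, q) for i, c in enumerate(coeffs)) % q for a in points)


def min_distance(q, points, k):
    """Minimum distance of a linear code = minimum weight of a non-zero codeword."""
    best = len(points)
    for cw in rs_codewords(q, points, k):
        weight = sum(1 for x in cw if x != 0)
        if 0 < weight < best:
            best = weight
    return best


def classify(n, k, d):
    """Compare d with the Singleton bound d <= n - k + 1."""
    if d == n - k + 1:
        return "MDS"
    if d == n - k:
        return "AMDS"
    return "other"


points = [1, 2, 3, 4, 5, 6]  # n = 6 distinct evaluation points in GF(7)
d = min_distance(7, points, 3)  # k = 3, so 7**3 = 343 codewords
print(d, classify(len(points), 3, d))  # prints: 4 MDS
```

A non-zero polynomial of degree below $k$ has at most $k-1$ roots, which is why every such evaluation code attains $d = n - k + 1$.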
IT · May 22
Self-Orthogonal Twisted Generalized Reed-Solomon Codes and Their Application to Quantum Error-Correcting Codes
Yanxin Chen, Yanli Wang, Tongjiang Yan
In this paper, two classes of twisted generalized Reed-Solomon (TGRS) codes with multiple twists are studied. First, sufficient and necessary conditions for these codes to be self-orthogonal and self-dual are established. Then several explicit constructions of self-orthogonal and self-dual codes are presented, from which quantum stabilizer codes are further derived. Finally, corresponding examples are given. In particular, some of these codes are MDS, AMDS, or NMDS, and some of the resulting quantum stabilizer codes are optimal, achieving the quantum Singleton bound.
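Self-orthogonality of a linear code can be checked directly from a generator matrix $G$. The code is (Euclidean) self-orthogonal exactly when $GG^\top = 0$ over GF($q$), and self-dual when, in addition, $k = n/2$. The sketch below assumes the Euclidean inner product over a prime field; the abstract does not say which inner product the paper uses. It also encodes the quantum Singleton bound $k \le n - 2d + 2$ for an $[[n, k, d]]_q$ code. The toy generator matrices are illustrative and are not TGRS codes.

```python
def is_self_orthogonal(gen, q):
    """True iff all pairs of generator rows (including each row with itself) are orthogonal mod q."""
    return all(sum(x * y for x, y in zip(r, s)) % q == 0 for r in gen for s in gen)


def is_self_dual(gen, q):
    """Self-dual iff self-orthogonal with k = n / 2 (assumes the rows are linearly independent)."""
    return is_self_orthogonal(gen, q) and 2 * len(gen) == len(gen[0])


def meets_quantum_singleton(n, k, d):
    """An [[n, k, d]] code satisfies k <= n - 2d + 2; it is optimal (quantum MDS) at equality."""
    return k == n - 2 * d + 2


print(is_self_dual([[1, 2]], 5))           # 1*1 + 2*2 = 5 = 0 mod 5, and k = 1 = n/2
print(is_self_orthogonal([[1, 1, 1]], 3))  # 1 + 1 + 1 = 3 = 0 mod 3
print(meets_quantum_singleton(5, 1, 3))    # the [[5, 1, 3]] code attains the bound
```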
SE · Jan 27
AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion
Tianyue Jiang, Yanli Wang, Yanlin Wang et al.
Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code in the retrieval process, and the inability of existing retrieval methods to effectively utilize the inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query enhancement mechanism and a reinforcement learning based retriever training method. Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and the target code. Additionally, we employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval. We evaluate AlignCoder on two widely used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in EM score compared to baselines on the CrossCodeEval benchmark. The results show that our framework achieves superior performance and exhibits high generalizability across various code LLMs and programming languages.
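A toy retriever can illustrate the query-enhancement idea. In the sketch below, the candidate completions are hard-coded stand-ins for what a code LLM would sample. Identifier-level Jaccard overlap replaces AlignCoder's reinforcement-learning-trained AlignRetriever, and all function names are hypothetical. The sketch shows only that appending plausible completions to the unfinished code can pull retrieval toward the snippet the target code actually needs.

```python
import re


def tokens(code):
    """Set of identifier-like tokens (a crude stand-in for a learned encoder)."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enhance_query(unfinished_code, candidate_completions):
    """Append sampled candidate completions to the original query (hypothetical helper)."""
    return "\n".join([unfinished_code, *candidate_completions])


def retrieve(query, snippets, top_k=1):
    """Rank repository snippets by token overlap with the query."""
    q = tokens(query)
    return sorted(snippets, key=lambda s: jaccard(q, tokens(s)), reverse=True)[:top_k]


repo_snippets = [
    "def insert_user(self, user):\n    self.conn.execute(sql, user)",
    "def save(path):\n    return open(path)",
]
unfinished = "def save(user):\n    db."
candidates = ["db.insert_user(user)", "db.commit()"]  # pretend LLM samples

print(retrieve(unfinished, repo_snippets)[0].splitlines()[0])  # def save(path):
print(retrieve(enhance_query(unfinished, candidates), repo_snippets)[0].splitlines()[0])  # def insert_user(self, user):
```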
NA · May 18
Solving Vlasov-Poisson system with an adaptive Hermite spectral method
Sihong Shao, Yanli Wang, Jie Wu
We propose an adaptive Hermite spectral method for the Vlasov-Poisson system based on a recently developed frequency indicator that measures the contribution of the high-order expansion coefficients. Specifically, a symmetrically weighted Hermite basis with a scaling factor is used to approximate the distribution function and meet increasing resolution requirements, such as those induced by filamentation. To implement the scaling adjustment, a fast conservative projection operator is constructed in two steps. The first step formulates the projection as a constrained optimization problem that preserves key invariants, including mass, momentum, energy, and the $L^2$ norm of the distribution function. The second step is an ODE-based approximation that computes the updated expansion coefficients with linear complexity. Numerical experiments in 1D1V and 2D2V settings validate the feasibility and efficiency of the proposed adaptive Hermite method.
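The sketch below illustrates the role of a frequency indicator. The definition used here, the fraction of the coefficient $\ell^2$ norm carried by the last $m$ modes, is an assumption modeled on frequency indicators from the adaptive spectral-method literature. It may differ in detail from the one the paper uses. The threshold is a hypothetical tuning parameter.

```python
import math


def frequency_indicator(coeffs, m):
    """Fraction of the coefficient l2 norm carried by the last m modes (m >= 1).

    A value near 1 means the high-order modes dominate, i.e. the current scaling
    factor under-resolves the distribution function.
    """
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        return 0.0
    tail = sum(c * c for c in coeffs[-m:])
    return math.sqrt(tail / total)


def needs_rescaling(coeffs, m, threshold):
    """Hypothetical trigger: adjust the scaling factor once the indicator exceeds threshold."""
    return frequency_indicator(coeffs, m) > threshold


smooth = [1.0, 0.3, 0.05, 0.01, 0.001, 1e-4]   # rapidly decaying coefficients
filamented = [1.0, 0.6, 0.5, 0.45, 0.4, 0.4]   # slowly decaying: poorly resolved
print(needs_rescaling(smooth, 2, 0.1), needs_rescaling(filamented, 2, 0.1))  # False True
```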
CL · Apr 17
Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations
Yanli Wang, Peng Kuang, Xiaoyu Han et al.
Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities, entropy, and self-consistency can become brittle under calibration–deployment mismatch. Conformal prediction provides finite-sample validity under exchangeability, but its practical usefulness depends on the quality of the nonconformity score. We propose a conformal framework for LLM question answering that uses internal representations rather than output-facing statistics: specifically, we introduce Layer-Wise Information (LI) scores, which measure how conditioning on the input reshapes predictive entropy across model depth, and use them as nonconformity scores within a standard split conformal pipeline. Across closed-ended and open-domain QA benchmarks, with the clearest gains under cross-domain shift, our method achieves a better validity–efficiency trade-off than strong text-level baselines while maintaining competitive in-domain reliability at the same nominal risk level. These results suggest that internal representations can provide more informative conformal scores when surface-level uncertainty is unstable under distribution shift.
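The split conformal step referred to above is short enough to sketch. Given nonconformity scores on a held-out calibration set, the threshold is the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score. The prediction set keeps every candidate answer scoring at or below this threshold. Under exchangeability, the set contains the correct answer with probability at least $1-\alpha$. The LI scores themselves are not reproduced. The calibration and candidate scores below are made-up numbers, and any nonconformity score could be plugged in.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha: keep every candidate
    return sorted(cal_scores)[k - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate answer whose nonconformity score does not exceed the threshold."""
    return [answer for answer, score in candidate_scores.items() if score <= threshold]


cal = [i / 20 for i in range(1, 20)]      # 19 made-up calibration scores: 0.05, ..., 0.95
tau = conformal_threshold(cal, alpha=0.1)  # 18th smallest score
print(tau, prediction_set({"Paris": 0.30, "Lyon": 0.92, "Nice": 0.85}, tau))
```

A better nonconformity score does not change this pipeline. It only makes the resulting sets smaller for the same $\alpha$, which is the efficiency side of the validity–efficiency trade-off.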
NA · Mar 14
A bi-fidelity method for the uncertain Vlasov-Poisson system near quasineutrality in an asymptotic-preserving particle-in-cell framework
Guangwei Liu, Liu Liu, Yanli Wang
In this paper, we study the Vlasov-Poisson system with massless electrons (VPME) near quasineutrality and with uncertainties. Based on the reformulation of the Poisson equation by [P. Degond et al., Journal of Computational Physics, 229 (16), 2010, pp. 5630–5652], we first consider the deterministic problem and develop an efficient asymptotic-preserving particle-in-cell (AP-PIC) method. This method captures the quasineutral limit numerically without requiring the discretization to resolve the small Debye length of the plasma. The main challenge, and the main difference from previous related works, is that we consider the nonlinear Poisson equation in the VPME system, which contains $e^{\phi}$ (with $\phi$ being the electric potential), and provide an explicit scheme. In the second part, we extend our study to the uncertainty quantification (UQ) problem and develop an efficient bi-fidelity method for solving the VPME system with multidimensional random parameters, choosing the Euler-Poisson equation as the low-fidelity model. Several numerical experiments demonstrate the asymptotic-preserving property of our deterministic solver and the effectiveness of our bi-fidelity method for solving the model with random uncertainties.
SE · Dec 23, 2024
RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation
Yanli Wang, Yanlin Wang, Suiquan Wang et al.
Repository-level code translation refers to translating an entire code repository from one programming language to another while preserving the functionality of the source repository. Many benchmarks have been proposed to evaluate the performance of such code translators. However, previous benchmarks mostly provide fine-grained samples, focusing on snippet-, function-, or file-level code translation. Such benchmarks do not accurately reflect real-world demands, where entire repositories often need to be translated, involving longer code and more complex functionality. To address this gap, we propose a new benchmark, named RepoTransBench, which is a real-world repository-level code translation benchmark with an automatically executable test suite. We conduct experiments on RepoTransBench to evaluate the translation performance of 11 advanced LLMs. We find that the Success@1 score (test success in one attempt) of the best-performing LLM is only 7.33%. To further explore the potential of LLMs for repository-level code translation, we provide LLMs with error-related feedback to perform iterative debugging and observe an average 7.09% improvement in Success@1. However, even with this improvement, the Success@1 score of the best-performing LLM is only 21%, which may not meet the need for reliable automatic repository-level code translation. Finally, we conduct a detailed error analysis and highlight current LLMs' deficiencies in repository-level code translation, which could provide a reference for further improvements.
SE · Apr 11, 2025
Towards an Understanding of Context Utilization in Code Intelligence
Yanlin Wang, Kefeng Duan, Dewu Zheng et al.
Code intelligence is an emerging domain in software engineering that aims to improve the effectiveness and efficiency of various code-related tasks. Recent research suggests that incorporating contextual information beyond the basic original task inputs (i.e., source code) can substantially enhance model performance. Such contextual signals may be obtained directly or indirectly from sources such as API documentation or intermediate representations like abstract syntax trees. Despite growing academic interest, there is no systematic analysis of context in code intelligence. To address this gap, we conduct an extensive literature review of 146 relevant studies published between September 2007 and August 2024. Our investigation yields four main contributions: (1) a quantitative analysis of the research landscape, including publication trends, venues, and the explored domains; (2) a novel taxonomy of context types used in code intelligence; (3) a task-oriented analysis investigating context integration strategies across diverse code intelligence tasks; and (4) a critical evaluation of evaluation methodologies for context-aware methods. Based on these findings, we identify fundamental challenges in context utilization in current code intelligence systems and propose a research roadmap that outlines key opportunities for future research.
CL · Oct 15, 2025
Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
Peng Kuang, Yanli Wang, Xiaoyu Han et al.
Process reward models (PRMs) are a cornerstone of test-time scaling (TTS), designed to verify and select the best responses from large language models (LLMs). However, this promise is challenged by recent benchmarks where simple majority voting, which ignores PRM signals, occasionally outperforms standard PRM-based selection. This raises a critical question: How can we effectively utilize verification signals from PRMs for TTS? To address this, we start by developing a theoretical framework for optimally combining signals from both the LLM and the PRM. Our framework reveals that the optimal strategy is a weighted aggregation of responses, a strategy whose effectiveness hinges on estimating weights that capture the complex interplay between the models. Based on our theoretical results, we empirically show that these optimal weighting functions differ significantly across LLM-PRM pairs and, notably, often assign substantial negative weights. Motivated by these insights, we propose efficient pre-computation methods to calibrate these weighting functions. Extensive experiments across 5 LLMs and 7 PRMs demonstrate that our calibration method significantly boosts the TTS efficiency, surpassing the performance of vanilla weighted majority voting while using only $21.3\%$ of the computation. Ultimately, our work demonstrates that investing in a more intelligent aggregation strategy can be a more convincing path to performance gains than simply scaling test-time computation.
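A few lines of code can show the difference between majority voting and weighted aggregation. Here each response votes for its final answer with weight $w(s)$, where $s$ is its PRM score. The log-odds weight $w(s) = \log\frac{s}{1-s}$ is a hypothetical stand-in for the calibrated weighting functions the paper estimates. It was chosen because, like those functions, it can be negative, so a low-scoring response counts against its answer. The answers and scores are made up.

```python
import math
from collections import Counter, defaultdict


def majority_vote(answers):
    """Most frequent final answer, ignoring PRM scores."""
    return Counter(answers).most_common(1)[0][0]


def weighted_vote(answers, prm_scores, weight_fn):
    """Sum weight_fn(score) per distinct answer and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, prm_scores):
        totals[answer] += weight_fn(score)
    return max(totals, key=totals.get)


def log_odds(s):
    """Hypothetical weighting function; negative whenever s < 0.5."""
    return math.log(s / (1.0 - s))


answers = ["12", "12", "12", "15", "15"]
scores = [0.20, 0.30, 0.25, 0.90, 0.85]
print(majority_vote(answers), weighted_vote(answers, scores, log_odds))  # 12 15
```

With a constant weight function, `weighted_vote` reduces to majority voting, so the two strategies differ only through the shape of the weighting function.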
ME · Oct 21, 2017
Heat Kernel Smoothing in Irregular Image Domains
Moo K. Chung, Yanli Wang, Gurong Wu
We present a discrete version of heat kernel smoothing on graph data structures. The method is used to smooth data in irregularly shaped domains in 3D images. New statistical properties are derived. As an application, we show how to filter data in lung blood vessel trees obtained from computed tomography. The method can further be used to represent complex vessel trees parametrically and to extract skeleton representations of the trees.
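The following minimal sketch shows heat diffusion on a graph. It assumes an unweighted graph stored as adjacency lists and the combinatorial Laplacian $(Lf)_i = \sum_{j\sim i}(f_i - f_j)$. It approximates $e^{-tL}f$ with explicit Euler steps rather than with the heat kernel weights the paper may use. Treat it as an illustration of the smoothing behavior, not the authors' estimator. The five-node path graph is a toy stand-in for a vessel-tree segment.

```python
def laplacian_apply(adj, f):
    """(L f)_i = sum over neighbours j of (f_i - f_j) for an unweighted graph."""
    return [sum(f[i] - f[j] for j in adj[i]) for i in range(len(f))]


def heat_smooth(adj, f, t, steps):
    """Approximate exp(-t L) f by explicit Euler steps of df/ds = -L f.

    When dt * max_degree <= 1, each step f <- (I - dt L) f applies a symmetric matrix with
    non-negative entries and unit row sums, so the total sum is conserved and values stay
    within [min f, max f].
    """
    dt = t / steps
    max_degree = max(len(nbrs) for nbrs in adj)
    if dt * max_degree > 1.0:
        raise ValueError("increase steps: dt * max_degree must be <= 1")
    f = list(f)
    for _ in range(steps):
        lf = laplacian_apply(adj, f)
        f = [fi - dt * li for fi, li in zip(f, lf)]
    return f


path = [[1], [0, 2], [1, 3], [2, 4], [3]]  # 5 nodes along a toy vessel segment
spike = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed = heat_smooth(path, spike, t=1.0, steps=10)
print([round(x, 3) for x in smoothed])
```

Because only graph adjacency enters the computation, the same routine applies unchanged to irregular domains such as branching vessel trees, which is the setting the abstract targets.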