AP Jan 4, 2017
Min-max formulas and other properties of certain classes of nonconvex effective Hamiltonians
Jianliang Qian, Hung V. Tran, Yifeng Yu
This paper is the first attempt to systematically study properties of the effective Hamiltonian $\overline{H}$ arising in the periodic homogenization of some coercive but nonconvex Hamilton-Jacobi equations. Firstly, we introduce a new and robust decomposition method to obtain min-max formulas for a class of nonconvex $\overline{H}$. Secondly, we analytically and numerically investigate other related interesting phenomena, such as "quasi-convexification" and breakdown of symmetry, of $\overline{H}$ from other typical nonconvex Hamiltonians. Finally, in the appendix, we show that our new method and those a priori formulas from the periodic setting can be used to obtain stochastic homogenization for the same class of nonconvex Hamilton-Jacobi equations. Some conjectures and problems are also proposed.
NA Feb 28, 2012
A Numerical Study of Turbulent Flame Speeds of Curvature and Strain G-equations in Cellular Flows
Yu-Yu Liu, Jack Xin, Yifeng Yu
We study front speeds of curvature and strain G-equations arising in turbulent combustion. These G-equations are Hamilton-Jacobi type level set partial differential equations (PDEs) with non-coercive Hamiltonians and degenerate nonlinear second order diffusion. The Hamiltonian of strain G-equation is also non-convex. Numerical computation is performed based on monotone discretization and weighted essentially nonoscillatory (WENO) approximation of transformed G-equations on a fixed periodic domain. The advection field in the computation is a two dimensional Hamiltonian flow consisting of a periodic array of counter-rotating vortices, or cellular flows. Depending on whether the evolution is predominantly in the hyperbolic or parabolic regimes, suitable explicit and semi-implicit time stepping methods are chosen. The turbulent flame speeds are computed as the linear growth rates of large time solutions. A new nonlinear parabolic PDE is proposed for the reinitialization of level set functions to prevent piling up of multiple bundles of level sets on the periodic domain. We found that the turbulent flame speed $s_T$ of the curvature G-equation is enhanced as the intensity $A$ of cellular flows increases, at a rate between those of the inviscid and viscous G-equations. The $s_T$ of the strain G-equation increases in small $A$, decreases in larger $A$, then drops down to zero at a large enough but finite value $A_{*}$. The flame front ceases to propagate at this critical intensity $A_*$, and is quenched by the cellular flow.
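The abstract above computes turbulent flame speeds as the linear growth rates of large-time solutions. As a rough illustration only, and not the authors' code, that idea can be sketched as a least-squares slope fitted to the late-time part of a time series. The function name `linear_growth_rate`, the `tail_fraction` parameter, and the synthetic data are all hypothetical.

```python
import math


def linear_growth_rate(times, values, tail_fraction=0.5):
    """Least-squares slope of values vs. times over the last tail_fraction of samples.

    Hypothetical helper: early transients are discarded so that only the
    large-time (approximately linear) regime contributes to the fit.
    """
    n = len(times)
    start = int(n * (1.0 - tail_fraction))
    t = times[start:]
    v = values[start:]
    m = len(t)
    t_mean = sum(t) / m
    v_mean = sum(v) / m
    cov = sum((a - t_mean) * (b - v_mean) for a, b in zip(t, v))
    var = sum((a - t_mean) ** 2 for a in t)
    return cov / var


if __name__ == "__main__":
    # Synthetic signal: linear growth at rate 2.5 plus a decaying transient.
    ts = [0.1 * i for i in range(100)]
    vs = [2.5 * t + 3.0 + math.exp(-t) for t in ts]
    print(linear_growth_rate(ts, vs))
```

Fitting only the tail is a design choice made here so that the initial transient does not bias the estimated rate.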
PS Oct 5, 2012
Turbulent Flame Speeds of G-equation Models in Unsteady Cellular Flows
Yu-Yu Liu, Jack Xin, Yifeng Yu
We perform a computational study of front speeds of G-equation models in time dependent cellular flows. The G-equations arise in premixed turbulent combustion, and are Hamilton-Jacobi type level set partial differential equations (PDEs). The curvature-strain G equations are also non-convex with degenerate diffusion. The computation is based on monotone finite difference discretization and weighted essentially nonoscillatory (WENO) methods. We found that the large time front speeds lock into the frequency of time periodic cellular flows in curvature-strain G-equations similar to what occurs in the basic inviscid G-equation. However, such a frequency locking phenomenon disappears in the viscous G-equation, and in the inviscid G-equation if time periodic oscillation of the cellular flow is replaced by time stochastic oscillation.
LG Aug 2, 2024
An Adaptive Tensor-Train Decomposition Approach for Efficient Deep Neural Network Compression
Shiyi Luo, Mingshuo Liu, Yifeng Yu et al.
In the field of model compression, choosing an appropriate rank for tensor decomposition is pivotal for balancing model compression rate and efficiency. However, this selection, whether done manually or through optimization-based automatic methods, often increases computational complexity. Manual rank selection lacks efficiency and scalability, often requiring extensive trial-and-error, while optimization-based automatic methods significantly increase the computational burden. To address this, we introduce a novel, automatic, and budget-aware rank selection method for efficient model compression, which employs Layer-Wise Imprinting Quantitation (LWIQ). LWIQ quantifies each layer's significance within a neural network by integrating a proxy classifier. This classifier assesses the layer's impact on overall model performance, allowing for a more informed adjustment of tensor rank. Furthermore, our approach includes a scaling factor to cater to varying computational budget constraints. This budget awareness eliminates the need for repetitive rank recalculations for different budget scenarios. Experimental results on the CIFAR-10 dataset show that our LWIQ improved by 63.2% in rank search efficiency, and the accuracy only dropped by 0.86% with 3.2x less model size on the ResNet-56 model as compared to the state-of-the-art proxy-based automatic tensor rank selection method.
84.1 ML May 13
On the Limits of Latent Reuse in Diffusion Models
Yifeng Yu, Lu Yu
Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are approximately low-dimensional but may lie near different subspaces. We show that freezing and reusing a source latent space induces a target-domain score error governed by two quantities: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. Motivated by these limits, we further study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions. Our results provide theoretical guidance on when latent reuse is reliable and when learning a shared representation may be necessary.
SD Oct 2, 2025
SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment
Yuxun Tang, Lan Liu, Wenhao Feng et al.
Singing voice generation progresses rapidly, yet evaluating singing quality remains a critical challenge. Human subjective assessment, typically in the form of listening tests, is costly and time consuming, while existing objective metrics capture only limited perceptual aspects. In this work, we introduce SingMOS-Pro, a dataset for automatic singing quality assessment. Building on our preview version SingMOS, which provides only overall ratings, SingMOS-Pro expands annotations of the additional part to include lyrics, melody, and overall quality, offering broader coverage and greater diversity. The dataset contains 7,981 singing clips generated by 41 models across 12 datasets, spanning from early systems to recent advances. Each clip receives at least five ratings from professional annotators, ensuring reliability and consistency. Furthermore, we explore how to effectively utilize MOS data annotated under different standards and benchmark several widely used evaluation methods from related tasks on SingMOS-Pro, establishing strong baselines and practical references for future research. The dataset can be accessed at https://huggingface.co/datasets/TangRain/SingMOS-Pro.
SE Jul 3, 2025
RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes
Jiaxing Wang, Yifeng Yu, Jiahan Song et al.
Next activity prediction represents a fundamental challenge for optimizing business processes in service-oriented architectures such as microservices environments, distributed enterprise systems, and cloud-native platforms, which enables proactive resource allocation and dynamic service composition. Despite the prevalence of sequence-based methods, these approaches fail to capture non-sequential relationships that arise from parallel executions and conditional dependencies. Even though graph-based approaches address structural preservation, they suffer from homogeneous representations and static structures that apply uniform modeling strategies regardless of individual process complexity characteristics. To address these limitations, we introduce RLHGNN, a novel framework that transforms event logs into heterogeneous process graphs with three distinct edge types grounded in established process mining theory. Our approach creates four flexible graph structures by selectively combining these edges to accommodate different process complexities, and employs reinforcement learning formulated as a Markov Decision Process to automatically determine the optimal graph structure for each specific process instance. RLHGNN then applies heterogeneous graph convolution with relation-specific aggregation strategies to effectively predict the next activity. This adaptive methodology enables precise modeling of both sequential and non-sequential relationships in service interactions. Comprehensive evaluation on six real-world datasets demonstrates that RLHGNN consistently outperforms state-of-the-art approaches. Furthermore, it maintains an inference latency of approximately 1 ms per prediction, representing a highly practical solution suitable for real-time business process monitoring applications. The source code is available at https://github.com/Joker3993/RLHGNN.
93.4 LG May 9
BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning
Yuhang Xu, Kaibin Tian, Yang Tian et al.
Reinforcement Learning (RL) has become a cornerstone for improving the performance of Large Language Models (LLMs). However, its rollout phase constitutes a significant efficiency bottleneck, mainly arising from the long-tail bubbles across data parallel ranks, particularly in long-context scenarios where faster GPUs remain idle while waiting for stragglers. Existing solutions, such as partial rollout or asynchronous RL, mitigate these bubbles by compromising the algorithm's strict synchronous nature. Instead, we propose BubbleSpec, a novel framework that accelerates RL rollouts while strictly keeping the mathematical exactness. Instead of attempting to eliminate bubbles, BubbleSpec exploits them. We exploit the idle time windows of faster ranks to pre-generate rollout results for subsequent steps, serving as drafts for speculative decoding. Unlike prior speculative methods that rely on historical epoch similarity and warm-ups, BubbleSpec is agnostic to dataset size and provides immediate acceleration from the onset of training. Extensive evaluations demonstrate that BubbleSpec reduces decoding steps by 50% and increases rollout throughput by up to 1.8x. Critically, BubbleSpec is seamlessly compatible with various RL frameworks and strategies as it sustains the strict synchronous property of RL algorithms.
LG Mar 1
Evaluating AI Grading on Real-World Handwritten College Mathematics: A Large-Scale Study Toward a Benchmark
Zhiqi Yu, Xingping Liu, Haobin Mao et al.
Grading in large undergraduate STEM courses often yields minimal feedback due to heavy instructional workloads. We present a large-scale empirical study of AI grading on real, handwritten single-variable calculus work from UC Irvine. Using OCR-conditioned large language models with structured, rubric-guided prompting, our system produces scores and formative feedback for thousands of free-response quiz submissions from nearly 800 students. In a setting with no single ground-truth label, we evaluate performance against official teaching-assistant grades, student surveys, and independent human review, finding strong alignment with TA scoring and a large majority of AI-generated feedback rated as correct or acceptable across quizzes. Beyond calculus, this setting highlights core challenges in OCR-conditioned mathematical reasoning and partial-credit assessment. We analyze key failure modes, propose practical rubric- and prompt-design principles, and introduce a multi-perspective evaluation protocol for reliable, real-course deployment. Building on the dataset and evaluation framework developed here, we outline a standardized benchmark for AI grading of handwritten mathematics to support reproducible comparison and future research.
57.0 CL May 4
InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition
Fengze Liu, Weidong Zhou, Binbin Liu et al.
Upweighting high-quality data in LLM pretraining often improves performance, but in data-limited regimes, especially under overtraining, stronger upweighting increases repetition and can degrade performance. However, standard scaling laws do not reliably extrapolate across mixture recipes or under repetitions, making the selection of optimal data recipes at scale underdetermined. To solve this, we introduce InfoLaw (Information Scaling Laws), a data-aware scaling framework that predicts loss from consumed tokens, model size, data mixture weights, and repetition. The key idea is to model pretraining as information accumulation, where quality controls information density and repetition induces scale-dependent diminishing returns. We first collect the model performance after training on datasets that vary in scale, quality distribution, and repetition level. We then build a model of information so that it accurately predicts this performance. InfoLaw predicts performance on unseen data recipes and larger scale runs (up to 7B, 425B tokens) with 0.15% mean and 0.96% max absolute error in loss, and it extrapolates reliably across overtraining levels, enabling efficient data-recipe selection under varying compute budgets.
ML Feb 7, 2025
Advancing Wasserstein Convergence Analysis of Score-Based Models: Insights from Discretization and Second-Order Acceleration
Yifeng Yu, Lu Yu
Score-based diffusion models have emerged as powerful tools in generative modeling, yet their theoretical foundations remain underexplored. In this work, we focus on the Wasserstein convergence analysis of score-based diffusion models. Specifically, we investigate the impact of various discretization schemes, including Euler discretization, exponential integrators, and midpoint randomization methods. Our analysis provides a quantitative comparison of these discrete approximations, emphasizing their influence on convergence behavior. Furthermore, we explore scenarios where Hessian information is available and propose an accelerated sampler based on the local linearization method. We demonstrate that this Hessian-based approach achieves faster convergence rates of order $\widetilde{\mathcal{O}}\left(\frac{1}{\varepsilon}\right)$, significantly improving upon the standard rate $\widetilde{\mathcal{O}}\left(\frac{1}{\varepsilon^2}\right)$ of vanilla diffusion models, where $\varepsilon$ denotes the target accuracy.
CL Apr 23, 2025
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Fengze Liu, Weidong Zhou, Binbin Liu et al.
Quality and diversity are two critical metrics for the training data of large language models (LLMs), positively impacting performance. Existing studies often optimize these metrics separately, typically by first applying quality filtering and then adjusting data proportions. However, these approaches overlook the inherent trade-off between quality and diversity, necessitating their joint consideration. Given a fixed training quota, it is essential to evaluate both the quality of each data point and its complementary effect on the overall dataset. In this paper, we introduce a unified data selection framework called QuaDMix, which automatically optimizes the data distribution for LLM pretraining while balancing both quality and diversity. Specifically, we first propose multiple criteria to measure data quality and employ domain classification to distinguish data points, thereby measuring overall diversity. QuaDMix then employs a unified parameterized data sampling function that determines the sampling probability of each data point based on these quality and diversity related labels. To accelerate the search for the optimal parameters involved in the QuaDMix framework, we conduct simulated experiments on smaller models and use LightGBM for parameter search, inspired by the RegMix method. Our experiments across diverse models and datasets demonstrate that QuaDMix achieves an average performance improvement of 7.2% across multiple benchmarks. These results outperform the independent strategies for quality and diversity, highlighting the necessity and ability to balance data quality and diversity.
ML May 24, 2024
Randomized Midpoint Method for Log-Concave Sampling under Constraints
Yifeng Yu, Lu Yu
In this paper, we study the problem of sampling from log-concave distributions supported on convex, compact sets, with a particular focus on the randomized midpoint discretization of both vanilla and kinetic Langevin diffusions in this constrained setting. We propose a unified proximal framework for handling constraints via a broad class of projection operators, including Euclidean, Bregman, and Gauge projections. Within this framework, we establish non-asymptotic bounds in both $\mathcal{W}_1$ and $\mathcal{W}_2$ distances, providing precise complexity guarantees and performance comparisons. In addition, our analysis leads to sharper convergence guarantees for both vanilla and kinetic Langevin Monte Carlo under constraints, improving upon existing theoretical results.
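To make the randomized midpoint idea in the abstract above concrete, here is a minimal one-dimensional sketch for the vanilla (overdamped) Langevin diffusion on the interval $[-1, 1]$. It is not the paper's proximal framework: it simply applies a Euclidean projection after every update, which is a simplifying assumption. The potential $f(x) = x^2/2$, the step size, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def grad_f(x):
    # Gradient of the illustrative potential f(x) = x^2 / 2 (an assumption).
    return x


def project(x, lo=-1.0, hi=1.0):
    # Euclidean projection onto the interval [lo, hi].
    return min(max(x, lo), hi)


def projected_randomized_midpoint(n_steps, h, x0=0.0, seed=0):
    """Randomized midpoint steps for Langevin dynamics, projected onto [-1, 1]."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        u = rng.random()  # random midpoint location in [0, 1)
        xi1 = rng.gauss(0.0, 1.0)
        xi2 = rng.gauss(0.0, 1.0)
        # Brownian increments over [0, u*h] and [0, h], sharing the same path.
        w_mid = math.sqrt(u * h) * xi1
        w_full = w_mid + math.sqrt((1.0 - u) * h) * xi2
        # Euler step to the random midpoint, then projection.
        x_mid = project(x - u * h * grad_f(x) + math.sqrt(2.0) * w_mid)
        # Full step using the gradient evaluated at the midpoint, then projection.
        x = project(x - h * grad_f(x_mid) + math.sqrt(2.0) * w_full)
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = projected_randomized_midpoint(n_steps=5000, h=0.01)
    print(min(draws), max(draws))
```

Reusing `w_mid` inside `w_full` keeps the midpoint and the full step driven by the same Brownian path, which is what distinguishes this scheme from two independent Euler steps.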