Runhao Liu

CE
h-index2
7papers
3citations
Novelty41%
AI Score47

7 Papers

82.8LGMay 22
State commitment learning: training language models to distinguish computation from memory

Fei Ding, Yongkang Zhang, Runhao Liu et al.

Reasoning language models do not distinguish tokens used for computation from tokens that constitute persistent state: once generated, all hidden thoughts remain in context and influence future predictions. As a result, downstream reasoning may depend on failed attempts, dead ends, and private scratch work that should not be safely relied on later. We recast this phenomenon as a new training objective, state commitment learning: training models to explicitly distinguish information that should be committed as persistent state from temporary computation that can be discarded. We define a counterfactual criterion, persistent-state sufficiency, which makes it trainable and measurable whether an answer remains usable after hidden thoughts are erased. We then propose Counterfactual Erasure RL (CERL), which evaluates, under the same prefix, both a path that keeps hidden thoughts and a path that erases them, and gives reward only when the erasure path remains correct. We also introduce the Erasure Dependence Protocol and show across mathematics, long-chain logic, scientific QA, and multi-turn tool-use evaluation that CERL substantially reduces answer dependence on hidden thoughts without sacrificing accuracy, consistently outperforming correctness-only RL and long-answer SFT baselines.

35.6CEMay 4
Robust Crop Planning under Uncertainty: Aligning Economic Optimality with Agronomic Sustainability

Runhao Liu, You Li, Zhengyang Cheng et al.

Long-horizon agricultural planning requires optimizing crop allocation under complex spatial heterogeneity, temporal agronomic dependencies, and multi-source environmental uncertainty. Existing approaches often either address crop interactions, such as legume-cereal complementarity, only implicitly or rely on static deterministic formulations that fail to ensure resilience against market and climate volatility.To address these challenges, we propose a Multi-Layer Robust Crop Planning Framework (MLRCPF) that integrates spatial reasoning, temporal dynamics, and robust optimization. Specifically, we formalize crop-to-crop relationships through a structured interaction matrix embedded within the state-transition logic, and employ a distributionally robust optimization layer to mitigate worst-case risks defined by a data-driven ambiguity set. Evaluations on a real-world high-mix farming dataset from North China demonstrate the effectiveness of the proposed approach. The framework autonomously generates sustainable checkerboard rotation patterns that restore soil fertility, significantly increasing the legume planting ratio compared to deterministic baselines. Economically, it successfully resolves the trade-off between optimality and stability. These results highlight the importance of explicitly encoding domain-specific structural priors into optimization models for resilient decision-making in complex agricultural systems.

31.8CEMay 4
From Production Envelopes to Executable Schedules: Sound Constructive Refinement for High-Mix Manufacturing

Runhao Liu, Zhengyang Cheng, Fei Ding et al.

High-mix manufacturing systems require production plans that are both profitable and refinable into executable machine-level schedules under heterogeneous resources, mold-dependent compatibility, setup losses,delivery windows, and accessory synchronization. We study this problem as a production-envelope refinement task. A rolling-horizon mixed-integer linear programming (MILP) planner generates a valid production envelope that fixes daily production, fulfillment, mold states, inventory flows, outsourcing, and unmet-demand variables. A structure-aware constructive scheduler then refines this envelope into concrete order-machine allocations while preserving capacity feasibility, product-mold-machine compatibility, and delivery-window compliance. The scheduler enforces a one-mold-per-machine-per-day stability rule to avoid intra-day mold fragmentation. We establish residual invariants and prove a soundness theorem: whenever refinement terminates with zero residual fulfillment, the returned allocation is executable with respect to the valid envelope. The framework is implemented as an Advanced Planning and Scheduling (APS) prototype and evaluated on a real industrial case from a Jiangsu smartphone-case manufacturer in China with 37 product types, 150 orders, and over 8.3 million requested units. The proposed stable refinement achieves 100% on-time delivery, eliminates outsourcing, and bounds changeover-driven capacity loss to 1.9-4.6%. Across nine demand and changeover perturbation scenarios, it maintains robust delivery performance, showing that sound envelope refinement is a practical mechanism for reliable manufacturing scheduling.

70.8LGApr 19
Rethinking the Comparison Unit in Sequence-Level Reinforcement Learning: An Equal-Length Paired Training Framework from Loss Correction to Sample Construction

Fei Ding, Yongkang Zhang, Runhao Liu et al.

This paper investigates the length problem in sequence-level relative reinforcement learning. We observe that, although existing methods partially alleviate length-related phenomena, a more fundamental issue remains insufficiently characterized: the comparison units used during training lack inherent comparability. Building on this observation, we propose a new perspective: the length problem should not be viewed merely as a loss-scaling or normalization bias, but rather as a \emph{comparison unit construction} problem. We further establish a sample-construction-based training framework that, instead of applying post-hoc corrections to unequal-length responses, proactively constructs equal-length, alignable, and comparable training segments during generation. Within this framework, we propose EqLen, a concrete method applicable to group-relative comparison algorithms such as GRPO, GSPO, and RLOO. Through dual-track synchronous generation, prefix inheritance, and segment masking, EqLen efficiently collects effective equal-length training segments and enables stable

94.5LGApr 19
Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

Fei Ding, Yongkang Zhang, Runhao Liu et al.

The central challenge of reinforcement learning for reasoning lies not only in the sparsity of outcome-level supervision, but more fundamentally in how to transform feedback provided only at the end of a sequence into fine-grained learning signals that can guide intermediate reasoning steps. Existing approaches either rely on outcome-level rewards for sequence-level optimization, which makes precise credit assignment difficult, or depend on externally constructed process supervision, which is costly and difficult to scale sustainably. To address this, we propose a new perspective: reinforcement learning for reasoning can be understood as the problem of internalizing outcome supervision into process supervision. From this perspective, we introduce a supervision-internalization method for reinforcement learning for reasoning, enabling the model to automatically extract process-level learning signals through identifying, correcting, and reusing failed reasoning trajectories, thereby achieving finer-grained policy optimization under outcome-only supervision. We further abstract this idea into a new training paradigm, in which the model continually generates and refines its own internal process supervision during reinforcement learning, opening a new path for fine-grained credit assignment in reinforcement learning for reasoning that differs from externally provided process supervision.

CVOct 4, 2025
Skin Lesion Classification Based on ResNet-50 Enhanced With Adaptive Spatial Feature Fusion

Runhao Liu, Ziming Chen, Peng Zhang

Skin cancer classification remains a challenging problem due to high inter-class similarity, intra-class variability, and image noise in dermoscopic images. To address these issues, we propose an improved ResNet-50 model enhanced with Adaptive Spatial Feature Fusion (ASFF), which adaptively integrates multi-scale semantic and surface features to improve feature representation and reduce overfitting. The ResNet-50 model is enhanced with an adaptive feature fusion mechanism to achieve more effective multi-scale feature extraction and improve overall performance. Specifically, a dual-branch design fuses high-level semantic and mid-level detail features, which are processed through global average pooling and fully connected layers to generate adaptive weights for weighted fusion, thereby strengthening feature learning and reducing the impact of noise on classification. The method is evaluated on a subset of the ISIC 2020 dataset containing 3297 benign and malignant skin lesion images. Experimental results show that the proposed ASFF-based ResNet-50 achieves the best overall performance compared with 5 classic convolutional neural networks (CNNs) models. The proposed model reached an accuracy of 93.18% along with higher precision, recall, specificity, and F1 score. The improved model achieves an AUC value of 0.9670 and 0.9717 in the P-R and ROC curve, respectively. Then, the evaluation based on Grad-CAM further proved that the improved model adaptively focuses on lesion-relevant regions while suppressing irrelevant background information, thereby validating its enhanced feature learning capability from a deep representation perspective. These findings demonstrate that the proposed approach provides a more effective and efficient solution for computer-aided skin cancer diagnosis.

CVOct 4, 2025
Exploring the Challenge and Value of Deep Learning in Automated Skin Disease Diagnosis

Runhao Liu, Ziming Chen, Peng Zhang

Skin cancer is one of the most prevalent and deadly forms of cancer worldwide, which highlights the critical importance of early detection and diagnosis in improving patient outcomes. Deep learning (DL) has shown significant promise in enhancing the accuracy and efficiency of automated skin disease diagnosis, particularly in detecting and evaluating skin lesions and classification. However, there are still several challenges for DL-based skin cancer diagnosis, including complex features, image noise, intra-class variation, inter-class similarity, and data imbalance. By synthesizing recent research, this review discusses innovative approaches to cope with these challenges, such as data augmentation, hybrid models, and feature fusion, etc. Furthermore, the review highlights the integration of DL models into clinical workflows, offering insights into the potential of deep learning to revolutionize skin disease diagnosis and improve clinical decision-making. This article follows a comprehensive methodology based on the PRISMA framework and emphasizes the need for continued advancements to fully unlock the transformative potential of DL in dermatological care.