Haocheng Luo

LG
h-index: 29
9 papers
248 citations
Novelty: 51%
AI Score: 55

9 Papers

96.0 · AI · Jun 4 · Code
Learning Visual Spatial Planning from Symbolic State via Modality-Gap-Aware Self-Distillation

Haocheng Luo, Jiahui Liu, Ruicheng Zhang et al.

While vision-language models excel at general multimodal understanding, they still struggle with visual spatial planning. We attribute this to a perception-reasoning modality gap: visual planning requires models to infer latent state structures from pixels and then reason over the recovered structure to produce valid actions, whereas symbolic planning directly leverages explicit objects and constraints. This creates dual bottlenecks in visual state recovery and multi-step planning. To address this, we propose MGSD, a two-stage modality-gap-aware self-distillation framework. First, a cold-start grounding stage equips the visual student with reliable state representations, minimizing early perception noise. Second, a privileged teacher transfers planning capabilities via on-policy distillation, using explicit symbolic states to supervise the student's own visual rollout prefixes. Crucially, symbolic data is used strictly during training, leaving inference purely visual. Experiments on visual planning benchmarks show that MGSD consistently improves visual planning across both 4B and 8B backbones, raising the macro average by 19.3% and 18.4%, respectively. The resulting models narrow the gap to symbolic-input upper bounds, while ablations and diagnostics confirm that the improvement comes from both visual state recovery and optimal-path reasoning. These results suggest that modality-gap-aware self-distillation improves not only how models perceive actionable states, but also how they plan over the inferred structure. Code is available at https://github.com/Oranger-l/MGSD.

SY · Dec 4, 2017
Coordinated Charging and Discharging Strategies for Plug-in Electric Bus Fast Charging Station with Energy Storage System

Huimiao Chen, Zechun Hu, Hongcai Zhang et al. · tsinghua

The plug-in electric bus (PEB) is an environmentally friendly mode of public transportation, and plug-in electric bus fast charging stations (PEBFCSs) play an essential role in PEB operation. Under effective control, deploying an energy storage system (ESS) within a PEBFCS can reduce both peak charging loads and electricity purchase costs. To address the scheduling of ESS charging and discharging, integrated with PEB charging where the latter is controllable, we propose an optimal real-time coordinated charging and discharging strategy for a PEBFCS with an ESS that maximizes economic benefits. Depending on whether the PEB charging loads are controllable, we establish mathematical models for two scenarios: coordinated PEB charging and uncoordinated PEB charging. The models account for the price and lifespan of the ESS, the capacity charge of the PEBFCS, and electricity price arbitrage. For the coordinated PEB charging scenario, we further develop a heuristics-based method that obtains an approximately optimal strategy with dramatically improved computational efficiency. Finally, through case studies, we validate the effectiveness of the proposed strategies, interpret the effect of ESS prices on ESS usage, and provide a sensitivity analysis of ESS capacity.

SY · Dec 19, 2017
Plug-in Electric Vehicle Charging Congestion Analysis Using Taxi Travel Data in the Central Area of Beijing

Huimiao Chen, Hongcai Zhang, Zechun Hu et al. · tsinghua

Recharging a plug-in electric vehicle is more time-consuming than refueling an internal combustion engine vehicle. As a result, with the rapid growth of the plug-in electric vehicle population, charging stations may face serious congestion during peak traffic hours in the near future. Because drivers' time costs are usually high, charging congestion will be a dominant factor affecting a charging station's quality of service. Hence, adequate congestion analysis is indispensable when designing charging stations in order to guarantee acceptable quality of service in the future. This paper proposes a data-driven approach for charging congestion analysis of plug-in electric vehicle charging stations. Based on a data-driven plug-in electric vehicle charging station planning model, we adopt queuing theory to model and analyze the charging congestion phenomenon in the resulting plans. We simulate and analyze the proposed method for charging stations serving shared-use electric taxis in the central area of Beijing, leveraging real-world taxi travel data.
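
The abstract does not say which queuing model the authors use. As a rough illustration of how queuing theory quantifies charging congestion, the sketch below assumes an M/M/c queue (Poisson arrivals, exponentially distributed charging times, `chargers` identical chargers) and applies the standard Erlang C formula. The function names and any input numbers are hypothetical, not taken from the paper.

```python
from math import factorial


def erlang_c(arrival_rate, service_rate, chargers):
    """Probability that an arriving vehicle has to wait in an M/M/c queue.

    Assumption (not from the paper): Poisson arrivals and exponential
    charging times, with `chargers` identical chargers.
    """
    offered = arrival_rate / service_rate   # offered load a = lambda / mu
    rho = offered / chargers                # utilisation per charger
    if rho >= 1:
        raise ValueError("unstable queue: arrivals exceed total charging capacity")
    # Terms for states in which at least one charger is free.
    free_terms = sum(offered**k / factorial(k) for k in range(chargers))
    # Aggregated term for states in which every charger is busy.
    busy_term = offered**chargers / (factorial(chargers) * (1 - rho))
    return busy_term / (free_terms + busy_term)


def mean_wait(arrival_rate, service_rate, chargers):
    """Mean time spent queuing (excluding charging), in the time unit of the rates."""
    p_wait = erlang_c(arrival_rate, service_rate, chargers)
    return p_wait / (chargers * service_rate - arrival_rate)
```

For example, with a hypothetical 6 arrivals per hour and 30-minute charging sessions (`service_rate=2.0`), comparing `mean_wait(6.0, 2.0, 4)` with `mean_wait(6.0, 2.0, 5)` shows how much queuing time one extra charger removes.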

96.2 · LG · Mar 18 · Code
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization

Haocheng Luo, Zehang Deng, Thanh-Toan Do et al.

Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified squeezing effect (also known as likelihood displacement), where the probability of preferred responses decreases unintentionally during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise dynamics in logit space. Our analysis reveals that negative-gradient updates cause residuals to expand rapidly along high-curvature directions, which underlies the squeezing effect, whereas Sharpness-Aware Minimization (SAM) can suppress this behavior through its curvature-regularization effect. Building on this insight, we investigate logits-SAM, a computationally efficient variant that perturbs only the output layer with negligible overhead. Extensive experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT across multiple datasets and benchmarks demonstrate that logits-SAM consistently improves the effectiveness of DPO and integrates seamlessly with other DPO variants. Code is available at https://github.com/RitianLuo/logits-sam-dpo.
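
To make the squeezing effect concrete, the sketch below evaluates the standard DPO loss for a single preference pair. The loss depends only on the margin between the chosen and rejected log-probability ratios, so it can fall even when the chosen response becomes less likely, provided the rejected response falls faster. All log-probability values are invented for illustration. This is not the paper's logits-SAM method, for which the abstract gives no update rule.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log sigmoid(margin)."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(m)) equals softplus(-m); this form is numerically stable.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical sequence log-probabilities under the frozen reference model.
REF_CHOSEN, REF_REJECTED = -10.0, -10.0

# Policy equal to the reference: the margin is zero and the loss is log(2).
loss_before = dpo_loss(-10.0, -10.0, REF_CHOSEN, REF_REJECTED)

# The chosen log-probability drops (-10 -> -12), but the rejected one
# drops further (-10 -> -20), so the margin grows and the loss falls.
loss_after = dpo_loss(-12.0, -20.0, REF_CHOSEN, REF_REJECTED)
```

Here `loss_after` is smaller than `loss_before` although the preferred response has become less probable, which is the behaviour the abstract calls likelihood displacement.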

LG · Sep 29, 2023
Sharpness-Aware Teleportation on Riemannian Manifolds

Tuan Truong, Hoang-Phi Nguyen, Haocheng Luo et al.

Recent studies highlight the effectiveness of flat minima in enhancing generalization, with sharpness-aware minimization (SAM) achieving state-of-the-art performance. Additionally, insights into the intrinsic geometry of the loss landscape have shown promise for improving model generalization. Building on these advancements, we introduce a novel sharpness-aware, geometry-aware teleportation mechanism to further enhance robustness and generalization. The core innovation of our approach is to decompose each iteration into a teleportation step within a local orbit and a sharpness-aware step that transitions between different orbits, leveraging the Riemannian quotient manifold. Our approach is grounded in a theoretical framework that analyzes the generalization gap between population loss and worst-case empirical loss within the context of Riemannian manifolds. To demonstrate the effectiveness of our method, we evaluate and compare our algorithm on diverse vision benchmarks with various datasets and Riemannian manifolds.

RO · Dec 7, 2025
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

Ruicheng Zhang, Mingyang Zhang, Jun Zhou et al. · tsinghua

Embodied imitation learning is constrained by the scarcity of diverse, long-horizon robotic manipulation data. Existing video generation models for this domain are limited to synthesizing short clips of simple actions and often rely on manually defined trajectories. To this end, we introduce MIND-V, a hierarchical framework designed to synthesize physically plausible and logically coherent videos of long-horizon robotic manipulation. Inspired by cognitive science, MIND-V bridges high-level reasoning with pixel-level synthesis through three core components: a Semantic Reasoning Hub (SRH) that leverages a pre-trained vision-language model for task planning; a Behavioral Semantic Bridge (BSB) that translates abstract instructions into domain-invariant representations; and a Motor Video Generator (MVG) for conditional video rendering. MIND-V employs Staged Visual Future Rollouts, a test-time optimization strategy to enhance long-horizon robustness. To align the generated videos with physical laws, we introduce a GRPO reinforcement learning post-training phase guided by a novel Physical Foresight Coherence (PFC) reward. PFC leverages the V-JEPA world model to enforce physical plausibility by aligning the predicted and actual dynamic evolutions in the feature space. MIND-V demonstrates state-of-the-art performance in long-horizon robotic manipulation video generation, establishing a scalable and controllable paradigm for embodied data synthesis.

LG · Jan 22, 2025 · Code
Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization

Haocheng Luo, Tuan Truong, Tung Pham et al.

Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks. However, its underlying principles remain poorly understood. In this work, we analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness, and propose a third-order stochastic differential equation (SDE), which reveals that the dynamics are driven by a complex mixture of second- and third-order terms. We show that alignment between the perturbation vector and the top eigenvector is crucial for SAM's effectiveness in regularizing sharpness, but find that this alignment is often inadequate in practice, limiting SAM's efficiency. Building on these insights, we introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue by aligning the perturbation vector with the leading eigenvector. We validate the effectiveness of our theory and the practical advantages of our proposed approach through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
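
For orientation, here is a minimal sketch of the vanilla SAM update (not Eigen-SAM, whose update rule the abstract does not give) on a toy quadratic loss with a diagonal Hessian. Because the Hessian is diagonal, the top eigenvector is simply the first coordinate axis, which makes it easy to measure the alignment between the perturbation and the leading eigenvector. The curvatures, learning rate, and `rho` are hypothetical values.

```python
import math

# Toy loss L(w) = 0.5 * sum(h_i * w_i**2) with hypothetical curvatures.
# The Hessian is diag(HESSIAN_DIAG), so its top eigenvector is the first axis.
HESSIAN_DIAG = [10.0, 1.0]


def grad(w):
    """Gradient of the toy quadratic loss at w."""
    return [h * wi for h, wi in zip(HESSIAN_DIAG, w)]


def sam_perturbation(w, rho):
    """SAM ascent direction: rho * g / ||g|| (zero if the gradient vanishes)."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return [0.0 for _ in g]
    return [rho * gi / norm for gi in g]


def sam_step(w, lr=0.05, rho=0.1):
    """One vanilla SAM step: descend along the gradient taken at w + eps."""
    eps = sam_perturbation(w, rho)
    g_perturbed = grad([wi + ei for wi, ei in zip(w, eps)])
    return [wi - lr * gi for wi, gi in zip(w, g_perturbed)]


def alignment_with_top_eigenvector(w, rho=0.1):
    """|cos| of the angle between the perturbation and the first axis."""
    eps = sam_perturbation(w, rho)
    return abs(eps[0]) / rho  # valid because ||eps|| == rho for nonzero gradients
```

At `w = [0.0, 1.0]` the perturbation is orthogonal to the top eigenvector (alignment 0), while at `w = [1.0, 0.0]` it is perfectly aligned (alignment 1). The abstract argues that the poorly aligned case is common in practice.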

CL · Nov 2, 2023
Re-weighting Tokens: A Simple and Effective Active Learning Strategy for Named Entity Recognition

Haocheng Luo, Wei Tan, Ngoc Dang Nguyen et al.

Active learning, a widely adopted technique for enhancing machine learning models in text and image classification tasks with limited annotation resources, has received relatively little attention in the domain of Named Entity Recognition (NER). The challenge of data imbalance in NER has hindered the effectiveness of active learning, as sequence labellers lack sufficient learning signals. To address these challenges, this paper presents a novel reweighting-based active learning strategy that assigns dynamic smoothed weights to individual tokens. This adaptable strategy is compatible with various token-level acquisition functions and contributes to the development of robust active learners. Experimental results on multiple corpora demonstrate the substantial performance improvement achieved by incorporating our re-weighting strategy into existing acquisition functions, validating its practical efficacy.

LG · Sep 22, 2025
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise

Haocheng Luo, Mehrtash Harandi, Dinh Phung et al.

Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate the phenomenon known as m-sharpness, where the performance of SAM improves monotonically as the micro-batch size for computing perturbations decreases. In practice, the empirical m-sharpness effect underpins the deployment of SAM in distributed training, yet a rigorous theoretical account has been lacking. To provide a theoretical explanation for m-sharpness, we leverage an extended Stochastic Differential Equation (SDE) framework and analyze the structure of stochastic gradient noise (SGN) to characterize the dynamics of various SAM variants, including n-SAM and m-SAM. Our findings reveal that the stochastic noise introduced during SAM perturbations inherently induces a variance-based sharpness regularization effect. Motivated by our theoretical insights, we introduce Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate the effectiveness of our theoretical analysis and proposed method.
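
The distinction between n-SAM and m-SAM can be sketched on a one-dimensional toy problem. The sketch assumes the usual definitions: n-SAM computes a single perturbation from the full-batch gradient, while m-SAM computes a separate perturbation for each micro-batch of size `m` and averages the resulting gradients. The per-example loss, the data, and all parameter values are hypothetical. RW-SAM is not shown because the abstract does not specify its sampling weights.

```python
def per_example_grad(w, a, b):
    """Gradient of the toy per-example loss 0.5 * a * (w - b)**2."""
    return a * (w - b)


def mean_grad(w, batch):
    """Average gradient over a batch of (a, b) pairs."""
    return sum(per_example_grad(w, a, b) for a, b in batch) / len(batch)


def perturb(w, batch, rho):
    """SAM perturbation in 1D: rho * g / |g| reduces to rho * sign(g)."""
    g = mean_grad(w, batch)
    if g == 0.0:
        return w
    return w + (rho if g > 0 else -rho)


def n_sam_grad(w, batch, rho):
    """n-SAM: one perturbation computed from the full-batch gradient."""
    return mean_grad(perturb(w, batch, rho), batch)


def m_sam_grad(w, batch, rho, m):
    """m-SAM: a separate perturbation per micro-batch of size m.

    Assumes m divides len(batch), so averaging over chunks equals
    averaging over examples.
    """
    chunks = [batch[i:i + m] for i in range(0, len(batch), m)]
    return sum(mean_grad(perturb(w, c, rho), c) for c in chunks) / len(chunks)


# Hypothetical batch of (curvature a, target b) pairs.
BATCH = [(1.0, -1.0), (1.0, 1.0), (2.0, 0.5), (2.0, -0.5)]
```

With `m = len(BATCH)` the two estimators coincide. With `m = 1` each example is perturbed along its own gradient, so the per-example gradient noise enters the update direction, which is the mechanism the abstract analyzes.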