Hua Geng

SY
h-index: 2
7 papers
3 citations
Novelty: 53%
AI Score: 51

7 Papers

91.5 · CV · Jun 2
Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching

Hao Zhong, Muzhi Zhu, Shenyan Zeng et al.

Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained perception, and occlusion reasoning, making it a challenging testbed for spatial reasoning in multimodal large language models (MLLMs) deployed in physical environments. However, current MLLMs lack systematic evaluation and training frameworks for these capabilities. We introduce ReasonMatch-Bench, a benchmark stratified by viewpoint displacement and matching granularity across indoor, outdoor, and object-centric scenarios, and show that current MLLMs still struggle with fine-grained wide-baseline correspondence: on a difficult 90-sample subset, human annotators achieve 84.0 F1, while the best existing baseline reaches 37.2. To bridge this gap, we build a scalable data-generation pipeline that automatically extracts wide-baseline view pairs from large-scale video-3D corpora, including RGB-D videos and SfM reconstructions, yielding diverse and verifiable supervision. We further propose Dynamic Correspondence Reinforcement Learning (DCRL), which combines Image-Level Viewpoint Progression and Point-Level Correspondence Curriculum to improve WBM training through verifiable rewards, without explicit chain-of-thought (CoT) supervision. Extensive experiments show that DCRL substantially improves performance on ReasonMatch-Bench and transfers to related spatial benchmarks, while maintaining general visual understanding and yielding modest gains on several benchmarks.
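The human-versus-baseline comparison (84.0 vs. 37.2 F1) is an F1 score over predicted correspondences. Below is a minimal sketch of such a score under two assumptions. First, each correspondence is a discrete (query point, target label) pair. Second, only exact matches count. The benchmark's actual matching criterion is not described in the abstract, and the function name `correspondence_f1` is hypothetical.

```python
def correspondence_f1(predicted, ground_truth):
    """F1 over sets of (query_id, target_id) correspondence pairs.

    Assumption: a predicted pair is correct only if it appears verbatim in
    the ground truth; the benchmark's real criterion may be looser.
    """
    pred = set(predicted)
    gold = set(ground_truth)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)  # true positives: exact pair matches
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Consider three predicted pairs scored against three ground-truth pairs, with two of the predictions correct. That gives precision = recall = 2/3 and an F1 of about 0.667.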

56.9 · SY · May 21
Equilibrium-Free Contraction Stability Analysis for Grid-Forming Converter-Based Microgrids

Shijie Peng, Xiuqiang He, Xi Ru et al.

Renewable-driven microgrids dominated by grid-forming (GFM) converters are subject to persistent power fluctuations, which make stability assessments that require a known equilibrium restrictive. This paper develops an equilibrium-free contraction stability method based on semi-contraction theory. By formulating the system in a symmetry-aware projected state space, the method removes the intrinsic rotational mode induced by uniform angle shifts. A blockwise Jacobian decomposition is introduced to characterize the coupled active and reactive power dynamics, yielding a computable regional contraction condition. This condition is then converted into forward-invariant stability certificates that provide trajectory-level performance guarantees. For autonomous operation without disturbances, the method provides an equilibrium-free nonlinear stability characterization together with an estimate of the region of attraction (ROA). For non-autonomous operation under disturbances, it derives explicit bounds for quasi-steady tracking under slowly varying injections and for robustness under fast or composite disturbances. Case studies on a 9-bus system validate the proposed method.
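The idea of removing the rotational mode by projection can be illustrated on a single Jacobian. This minimal sketch rests on two assumptions. The rotational mode is taken to be the all-ones direction of an angle-only state vector. The Euclidean matrix measure (log-norm) stands in for the paper's blockwise, weighted regional condition. Under these assumptions, a negative measure on the complement of the all-ones direction indicates contraction in the projected space, even when the full Jacobian has a zero eigenvalue. The function name `projected_log_norm` is hypothetical.

```python
import numpy as np

def projected_log_norm(jacobian):
    """Euclidean matrix measure of J restricted to the complement of 1.

    Assumption: the rotational (uniform angle shift) mode is the all-ones
    direction, as for a state vector of phase angles only.
    """
    J = np.asarray(jacobian, dtype=float)
    n = J.shape[0]
    ones = np.ones((1, n)) / np.sqrt(n)
    # The SVD of the 1 x n row vector gives an n x n orthonormal V^T whose
    # first row spans 1; the remaining rows form a basis of its complement.
    _, _, vt = np.linalg.svd(ones)
    V = vt[1:].T
    J_proj = V.T @ J @ V
    # mu_2(A) = largest eigenvalue of the symmetric part of A.
    sym = 0.5 * (J_proj + J_proj.T)
    return float(np.max(np.linalg.eigvalsh(sym)))
```

Take J = -L for a three-node path graph, where the Laplacian L has eigenvalues 0, 1 and 3. The unprojected measure is 0, but the projected measure is -1.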

48.1 · SY · Apr 23
Frequency Security Assessment in Power Systems With High Penetration of Renewables Considering Spatio-Temporal Frequency Distribution

Changjun He, Hua Geng, Xiuqiang He et al.

The increasing integration of renewable energy sources exacerbates spatial and temporal differences in frequency across the power system, posing a serious challenge to the accurate and efficient assessment of system frequency security. To address this issue, a generic effective nodal frequency (ENF) model is first established to concisely characterize nodal frequency dynamics. The model is characterized by the effective nodal inertia (ENI), damping, and primary regulation parameters, which retain only the dominant constant component governing nodal frequency dynamic performance. It enables tractable analytical formulation of nodal frequency trajectories and of the key frequency security indicators. Quantitative analysis under temporary power disturbances reveals that the ENI is the parameter with the greatest influence on frequency security. Consequently, the critical nodal inertia required to ensure nodal frequency security is derived analytically, and a system-level frequency security index based on the actual ENI and the critical nodal inertia is proposed. Using this index, system frequency security is assessed through an "offline calculation and online evaluation" procedure implemented with a lookup table and an interpolation method. Simulations on the modified IEEE 39-bus system verify the effectiveness of the proposed assessment method.
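The "offline calculation and online evaluation" procedure can be sketched as a precomputed table followed by interpolation. The sketch makes three assumptions that the abstract does not state:

- The critical inertia comes from a simplified initial rate-of-change-of-frequency (RoCoF) criterion, M_crit = |ΔP| / RoCoF_max, for a swing-type nodal model. This ignores the damping and primary regulation terms that the ENF model includes.
- The table is indexed by disturbance size only.
- The system index is the smallest ratio of actual to critical nodal inertia.

All function names and numbers are hypothetical.

```python
from bisect import bisect_left

def build_table(disturbances, rocof_max):
    """Offline step: tabulate critical inertia per disturbance size.

    Assumption: simplified criterion M_crit = |dP| / rocof_max, i.e. the
    initial RoCoF of M * d(df)/dt = -dP must not exceed rocof_max (per unit).
    """
    return [(dp, abs(dp) / rocof_max) for dp in sorted(disturbances)]

def interpolate(table, dp):
    """Online step: linearly interpolate critical inertia at disturbance dp."""
    xs = [x for x, _ in table]
    if not xs[0] <= dp <= xs[-1]:
        raise ValueError("disturbance outside the tabulated range")
    i = bisect_left(xs, dp)
    if i == 0:
        return table[0][1]
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (dp - x0) / (x1 - x0)

def security_index(nodal_inertia, table, dp):
    """Hypothetical system index: smallest actual-to-critical inertia ratio.

    A value >= 1 means every node meets the simplified RoCoF limit.
    """
    m_crit = interpolate(table, dp)
    return min(m / m_crit for m in nodal_inertia)
```

As an example, take RoCoF_max = 0.5 and tabulated disturbances of 0.1, 0.2 and 0.4. A disturbance of 0.3 then interpolates to a critical inertia of 0.6. Nodal inertias of 0.9 and 1.2 give an index of 1.5. The sketch rejects disturbances outside the table rather than extrapolating, since the critical inertia there is unknown.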

58.4 · SY · May 13
Decentralized Frequency-Domain Conditions for D-Stability with Application to DC Microgrids

Zelin Sun, Shanshan Jiang, Xiaoyu Peng et al.

This paper proposes a decentralized method for regional pole placement, or $\mathcal{D}$-stability, in linearized networked systems. Existing LMI-based methods are hindered by confidentiality concerns regarding proprietary subsystem models and the absence of communication infrastructures. To overcome these barriers, we map the target region $\mathcal{D}$ of pole placement to an auxiliary left-half plane and introduce positive functions to handle the resulting complex-coefficient dynamics. We prove that $\mathcal{D}$-stability is guaranteed via local frequency-domain criteria without requiring shared subsystem models or inter-subsystem communication. This method is then tailored to DC microgrids, where a loop transformation is utilized to reallocate the burden of stability certification, deriving a broadcastable grid code for decentralized parameter synthesis. Numerical examples verify the efficacy of the proposed method.
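Mapping a target region $\mathcal{D}$ to a left-half-plane test can be illustrated with a centralized eigenvalue check. This minimal sketch assumes $\mathcal{D}$ is the intersection of two sets:

- the shifted half-plane Re(s) < -α;
- a conic sector of half-angle θ about the negative real axis, which corresponds to a damping ratio of at least cos θ.

The sector test rotates each eigenvalue by exp(±j(π/2 - θ)), and this rotation is where complex-coefficient dynamics arise. This is not the paper's decentralized frequency-domain criterion, and `in_region` is a hypothetical name.

```python
import cmath
import math

def in_region(eigenvalues, alpha, theta):
    """Check that every eigenvalue lies in D = {Re(s) < -alpha} ∩ sector.

    The sector has half-angle theta (radians) about the negative real axis.
    Each condition becomes ordinary left-half-plane stability of a mapped
    eigenvalue:
      shift:  s + alpha
      rotate: s * exp(+/- j*(pi/2 - theta))  (complex-coefficient dynamics)
    """
    psi = math.pi / 2 - theta
    for lam in eigenvalues:
        mapped = (
            lam + alpha,
            lam * cmath.exp(1j * psi),
            lam * cmath.exp(-1j * psi),
        )
        if any(m.real >= 0 for m in mapped):
            return False
    return True
```

For instance, with α = 1 and θ = π/4, poles at -2 ± j pass the test. Poles at -0.5 ± 2j fail it because they violate the shifted half-plane condition.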

48.6 · LG · Mar 24
Safe Reinforcement Learning with Preference-based Constraint Inference

Chenglin Li, Guangchun Ruan, Hua Geng

Safe reinforcement learning (RL) is a standard paradigm for safety-critical decision making. However, real-world safety constraints can be complex, subjective, and even hard to specify explicitly. Existing work on constraint inference relies on restrictive assumptions or extensive expert demonstrations, which are unrealistic in many real-world applications. How to learn these constraints cheaply and reliably is the main challenge addressed in this study. While inferring constraints from human preferences offers a data-efficient alternative, we find that the popular Bradley-Terry (BT) model fails to capture the asymmetric, heavy-tailed nature of safety costs, which leads to risk underestimation. Moreover, the impact of BT models on downstream policy learning remains poorly understood in the literature. To address these gaps, we propose a novel approach, Preference-based Constrained Reinforcement Learning (PbCRL). We introduce a novel dead zone mechanism into preference modeling and theoretically prove that it encourages heavy-tailed cost distributions, thereby achieving better constraint alignment. Additionally, we incorporate a Signal-to-Noise Ratio (SNR) loss that encourages exploration through cost variance, which we find benefits policy learning. Further, a two-stage training strategy is deployed to lower the online labeling burden while adaptively enhancing constraint satisfaction. Empirical results demonstrate that PbCRL achieves superior alignment with true safety requirements and outperforms state-of-the-art baselines in terms of both safety and reward. Our work offers a promising and effective approach to constraint inference in safe RL, with great potential in a range of safety-critical applications.
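For reference, the abstract identifies the standard Bradley-Terry preference model as a source of risk underestimation. That model can be written as a logistic function of the difference in predicted segment costs. This minimal sketch assumes a learned cost model has already produced a cumulative cost for each trajectory segment. PbCRL's dead zone mechanism and SNR loss are not reproduced, and the function names are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def bt_prob_safer(cost_a, cost_b):
    """Bradley-Terry probability that segment a is judged safer than b.

    P(a safer) = sigmoid(cost_b - cost_a)
               = exp(-cost_a) / (exp(-cost_a) + exp(-cost_b)).
    """
    return math.exp(-softplus(cost_a - cost_b))

def bt_loss(labels):
    """Mean negative log-likelihood over (cost_a, cost_b, a_is_safer) tuples."""
    total = 0.0
    for cost_a, cost_b, a_is_safer in labels:
        diff = cost_a - cost_b
        # -log P(a safer) = softplus(cost_a - cost_b); the b-safer case flips the sign.
        total += softplus(diff) if a_is_safer else softplus(-diff)
    return total / len(labels)
```

Because the logistic link is symmetric in the cost difference, equal costs always give probability 0.5. Large gaps are also penalized the same way regardless of direction. This symmetry is the property the abstract argues fits asymmetric, heavy-tailed safety costs poorly.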

79.0 · CV · Apr 23
Unlocking the Power of Critical Factors for 3D Visual Geometry Estimation

Guangkai Xu, Hua Geng, Huanyi Zheng et al.

Feed-forward visual geometry estimation has recently made rapid progress. However, an important gap remains: multi-frame models usually produce better cross-frame consistency, yet they often underperform strong per-frame methods on single-frame accuracy. This observation motivates our systematic investigation into the critical factors driving model performance through rigorous ablation studies, which reveals several key insights: 1) Scaling up data diversity and quality unlocks further performance gains even in state-of-the-art visual geometry estimation methods; 2) Commonly adopted confidence-aware loss and gradient-based loss mechanisms may unintentionally hinder performance; 3) Joint supervision through both per-sequence and per-frame alignment improves results, while local region alignment surprisingly degrades performance. Furthermore, we introduce two enhancements to integrate the advantages of optimization-based methods and high-resolution inputs: a consistency loss function that enforces alignment between depth maps, camera parameters, and point maps, and an efficient architectural design that leverages high-resolution information. We integrate these designs into CARVE, a resolution-enhanced model for feed-forward visual geometry estimation. Experiments on point cloud reconstruction, video depth estimation, and camera pose/intrinsic estimation show that CARVE achieves strong and robust performance across diverse benchmarks.
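The consistency loss described above ties depth maps, camera parameters, and point maps together. This minimal sketch assumes:

- a pinhole camera with intrinsics K;
- a camera-to-world pose (R, t);
- depth measured along the camera z-axis;
- a mean absolute (L1) penalty.

CARVE's actual loss form, weighting and any scale alignment are not given in the abstract, and the function names are hypothetical.

```python
import numpy as np

def unproject(depth, K, R, t):
    """Lift an H x W depth map to world-space points of shape (H, W, 3).

    Assumptions: pinhole intrinsics K (3x3), camera-to-world rotation R and
    translation t, depth measured along the camera z-axis.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T      # camera-frame directions with z = 1
    cam_pts = rays * depth[..., None]    # scale each ray by its depth
    return cam_pts @ R.T + t             # camera frame -> world frame

def consistency_loss(depth, K, R, t, point_map):
    """Mean L1 gap between unprojected depth and a predicted point map."""
    return float(np.mean(np.abs(unproject(depth, K, R, t) - point_map)))
```

Comparing an unprojected depth map with itself gives zero loss. Shifting the point map by one unit in every coordinate gives a loss of 1.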

LG · Dec 17, 2024
Tilted Quantile Gradient Updates for Quantile-Constrained Reinforcement Learning

Chenglin Li, Guangchun Ruan, Hua Geng

Safe reinforcement learning (RL) is a popular and versatile paradigm for learning reward-maximizing policies with safety guarantees. Previous works tend to express safety constraints in expectation form because it is easy to implement, but this proves ineffective at maintaining safety constraints with high probability. To this end, we turn to quantile-constrained RL, which enables a higher level of safety without any expectation-form approximation. We directly estimate the quantile gradients through sampling and provide theoretical proofs of convergence. A tilted update strategy for the quantile gradients is then implemented to compensate for the asymmetric distributional density, which directly benefits return performance. Experiments demonstrate that the proposed model fully meets the safety requirements (quantile constraints) while outperforming state-of-the-art baselines with higher return.
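The quantile constraint rests on estimating a cost quantile from samples. Two generic ways to do that can be sketched. One is the empirical τ-quantile of sampled episode costs. The other is a stochastic tracker that takes pinball-loss subgradient steps toward the quantile. This is an illustrative assumption, not the paper's quantile-gradient estimator or its tilted update. The function names, learning rate and cost samples are hypothetical.

```python
import math
import random

def empirical_quantile(samples, tau):
    """Lower empirical tau-quantile (inverse of the empirical CDF)."""
    xs = sorted(samples)
    k = max(math.ceil(tau * len(xs)) - 1, 0)
    return xs[k]

def track_quantile(samples, tau, lr=0.01, q0=0.0):
    """Stochastic tau-quantile estimate via pinball-loss subgradient steps.

    Each step moves q up by lr * tau when the sample exceeds q and down by
    lr * (1 - tau) otherwise, so q drifts toward the tau-quantile.
    """
    q = q0
    for c in samples:
        q += lr * (tau - (1.0 if c <= q else 0.0))
    return q

if __name__ == "__main__":
    rng = random.Random(0)
    costs = [rng.random() for _ in range(20000)]  # hypothetical episode costs
    print(empirical_quantile(costs, 0.9), track_quantile(costs, 0.9))
```

Consider a constraint of the form "the τ-quantile of episode cost stays below a threshold d". It would be checked by comparing either estimate with d. The tracker avoids storing and sorting samples, at the cost of a small residual fluctuation controlled by `lr`.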