Yunjie Gu

LG
h-index: 6
8 papers
447 citations
Novelty: 51%
AI Score: 53

8 Papers

96.3 SY May 7
Consideration of Control-Loop Interaction in Transient Stability of Grid-Following Inverters using Bandwidth Separation Method

Yifan Zhang, Yunjie Gu, Yue Zhu et al.

Grid-following inverters have been widely adopted as a grid interface for renewable energy, and ensuring their small-signal and large-signal stability is critical to modern power systems. Their large-signal, or transient, stability is a significant challenge to analyze because of the interaction of the phase-locked loop (PLL), which must maintain synchronism with various outer-loop controllers. Simple analysis in which outer-loop controllers are idealized is insufficient, and the interactions between the nonlinear dynamics of the PLL and the dynamics of the DC-link voltage control (DVC), as well as the AC terminal voltage control (TVC) when present, must be considered. An asymptotic analysis approach, termed the bandwidth separation method, is proposed. This method enables simplification and order reduction of the original differential equations when sufficient bandwidth separation exists. Through this method, the interaction between the DVC and PLL is explicitly characterized, revealing that such interaction degrades system stability and shrinks the stability region. The analysis also indicates that voltage instability, rather than PLL loss of synchronization alone, is often the root cause of transient instability. Optimal bandwidth configurations for the PLL and DVC are identified under various grid fault conditions: a larger PLL bandwidth improves resilience to phase-jump faults, while a larger DVC bandwidth enhances tolerance to power fluctuations. In addition, the influence of the TVC loop is analyzed, showing that a high TVC bandwidth can mitigate the destabilizing effects of PLL-DVC interaction and further improve transient stability. All analytical findings are validated through hardware-in-the-loop (HIL) experiments.
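The order-reduction idea behind bandwidth separation can be illustrated on a toy problem. The sketch below is not the paper's inverter model: it assumes a hypothetical linear system with one slow and one fast state, where the parameter `eps` plays the role of the bandwidth ratio. It compares the full second-order system with a first-order model obtained by letting the fast state settle instantly. All names (`simulate_full`, `simulate_reduced`, `max_gap`) are illustrative.

```python
# Toy two-time-scale linear system (hypothetical, NOT the inverter model):
#   slow state x:       dx/dt = -x + z
#   fast state z:   eps*dz/dt = -z - 0.5*x
# As eps -> 0 the fast state settles to z = -0.5*x, which gives the
# reduced-order model dx/dt = -1.5*x.

def simulate_full(eps, x0=1.0, t_end=2.0, dt=1e-4):
    """Forward-Euler integration of the full second-order system."""
    x, z = x0, -0.5 * x0  # start on the quasi-steady-state relation
    xs = [x]
    for _ in range(int(round(t_end / dt))):
        dx = -x + z
        dz = (-z - 0.5 * x) / eps
        x, z = x + dt * dx, z + dt * dz
        xs.append(x)
    return xs


def simulate_reduced(x0=1.0, t_end=2.0, dt=1e-4):
    """Forward-Euler integration of the reduced first-order model."""
    x = x0
    xs = [x]
    for _ in range(int(round(t_end / dt))):
        x = x + dt * (-1.5 * x)
        xs.append(x)
    return xs


def max_gap(eps):
    """Largest difference in the slow state between full and reduced models."""
    full = simulate_full(eps)
    reduced = simulate_reduced()
    return max(abs(a - b) for a, b in zip(full, reduced))


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(f"eps={eps}: max gap in slow state = {max_gap(eps):.5f}")
```

In this toy case the gap between the two trajectories shrinks as `eps` decreases, which is the sense in which a reduced model is justified only when the separation is sufficient.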

98.6 SY Mar 31
Large-Signal Stability of Power Systems with Mixtures of GFL, GFM and GSP Inverters

Yifan Zhang, Yaoxin Wang, Yunjie Gu et al.

Grid-following (GFL) inverters have very different large-signal stability characteristics from synchronous generators, and convenient concepts such as the equal-area criterion and global energy function do not apply in the same way. Existing studies mainly focus on the synchronization stability of an individual GFL inverter, while interactions between multiple inverters are less often addressed. This paper elucidates the interaction mechanisms between heterogeneous inverters, covering GFL, grid-forming (GFM), and grid-supporting (GSP) types, to determine the stability boundaries of systems with mixed inverter compositions. The generalized large-signal model for two-inverter systems is derived for various inverter combinations. This paper establishes that systems containing GFL inverters do not admit a global energy function, fundamentally limiting the applicability of traditional direct methods. To overcome this barrier, a manifold method is employed to accurately determine the region of attraction (ROA). To address the computational complexity of the manifold method, reduced-order inverter models based on multiscale analysis are used. The large-signal stability margin is assessed by the shortest distance from a stable equilibrium point (SEP) to the boundary of the ROA, which is called the stability radius (SR). Using the proposed framework, the analysis results for the two-inverter system show that both GFM and GSP inverters significantly enhance the large-signal stability of a two-inverter system where the other inverter is GFL, with GFM providing slightly superior performance. This improvement is attributed to the voltage support effects and is maximized when the GFM or GSP inverter is located at the midpoint of the transmission line, where the voltage is lowest. All findings in this paper are validated through both EMT simulations and power hardware-in-the-loop (PHIL) experiments.
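The stability radius defined above is a shortest-distance computation once the ROA boundary is known. The sketch below assumes the boundary is already available as a set of sampled points; a hypothetical ellipse stands in for the boundary that a manifold method would produce, and the function names are illustrative.

```python
import math


def stability_radius(sep, boundary_points):
    """Shortest Euclidean distance from the SEP to sampled boundary points."""
    return min(
        math.sqrt(sum((s - p) ** 2 for s, p in zip(sep, point)))
        for point in boundary_points
    )


def sample_ellipse(a, b, n=360):
    """Hypothetical stand-in for an ROA boundary: an ellipse around the origin."""
    return [
        (a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


if __name__ == "__main__":
    boundary = sample_ellipse(a=2.0, b=1.0)
    print(stability_radius((0.0, 0.0), boundary))
```

For this ellipse the closest boundary points lie along the short axis, so the result is limited by the weakest direction rather than by the overall size of the region.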

81.8 SY Apr 13
Localization and Reshaping of Non-Minimum-Phase Zeros in Multi-Converter Systems

Ailixier Yaermaimaiti, Jiaxin Wang, Yunjie Gu et al.

Non-minimum-phase (NMP) zeros in multi-converter power systems impose bandwidth ceilings on feedback control, yet quantifying them at the system level has been impractical because commercial converters withhold their internal controller models. This paper develops a Jacobian-based framework that decouples the NMP zeros from individual converter dynamics, proves them to be strictly real, and expresses their values as the singular values of a matrix constructed solely from the grid admittance matrix and steady-state power injections. Because these zeros govern the peak magnitude of the complementary sensitivity function, an exponential lower bound on this peak is derived as a function of the dominant zero, establishing that as the zero approaches the origin the stability margin degrades unavoidably. To counteract this degradation, a zero reshaping strategy is proposed that ranks converter nodes by their real participation factors and identifies the optimal site for voltage droop deployment without iterative search, steering the dominant zero away from the origin and thereby suppressing the sensitivity peak.

CV Mar 23, 2025 Code
Histomorphology-driven multi-instance learning for breast cancer WSI classification

Baizhi Wang, Rui Yan, Wenxin Ma et al.

Histomorphology is crucial in breast cancer diagnosis. However, existing whole slide image (WSI) classification methods struggle to effectively incorporate histomorphology information, limiting their ability to capture key and fine-grained pathological features. To address this limitation, we propose a novel framework that explicitly incorporates histomorphology (tumor cellularity, cellular morphology, and tissue architecture) into WSI classification. Specifically, our approach consists of three key components: (1) estimating the importance of tumor-related histomorphology information at the patch level based on medical prior knowledge; (2) generating representative cluster-level features through histomorphology-driven cluster pooling; and (3) enabling WSI-level classification through histomorphology-driven multi-instance aggregation. With the incorporation of histomorphological information, our framework strengthens the model's ability to capture key and fine-grained pathological patterns, thereby enhancing WSI classification performance. Experimental results demonstrate its effectiveness, achieving high diagnostic accuracy for molecular subtyping and cancer subtyping. The code will be made available at https://github.com/Badgewho/HMDMIL.
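The three components listed above can be pictured as a small data flow from patches to clusters to a slide-level vector. The sketch below is not the authors' implementation (see the linked repository for that). It assumes patch features, positive patch importance scores, and cluster assignments are already given, and it uses plain importance-weighted averaging at both pooling stages. All names and values are hypothetical.

```python
def weighted_mean(vectors, weights):
    """Importance-weighted mean of equal-length feature vectors.

    Assumes the weights are positive so that their sum is non-zero.
    """
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


def cluster_pool(features, importance, cluster_ids):
    """Pool patches into one (feature, total importance) pair per cluster."""
    members = {}
    for feat, weight, cid in zip(features, importance, cluster_ids):
        members.setdefault(cid, []).append((feat, weight))
    pooled = {}
    for cid, items in members.items():
        feats = [f for f, _ in items]
        weights = [w for _, w in items]
        pooled[cid] = (weighted_mean(feats, weights), sum(weights))
    return pooled


def aggregate_slide(pooled):
    """Combine cluster-level features into a single slide-level vector."""
    feats = [f for f, _ in pooled.values()]
    weights = [w for _, w in pooled.values()]
    return weighted_mean(feats, weights)


if __name__ == "__main__":
    patch_features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]  # hypothetical embeddings
    patch_importance = [1.0, 1.0, 2.0]                      # hypothetical prior scores
    patch_clusters = [0, 0, 1]                              # hypothetical cluster ids
    pooled = cluster_pool(patch_features, patch_importance, patch_clusters)
    print(aggregate_slide(pooled))
```

A slide-level classifier would then consume the aggregated vector. In a learned model the fixed averaging would typically be replaced by trainable attention weights.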

LG Oct 27, 2021 Code
Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks

Jianhong Wang, Wangkun Xu, Yunjie Gu et al.

This paper presents a problem in power networks that creates an exciting and yet challenging real-world scenario for the application of multi-agent reinforcement learning (MARL). The emerging trend of decarbonisation is placing excessive stress on power distribution networks. Active voltage control is seen as a promising solution to relieve power congestion and improve voltage quality without extra hardware investment, taking advantage of the controllable apparatuses in the network, such as roof-top photovoltaics (PVs) and static var compensators (SVCs). These controllable apparatuses are numerous and distributed over a wide geographic area, making MARL a natural candidate. This paper formulates the active voltage control problem in the framework of Dec-POMDP and establishes an open-source environment. It aims to bridge the gap between the power community and the MARL community and to be a driving force towards real-world applications of MARL algorithms. Finally, we analyse the special characteristics of the active voltage control problems that cause challenges (e.g. interpretability) for state-of-the-art MARL approaches, and summarise the potential directions.

LG May 31, 2021 Code
SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Jianhong Wang, Yuan Zhang, Yunjie Gu et al.

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games; however, its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory. We generalise the Shapley value to the Markov convex game, yielding the Markov Shapley value (MSV), and apply it as a value factorisation method in the global reward game, which is made possible by the equivalence between the two games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve the SBOE. With a stochastic approximation and some transformations, a new MARL algorithm called Shapley Q-learning (SHAQ) is established, the implementation of which is guided by the theoretical results of SBO and MSV. We also discuss the relationship between SHAQ and relevant value factorisation methods. In the experiments, SHAQ exhibits not only superior performance on all tasks but also interpretability that agrees with the theoretical analysis. The implementation of this paper is available at https://github.com/hsvgbkhgbv/shapley-q-learning.

CL Jun 11, 2020
Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Jianhong Wang, Yuan Zhang, Tae-Kyun Kim et al.

Designing task-oriented dialogue systems is a challenging research topic, since such a system needs not only to generate utterances fulfilling user requests but also to guarantee their comprehensibility. Many previous works trained end-to-end (E2E) models with supervised learning (SL); however, the bias in annotated system utterances remains a bottleneck. Reinforcement learning (RL) deals with the problem by using non-differentiable evaluation metrics (e.g., the success rate) as rewards. Nonetheless, existing works with RL showed that the comprehensibility of generated system utterances could be corrupted when improving the performance on fulfilling user requests. In our work, we (1) propose modelling the hierarchical structure between dialogue policy and natural language generator (NLG) with the option framework, called HDNO, where the latent dialogue act is applied to avoid designing specific dialogue act representations; (2) train HDNO via hierarchical reinforcement learning (HRL), and suggest asynchronous updates between dialogue policy and NLG during training to theoretically guarantee their convergence to a local maximizer; and (3) propose using a discriminator modelled with language models as an additional reward to further improve the comprehensibility. We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, datasets of multi-domain dialogues, in comparison with a word-level E2E model trained with RL, LaRL and HDSA, showing improvements in performance as measured by automatic evaluation metrics and human evaluation. Finally, we demonstrate the semantic meanings of latent dialogue acts to show the explainability of HDNO.

LG Jul 11, 2019
Shapley Q-value: A Local Reward Approach to Solve Global Reward Games

Jianhong Wang, Yuan Zhang, Tae-Kyun Kim et al.

Cooperative games are a critical research area in multi-agent reinforcement learning (MARL). The global reward game is a subclass of cooperative games in which all agents aim to maximize the global reward. Credit assignment is an important problem studied in the global reward game. Most previous works adopted a non-cooperative-game theoretical framework with the shared reward approach, i.e., each agent is assigned the shared global reward directly. This, however, may give each agent an inaccurate reward for its contribution to the group, which could cause inefficient learning. To deal with this problem, we i) introduce a cooperative-game theoretical framework called extended convex game (ECG) that is a superset of global reward game, and ii) propose a local reward approach called Shapley Q-value. Shapley Q-value is able to distribute the global reward, reflecting each agent's own contribution in contrast to the shared reward approach. Moreover, we derive an MARL algorithm called Shapley Q-value deep deterministic policy gradient (SQDDPG), using Shapley Q-value as the critic for each agent. We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator and Traffic Junction, compared with the state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG and Independent A2C. In the experiments, SQDDPG shows a significant improvement in the convergence rate. Finally, we plot Shapley Q-value and validate the property of fair credit assignment.
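The classical Shapley value that underlies this credit-assignment idea can be computed exactly for a small game by averaging each player's marginal contribution over all join orders. The sketch below shows only that textbook computation, not SQDDPG or the Shapley Q-value itself. The three-agent game in `toy_value` is a hypothetical example.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset coalition to its payoff."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical game: a and b earn 1 each, plus a bonus of 2 together; c adds nothing."""
    base = sum(1.0 for p in coalition if p in ("a", "b"))
    bonus = 2.0 if {"a", "b"} <= coalition else 0.0
    return base + bonus


if __name__ == "__main__":
    print(shapley_values(["a", "b", "c"], toy_value))
```

In this toy game a shared-reward scheme would hand all three agents the same total payoff of 4, whereas the Shapley split credits only the two agents that contribute. The exact computation enumerates every ordering, so it is practical only for a handful of agents.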