SY · Feb 23, 2016
Data-Driven Real-Time Power Dispatch for Maximizing Variable Renewable Generation
Zhigang Li, Feng Qiu, Jianhui Wang
Traditional power dispatch methods have difficulty accommodating large-scale variable renewable generation (VRG) and have resulted in unnecessary VRG spillage in practice. Recent dispatchable-interval-based methods have the potential to reduce VRG curtailment, but the dispatchable intervals are not allocated effectively because historical dispatch records of VRG units are not exploited. To bridge this gap, this paper proposes a novel data-driven real-time dispatch approach that maximizes VRG utilization by using do-not-exceed (DNE) limits, which define the maximum generation output ranges that the system can accommodate without compromising reliability. The DNE limits of VRG units and the operating base points of conventional units are co-optimized by hybrid stochastic and robust optimization, and the decision models are formulated as mixed-integer linear programs using the sample average approximation technique on historical VRG data. A strategy for selecting historical data samples is also proposed to capture VRG uncertainty more accurately at varying predicted output levels. Computational experiments demonstrate the effectiveness of the proposed methods.
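The sample-selection step and the sample average approximation (SAA) can be illustrated on a single VRG unit. This is a simplified sketch, not the paper's MILP. It assumes historical records are (forecast, actual) pairs in MW. It keeps only records whose forecast lies within a hypothetical `window` of the current forecast, then takes an empirical (1 - epsilon) quantile of the selected actual outputs as a sample-based bound. The names `select_samples` and `empirical_upper_bound` are hypothetical.

```python
import math


def select_samples(history, forecast, window):
    """Keep the actual outputs of historical (forecast, actual) records whose
    forecast lies within +/- window of the current forecast (hypothetical rule)."""
    return [actual for f, actual in history if abs(f - forecast) <= window]


def empirical_upper_bound(samples, epsilon):
    """SAA-style bound: the smallest sample value that at least a (1 - epsilon)
    fraction of the selected samples do not exceed."""
    if not samples:
        raise ValueError("no historical samples near this forecast level")
    ordered = sorted(samples)
    k = math.ceil((1 - epsilon) * len(ordered))  # samples that must be covered
    return ordered[max(k, 1) - 1]


history = [(50, 48), (52, 55), (49, 47), (80, 78), (51, 53), (50, 50)]
near = select_samples(history, forecast=50, window=3)
print(near, empirical_upper_bound(near, epsilon=0.1))
```

Here the record forecast at 80 MW is discarded as unrepresentative of a 50 MW forecast. With `epsilon=0.1`, the bound is the largest of the five remaining samples (55).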
SY · Feb 13, 2017
Model-Free MLE Estimation for Online Rotor Angle Stability Assessment with PMU Data
Shaopan Wei, Ming Yang, Junjian Qi et al.
Recent research has demonstrated that rotor angle stability can be assessed by identifying the sign of the system's maximal Lyapunov exponent (MLE): a positive (negative) MLE implies unstable (stable) rotor angle dynamics. However, because the MLE may fluctuate between positive and negative values for a long time after a severe disturbance, it is difficult to determine system stability from an observed positive or negative MLE without knowing its subsequent fluctuation trend. In this paper, a new approach for online rotor angle stability assessment is proposed to address this problem. The MLE is estimated by a recursive least squares (RLS)-based method using real-time rotor angle measurements. Two critical parameters, the Theiler window and the initial time step of MLE estimation, are carefully chosen to ensure that the calculated MLE curves exhibit distinct features under different stability conditions. Using the proposed stability assessment criteria, the developed approach can provide timely and reliable assessment of rotor angle stability. Extensive tests on the New England 39-bus system and the Northeast Power Coordinating Council 140-bus system verify the effectiveness of the proposed approach.
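The MLE can be read as the slope of the average logarithmic divergence of nearby trajectories over time. A recursive least squares (RLS) filter can track that slope as new samples arrive. This is a generic RLS line fit, not the paper's exact estimator. It assumes the log-divergence series `log_div` has already been computed from rotor-angle measurements, and it omits the Theiler window and the choice of the initial estimation step. The parameters `delta` (initial covariance scale) and `lam` (forgetting factor) are assumptions.

```python
def rls_slope(times, log_div, delta=1000.0, lam=1.0):
    """Recursive least squares fit of log_div ~ slope * t + intercept.

    Returns the running slope estimate after each sample; the slope plays the
    role of the MLE estimate. delta scales the initial covariance and lam is a
    forgetting factor (1.0 means no forgetting)."""
    theta = [0.0, 0.0]                       # [slope, intercept]
    P = [[delta, 0.0], [0.0, delta]]         # inverse-correlation matrix
    slopes = []
    for t, y in zip(times, log_div):
        phi = [t, 1.0]                       # regressor
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        gain = [Pphi[0] / denom, Pphi[1] / denom]
        err = y - (theta[0] * phi[0] + theta[1] * phi[1])   # prediction error
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        # P <- (P - gain * phi^T P) / lam, using phi^T P = (P phi)^T for symmetric P
        P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(2)]
             for i in range(2)]
        slopes.append(theta[0])
    return slopes


times = [0.01 * k for k in range(200)]
diverging = [0.5 * t - 2.0 for t in times]    # synthetic log-divergence curve
print(round(rls_slope(times, diverging)[-1], 3))
```

A positive final slope corresponds to diverging trajectories (instability) and a negative one to converging trajectories, matching the sign criterion above. The paper's extra criteria for fluctuating MLE curves are not reproduced.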
SY · Mar 9, 2017
A Framework for Dynamic Stability Analysis of Power Systems with Volatile Wind Power
Xiaozhe Wang, Tao Wang, Hsiao-Dong Chiang et al.
We propose a framework employing stochastic differential equations to facilitate the long-term stability analysis of power grids with intermittent wind power generation. This framework takes into account the discrete dynamics, which play a critical role in long-term stability analysis, and incorporates wind speed models with different probability distributions. It also develops an approximation methodology (a deterministic hybrid model) for the stochastic hybrid model to reduce the computational burden introduced by wind power uncertainty. Theoretical and numerical studies show that the deterministic hybrid model can provide accurate trajectory approximations and stability assessments for the stochastic hybrid model under mild conditions. In addition, we discuss critical cases in which the deterministic hybrid model fails and find that they are caused by violations of the proposed sufficient conditions. This discussion complements the proposed framework and methodology, and reaffirms the importance of the stochastic hybrid model when the system operates close to its stability limit.
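The idea of approximating a stochastic model with a deterministic one can be seen on a scalar wind-speed process. For illustration only, this sketch assumes wind speed follows an Ornstein-Uhlenbeck SDE, dv = theta*(mu - v) dt + sigma dW. The paper itself considers several wind-speed distributions and a full hybrid power-system model. When the SDE is integrated with Euler-Maruyama, the sample mean of many noisy paths tracks the trajectory obtained by dropping the noise term. All parameter values are hypothetical.

```python
import math
import random


def simulate_ou(v0, mu, theta, sigma, dt, steps, rng):
    """One Euler-Maruyama path of dv = theta*(mu - v) dt + sigma dW."""
    v = v0
    for _ in range(steps):
        v += theta * (mu - v) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return v


def deterministic_path(v0, mu, theta, dt, steps):
    """The same Euler scheme with the noise term removed."""
    v = v0
    for _ in range(steps):
        v += theta * (mu - v) * dt
    return v


def mean_of_paths(n_paths, seed=0, **kw):
    """Average end value of n_paths independent stochastic paths."""
    rng = random.Random(seed)
    return sum(simulate_ou(rng=rng, **kw) for _ in range(n_paths)) / n_paths


params = dict(v0=5.0, mu=8.0, theta=1.0, sigma=0.5, dt=0.01, steps=200)
stochastic_mean = mean_of_paths(2000, **params)
deterministic_end = deterministic_path(5.0, 8.0, 1.0, 0.01, 200)
print(f"stochastic mean {stochastic_mean:.3f} vs deterministic {deterministic_end:.3f}")
```

For this linear drift the expected Euler-Maruyama iterate obeys exactly the deterministic recursion, so the two values differ only by Monte Carlo error. The paper's conditions address the harder nonlinear, hybrid case.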
SY · Apr 5, 2018
Blockchain-Assisted Crowdsourced Energy Systems
Shen Wang, Ahmad Taha, Jianhui Wang
Crowdsourcing relies on people's contributions to meet product- or system-level objectives. Crowdsourcing-based methods have been implemented in various cyber-physical systems and real-time markets. This paper explores a framework for Crowdsourced Energy Systems (CES), in which small-scale energy generation or energy trading is crowdsourced from distributed energy resources, electric vehicles, and shapable loads. The merits and pillars of energy crowdsourcing are discussed. Then, an operational model for CESs in distribution networks with different types of crowdsourcees is proposed. The model yields a market equilibrium that specifies the setpoints of traditional and distributed generators and loads. Given these setpoints, crowdsourcing incentives are designed to steer crowdsourcees toward the equilibrium. As the number of crowdsourcees and energy trading transactions scales up, a secure energy trading platform is required. To that end, the presented framework is integrated with a lightweight blockchain implementation and smart contracts. Numerical tests are provided to showcase the overall implementation.
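A blockchain makes an energy-trading ledger tamper-evident through hash-linking. Each block stores the hash of its predecessor, so editing an old trade breaks every later link. This sketch shows only that mechanism, using Python's standard `hashlib`. It has no consensus, networking, or smart contracts. The trade fields (`seller`, `buyer`, `kwh`, `price`) are hypothetical, not the paper's data model.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 of the block's canonical JSON encoding."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_trade(chain, seller, buyer, kwh, price):
    """Append a trade whose block records the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "prev_hash": prev_hash,
        "trade": {"seller": seller, "buyer": buyer, "kwh": kwh, "price": price},
    }
    chain.append(block)
    return block


def verify(chain):
    """Check that every block still points at the hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


ledger = []
append_trade(ledger, "pv_home_1", "ev_7", kwh=3.5, price=0.12)
append_trade(ledger, "ev_7", "grid", kwh=1.0, price=0.10)
print(verify(ledger))           # True
ledger[0]["trade"]["kwh"] = 99  # tamper with an earlier trade
print(verify(ledger))           # False
```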
SY · May 8, 2018
Robust Estimation of Reactive Power for an Active Distribution System
Zhengshuo Li, Jianhui Wang, Hongbin Sun et al.
Increasing penetration of distributed energy resources (DERs) may result in reactive power imbalance in a transmission power system (TPS). An active distribution power system (DPS) with DERs can reportedly act as a reactive power prosumer to help balance reactive power in the TPS. The reactive power potential (RPP) of a DPS, which is the range between the maximal inductive and capacitive reactive power the DPS can reliably provide, should be accurately estimated. However, accurate estimation is difficult because of the network constraints, mixed discrete and continuous variables, and nonnegligible uncertainty in the DPS. To solve this problem, this paper proposes a robust RPP estimation method based on two-stage robust optimization that considers the uncertainty in DERs and in the boundary-bus voltage. In this two-stage robust model, the RPP is pre-estimated in the first stage. Its robust feasibility for any possible realization of the uncertainty is then checked via a tractable problem in the second stage. The column-and-constraint generation algorithm is adopted to solve this model in a finite number of iterations. Case studies show that the robust method yields a completely reliable RPP and that a DPS, even under uncertainty, remains an effective reactive power prosumer for the TPS.
SY · Dec 22, 2017
A Response-Function-Based Coordination Method for Transmission-Distribution-Coupled AC OPF
Zhengshuo Li, Qinglai Guo, Hongbin Sun et al.
With distributed generation highly integrated into the grid, the transmission-distribution-coupled AC OPF (TDOPF) is becoming increasingly important. This paper proposes a response-function-based coordination method to solve the TDOPF. Unlike typical decomposition methods, this method employs approximate response functions of the power injections with respect to the bus voltage magnitude at the transmission-distribution (T-D) interface. These functions reflect the "reaction" of the distribution system to transmission system control. With these response functions, only one or two iterations between the transmission system operator (TSO) and the distribution system operator(s) (DSO(s)) are required to attain a nearly optimal TDOPF solution. Numerical tests confirm that, relative to a typical decomposition method, the proposed method not only has a lower computational cost but also remains workable when the objectives of the TSO and the DSO(s) are on different scales.
SY · Jun 29, 2018
Comparing Kalman Filters and Observers for Power System Dynamic State Estimation with Model Uncertainty and Malicious Cyber Attacks
Junjian Qi, Ahmad F. Taha, Jianhui Wang
Kalman filters and observers are the two main classes of dynamic state estimation (DSE) routines. Power system DSE has been implemented with various Kalman filters, such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). In this paper, we discuss two challenges for effective power system DSE: (a) model uncertainty and (b) potential cyber attacks. To address them, the cubature Kalman filter (CKF) and a nonlinear observer are introduced and implemented. The Kalman filters and the observer are then tested on the 16-machine, 68-bus system under realistic scenarios involving model uncertainty and different types of cyber attacks against synchrophasor measurements. The results show that the CKF and the observer are more robust to model uncertainty and cyber attacks than their counterparts. Based on these tests, a thorough qualitative comparison of the Kalman filter routines and observers is also performed.
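The cubature Kalman filter replaces the linearization of the EKF (or the sigma points of the UKF) with 2n equally weighted cubature points. These points are placed at +/- sqrt(n) along the columns of a Cholesky factor of the covariance. This NumPy sketch implements only the CKF time update, for a generic transition function `f` and process-noise covariance `Q`. The measurement update and the paper's power-system model are not included, and the two-state `f` below is hypothetical.

```python
import numpy as np


def cubature_points(mean, cov):
    """2n cubature points: mean +/- sqrt(n) times each column of chol(cov)."""
    n = mean.size
    S = np.linalg.cholesky(cov)                              # cov = S @ S.T
    unit = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])   # shape (n, 2n)
    return mean[:, None] + S @ unit                          # shape (n, 2n)


def ckf_predict(mean, cov, f, Q):
    """CKF time update: propagate the points through f, then re-estimate
    the mean and covariance with equal weights 1 / (2n)."""
    X = cubature_points(mean, cov)
    Y = np.column_stack([f(X[:, i]) for i in range(X.shape[1])])
    m_pred = Y.mean(axis=1)
    dY = Y - m_pred[:, None]
    P_pred = dY @ dY.T / Y.shape[1] + Q
    return m_pred, P_pred


def f(x):
    """Hypothetical mildly nonlinear two-state transition."""
    return np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.sin(x[0])])


m, P = ckf_predict(np.array([0.2, 0.0]), 0.01 * np.eye(2), f, 1e-4 * np.eye(2))
print(m, P)
```

As a sanity check on the point placement, for a linear `f(x) = A @ x` the predicted covariance reduces exactly to `A @ P @ A.T + Q`.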
SY · Oct 25, 2016
Towards High-Efficiency Cascading Outage Simulation and Analysis in Power Systems: A Sequential Importance Sampling Approach
Jinpeng Guo, Feng Liu, Jianhui Wang et al.
This paper addresses how to improve computational efficiency and estimation reliability in cascading outage analysis. We first formulate a cascading outage as a Markov chain with a specific state space and transition probability by leveraging the Markov property of cascading outages. This provides a rigorous formulation that allows analytic investigation of cascading outages within the framework of standard mathematical statistics. We then derive a sequential importance sampling (SIS)-based strategy for cascading outage simulation and blackout risk analysis, with theoretical justification. Numerical experiments show that, compared with the traditional Monte Carlo simulation strategy, the proposed SIS strategy significantly reduces both the number of simulations and the estimation variance of cascading outage analysis.
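A toy Markov chain shows why importance sampling cuts the number of simulations. Suppose a cascade advances one stage with probability p and otherwise stops. Reaching stage k is then a rare event (probability p**k), so plain Monte Carlo seldom observes it. This sketch instead samples continuation with a larger, hypothetical proposal probability q and multiplies a likelihood-ratio weight p/q at each continued step, which keeps the estimate unbiased. The toy chain is an assumption, not the paper's state space or transition model.

```python
import random


def sis_estimate(p, q, k, n, rng):
    """Estimate P(cascade reaches stage k), where each stage continues with
    probability p, by sampling continuation with probability q and carrying a
    likelihood-ratio weight along each sequence."""
    total = 0.0
    for _ in range(n):
        weight = 1.0
        for _ in range(k):
            if rng.random() < q:           # cascade continues under the proposal
                weight *= p / q
            else:                          # cascade stops before stage k
                weight = 0.0               # indicator of the rare event is 0
                break
        total += weight
    return total / n


rng = random.Random(42)
estimate = sis_estimate(p=0.1, q=0.5, k=5, n=20000, rng=rng)
print(estimate, 0.1 ** 5)   # importance-sampling estimate vs exact p**k
```

With these settings, roughly 3% of the sampled sequences reach stage 5. Plain Monte Carlo with the same 20,000 runs would expect to see the event only about 0.2 times.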
CL · Sep 1, 2024
Self-evolving Agents with reflective and memory-augmented abilities
Xuechen Liang, Yangfan He, Yinghui Xia et al.
Large language models (LLMs) have made significant advances in natural language processing, but they still face challenges such as continuous decision-making. In this research, we propose a novel framework that integrates iterative feedback, reflective mechanisms, and a memory optimization mechanism based on the Ebbinghaus forgetting curve. The framework significantly enhances the agents' capabilities in handling multitasking and long-span information.
CV · Apr 2, 2024
WcDT: World-centric Diffusion Transformer for Traffic Scene Generation
Chen Yang, Yangfan He, Aaron Xuxiang Tian et al.
In this paper, we introduce a novel approach for autonomous driving trajectory generation that harnesses the complementary strengths of diffusion probabilistic models (a.k.a. diffusion models) and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process, from feature extraction to model inference. To enhance scene diversity and stochasticity, the historical trajectory data is first preprocessed into "Agent Move Statement" form. It is then encoded into latent space using Denoising Diffusion Probabilistic Models (DDPM) enhanced with Diffusion Transformer (DiT) blocks. Next, the latent features, historical trajectories, HD map features, and historical traffic signal information are fused by various transformer-based encoders that enhance the interaction of agents with other elements in the traffic scene. The encoded traffic scenes are then decoded by a trajectory decoder to generate multimodal future trajectories. Comprehensive experimental results show that the proposed approach achieves superior performance in generating both realistic and diverse trajectories, showing its potential for integration into autonomous driving simulation systems. Our code is available at https://github.com/yangchen1997/WcDT.
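WcDT encodes trajectories with a DDPM. The DDPM forward (noising) process has the closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the cumulative product of (1 - beta_t). This sketch shows only that standard forward process, applied to a plain list of floats. The linear beta schedule from 1e-4 to 0.02 over T = 1000 steps follows the original DDPM paper and is an assumption here. The latent encoder, DiT blocks, and denoising network are omitted.

```python
import math
import random


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_0 .. beta_{T-1}."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Closed-form forward step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


abar = alpha_bars(linear_betas(1000))
x_t = q_sample([0.5, -1.0], t=500, abar=abar, rng=random.Random(0))
print(abar[0], abar[-1], x_t)
```

Because abar_t shrinks toward zero, late-step samples are almost pure Gaussian noise. The learned reverse process is what turns such noise back into diverse trajectories.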
LG · Oct 28, 2024
FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system
Zeyuan Li, Yangfan He, Lewei He et al.
Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggle to align with user intent in coding scenarios. In particular, they are hampered by datasets that lack diversity and fail to address specialized tasks or edge cases. Furthermore, challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) lead to failures in generating precise, human-intent-aligned code. To tackle these challenges and improve the code generation performance of automated programming systems, we propose Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization (FALCON). FALCON is structured into two hierarchical levels. At the global level, long-term memory improves code quality by retaining and applying learned knowledge. At the local level, short-term memory allows the incorporation of immediate feedback from compilers and AI systems. Additionally, we introduce meta-reinforcement learning with feedback rewards to solve the global-local bi-level optimization problem and enhance the model's adaptability across diverse code generation tasks. Extensive experiments demonstrate that our technique achieves state-of-the-art performance, leading other reinforcement learning methods by more than 4.5 percentage points on the MBPP benchmark and 6.1 percentage points on the HumanEval benchmark. The code is publicly available at https://github.com/titurte/FALCON.
SOC-PH · Dec 2, 2017
An Adjustable Chance-Constrained Approach for Flexible Ramping Capacity Allocation
Zhiwen Wang, Chen Shen, Feng Liu et al.
With the fast growth of wind power penetration, power systems need additional flexibility to cope with wind power ramping. Several electricity markets have established requirements for flexible ramping capacity (FRC) reserves. This paper addresses two crucial issues that have rarely been discussed in the literature: 1) how to characterize wind power ramping under different forecast values and 2) how to achieve a reasonable trade-off between operational risks and FRC costs. Regarding the first issue, this paper proposes the concept of conditional distributions of wind power ramping, which is empirically verified using simulation and real-world data. For the second issue, this paper develops an adjustable chance-constrained approach to optimally allocate FRC reserves. Equivalent tractable forms of the original problem are devised to improve computational efficiency. Tests on a modified IEEE 118-bus system demonstrate the effectiveness and efficiency of the proposed method.
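Both ideas in this abstract, conditioning ramp statistics on the forecast level and treating the risk level as a decision, can be sketched with empirical quantiles. This is a simplified stand-in, not the paper's tractable reformulation. It bins historical (forecast, ramp) records by forecast level. For each candidate risk level eps it sizes the reserve as the (1 - eps) quantile of the ramps in the relevant bin, then picks the eps that minimizes a hypothetical linear cost `frc_cost * reserve + risk_cost * eps`. The bin width, costs, and function names are assumptions.

```python
import math
from collections import defaultdict


def conditional_ramps(records, bin_width):
    """Group observed ramps by the forecast-level bin they followed."""
    bins = defaultdict(list)
    for forecast, ramp in records:
        bins[int(forecast // bin_width)].append(ramp)
    return dict(bins)


def quantile(values, level):
    """Empirical quantile: smallest value covering at least `level` of the samples."""
    ordered = sorted(values)
    idx = math.ceil(level * len(ordered)) - 1
    return ordered[min(len(ordered) - 1, max(0, idx))]


def choose_risk_level(ramps, candidates, frc_cost, risk_cost):
    """Pick the risk level eps that minimizes a hypothetical linear cost:
    frc_cost * reserve(eps) + risk_cost * eps, where reserve(eps) is the
    (1 - eps) quantile of the conditional ramp samples."""
    def total_cost(eps):
        return frc_cost * quantile(ramps, 1 - eps) + risk_cost * eps
    best = min(candidates, key=total_cost)
    return best, quantile(ramps, 1 - best)


records = [(10, 1.0), (12, 2.0), (55, 7.0)]    # (forecast MW, observed ramp MW)
print(conditional_ramps(records, bin_width=20))
ramps = list(range(1, 101))                      # hypothetical ramps in one bin
print(choose_risk_level(ramps, [0.0, 0.25, 0.5], frc_cost=1.0, risk_cost=60.0))
```

Raising `risk_cost` pushes the choice toward smaller eps and larger reserves. This is the trade-off between operational risk and FRC cost that the adjustable formulation optimizes.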
CL · Jun 23, 2024
Enhancing Commentary Strategies for Imperfect Information Card Games: A Study of Large Language Models in Guandan Commentary
Meiling Tao, Xuechen Liang, Xinyuan Song et al.
Recent advancements in large language models (LLMs) have unlocked the potential for generating high-quality game commentary. However, producing insightful and engaging commentary for complex games with incomplete information remains a significant challenge. In this paper, we introduce a novel commentary method that combines Reinforcement Learning (RL) and LLMs, tailored specifically to the Chinese card game Guandan. Our system leverages RL to generate intricate card-playing scenarios and employs LLMs to generate the corresponding commentary text, emulating the strategic analysis and narrative skill of professional commentators. The framework comprises a state commentary guide, a Theory of Mind (ToM)-based strategy analyzer, and a style retrieval module, which collaborate to deliver detailed and context-relevant game commentary in Chinese. We equip LLMs with ToM capabilities and refine both the retrieval and information filtering mechanisms, which facilitates the generation of personalized commentary content. Our experimental results show that the proposed commentary framework substantially improves the performance of open-source LLMs, surpassing GPT-4 on multiple evaluation metrics.
CL · Mar 30, 2025
SCORE: Story Coherence and Retrieval Enhancement for AI Narratives
Qiang Yi, Yangfan He, Jianhui Wang et al.
Large Language Models (LLMs) can generate creative and engaging narratives from user-specified input, but maintaining coherence and emotional depth throughout these AI-generated stories remains a challenge. In this work, we propose SCORE, a framework for Story Coherence and Retrieval Enhancement, designed to detect and resolve narrative inconsistencies. By tracking key item statuses and generating episode summaries, SCORE uses a Retrieval-Augmented Generation (RAG) approach to identify related episodes and enhance the overall story structure. Experimental results from testing multiple LLM-generated stories demonstrate that SCORE significantly improves the consistency and stability of narrative coherence compared to baseline GPT models, providing a more robust method for evaluating and refining AI-generated narratives.
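The retrieval step, finding the episodes most related to the passage being checked, can be sketched with cosine similarity over episode summaries. This sketch uses bag-of-words vectors as a stand-in for the learned embeddings a real RAG system would use. The summaries and function names are hypothetical and not taken from SCORE.

```python
import math
from collections import Counter


def embed(text):
    """Bag-of-words vector (a stand-in for a learned text embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, summaries, k=2):
    """Indices of the k episode summaries most similar to the query."""
    q = embed(query)
    ranked = sorted(range(len(summaries)),
                    key=lambda i: cosine(q, embed(summaries[i])), reverse=True)
    return ranked[:k]


summaries = ["the knight loses his sword in the river",
             "a feast at the castle",
             "the knight finds the sword again"]
print(retrieve("knight sword", summaries, k=2))   # [2, 0]
```

The retrieved episodes would then be placed in the LLM's context so that it can check, for example, whether the sword's status is consistent across episodes.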
CV · Jan 8, 2025
Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion
Yangfan He, Sida Li, Jianhui Wang et al.
Recent advancements in text-to-image (T2I) generation using diffusion models have enabled cost-effective video-editing applications by leveraging pre-trained models, eliminating the need for resource-intensive training. However, the frame independence of T2I generation often results in poor temporal consistency. Existing methods address this issue through temporal layer fine-tuning or inference-based temporal propagation, but these approaches suffer from high training costs or limited temporal coherence. To address these challenges, we propose a General and Efficient Adapter (GE-Adapter) that integrates temporal-spatial and semantic consistency with Bilateral DDIM inversion. This framework introduces three key components: (1) Frame-based Temporal Consistency Blocks (FTC Blocks) that capture frame-specific features and enforce smooth inter-frame transitions via temporally-aware loss functions; (2) Channel-dependent Spatial Consistency Blocks (SCD Blocks) that employ bilateral filters to enhance spatial coherence by reducing noise and artifacts; and (3) a Token-based Semantic Consistency Module (TSC Module) that maintains semantic alignment using shared prompt tokens and frame-specific tokens. Our method significantly improves perceptual quality, text-image alignment, and temporal coherence, as demonstrated on the MSR-VTT dataset. Additionally, it achieves enhanced fidelity and frame-to-frame coherence, offering a practical solution for text-to-video (T2V) editing.
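The spatial-consistency blocks rely on bilateral filtering. A bilateral filter smooths noise while preserving edges by weighting each neighbor by both its spatial distance and its intensity difference from the center sample. This plain 1D bilateral filter over a list of floats is meant only to show that weighting. The parameter values are hypothetical, and the paper's channel-wise blocks inside a diffusion model are not reproduced.

```python
import math


def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Edge-preserving smoothing: each neighbor is weighted by a spatial
    Gaussian and a range (intensity) Gaussian, so values across a sharp
    edge contribute almost nothing."""
    out = []
    n = len(signal)
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
                 * math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2)))
            num += w * signal[j]
            den += w          # the j == i term guarantees den >= 1
        out.append(num / den)
    return out


step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print([round(v, 3) for v in bilateral_1d(step)])
```

A plain Gaussian blur would smear the step between indices 2 and 3. Here the range term suppresses cross-edge neighbors, so the step survives almost unchanged.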
CL · Apr 2, 2024
CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models
Xuechen Liang, Yangfan He, Meiling Tao et al.
Open large language models (LLMs) have significantly advanced the field of natural language processing, showcasing impressive performance across various tasks. Despite these advancements, their effective operation still relies heavily on human input to accurately guide the dialogue flow. Agent tuning, a crucial optimization technique, involves human adjustments to the model so that it responds better to such guidance. Addressing this dependency, our work introduces the TinyAgent model, trained on a meticulously curated high-quality dataset. We also present the Collaborative Multi-Agent Tuning (CMAT) framework, an innovative system designed to augment language agent capabilities through adaptive weight updates based on environmental feedback. This framework fosters collaborative learning and real-time adaptation among multiple intelligent agents, enhancing their context awareness and long-term memory. In this research, we propose a new communication agent framework that integrates multi-agent systems with environmental feedback mechanisms, offering a scalable method to explore cooperative behaviors. Notably, our TinyAgent-7B model exhibits performance on par with GPT-3.5 despite having fewer parameters, signifying a substantial improvement in the efficiency and effectiveness of LLMs.
CV · Jan 25, 2025
Enhancing Intent Understanding for Ambiguous prompt: A Human-Machine Co-Adaption Strategy
Yangfan He, Jianhui Wang, Yijin Wang et al.
Current image generation systems produce high-quality images but struggle with ambiguous user prompts, which makes it difficult to interpret users' actual intentions. Many users must modify their prompts several times before the generated images meet their expectations. Some methods focus on enhancing prompts so that the generated images fit user needs, but models still struggle to understand users' real needs, especially those of non-expert users. In this research, we aim to enhance the visual parameter-tuning process, making the model more user-friendly for individuals without specialized knowledge and better able to understand user needs. We propose a human-machine co-adaption strategy that uses the mutual information between the user's prompts and the images under modification as the optimization target, so that the system better adapts to user needs. We find that an improved model can reduce the need for multiple rounds of adjustment. We also collect multi-round dialogue datasets with prompt-image pairs and user intent. Various experiments demonstrate the effectiveness of the proposed method on our proposed dataset. Our dataset and annotation tools will be made available.
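The optimization target, mutual information between prompts and images, can be made concrete with a plug-in estimate over discrete observations. In this sketch, discrete labels stand in for prompt and image features. That is an assumption, since the abstract does not specify how the paper represents or estimates these quantities. The estimate is high when each prompt reliably maps to one kind of image and zero when the two are independent.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from observed (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


aligned = [("red car", "img_red_car"), ("blue boat", "img_blue_boat")] * 2
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(mutual_information(aligned), mutual_information(independent))   # 1.0 0.0
```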
RO · Nov 27, 2024
FASIONAD: FAst and Slow FusION Thinking Systems for Human-Like Autonomous Driving with Adaptive Feedback
Kangan Qian, Zhikun Ma, Yangfan He et al. · tsinghua
Ensuring safe, comfortable, and efficient navigation is a critical goal for autonomous driving systems. While end-to-end models trained on large-scale datasets excel in common driving scenarios, they often struggle with rare, long-tail events. Recent progress in large language models (LLMs) has introduced enhanced reasoning capabilities, but their computational demands pose challenges for real-time decision-making and precise planning. This paper presents FASIONAD, a novel dual-system framework inspired by the cognitive model "Thinking, Fast and Slow." The fast system handles routine navigation tasks using rapid, data-driven path planning, while the slow system focuses on complex reasoning and decision-making in challenging or unfamiliar situations. A dynamic switching mechanism based on score distribution and feedback allows seamless transitions between the two systems. Visual prompts generated by the fast system enable human-like reasoning in the slow system, which provides high-quality feedback to enhance the fast system's decision-making. To evaluate FASIONAD, we introduce a new benchmark derived from the nuScenes dataset, specifically designed to differentiate fast and slow scenarios. FASIONAD achieves state-of-the-art performance on this benchmark, establishing a new standard for frameworks integrating fast and slow cognitive processes in autonomous driving. This approach paves the way for more adaptive, human-like autonomous driving systems.
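The abstract does not give the exact switching rule, so this sketch is purely hypothetical. It converts the fast planner's candidate scores into a softmax distribution. It then routes the decision to the slow system when the top candidate's probability falls below an assumed `confidence_threshold`. It illustrates what switching based on a score distribution could look like, not FASIONAD's actual mechanism.

```python
import math


def softmax(scores):
    """Numerically stable softmax over candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(scores, confidence_threshold=0.6):
    """Return 'fast' when the fast planner's top candidate clearly dominates,
    otherwise 'slow' (hypothetical rule based on the score distribution)."""
    probs = softmax(scores)
    return "fast" if max(probs) >= confidence_threshold else "slow"


print(route([5.0, 1.0, 0.5]))   # fast
print(route([1.0, 1.0, 0.9]))   # slow
```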
CV · Apr 21, 2025
Twin Co-Adaptive Dialogue for Progressive Image Generation
Jianhui Wang, Yangfan He, Yan Zhong et al.
Modern text-to-image generation systems have enabled the creation of remarkably realistic and high-quality visuals, yet they often falter when handling the inherent ambiguities in user prompts. In this work, we present Twin-Co, a framework that leverages synchronized, co-adaptive dialogue to progressively refine image generation. Instead of a static generation process, Twin-Co employs a dynamic, iterative workflow where an intelligent dialogue agent continuously interacts with the user. Initially, a base image is generated from the user's prompt. Then, through a series of synchronized dialogue exchanges, the system adapts and optimizes the image according to evolving user feedback. The co-adaptive process allows the system to progressively narrow down ambiguities and better align with user intent. Experiments demonstrate that Twin-Co not only enhances user experience by reducing trial-and-error iterations but also improves the quality of the generated images, streamlining the creative process across various applications.
CV · Mar 11, 2025
MaRI: Material Retrieval Integration across Domains
Jianhui Wang, Zhifei Yang, Yangfan He et al. · pku
Accurate material retrieval is critical for creating realistic 3D assets. Existing methods rely on datasets that capture shape-invariant and lighting-varied representations of materials, which are scarce and face challenges due to limited diversity and inadequate real-world generalization. Most current approaches adopt traditional image search techniques. They fall short in capturing the unique properties of material spaces, leading to suboptimal performance in retrieval tasks. Addressing these challenges, we introduce MaRI, a framework designed to bridge the feature space gap between synthetic and real-world materials. MaRI constructs a shared embedding space that harmonizes visual and material attributes through a contrastive learning strategy by jointly training an image and a material encoder, bringing similar materials and images closer while separating dissimilar pairs within the feature space. To support this, we construct a comprehensive dataset comprising high-quality synthetic materials rendered with controlled shape variations and diverse lighting conditions, along with real-world materials processed and standardized using material transfer techniques. Extensive experiments demonstrate the superior performance, accuracy, and generalization capabilities of MaRI across diverse and complex material retrieval tasks, outperforming existing methods.
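Jointly training an image encoder and a material encoder can be illustrated with a symmetric InfoNCE loss, a common contrastive objective. In InfoNCE, the i-th image and the i-th material form the positive pair, and every other pairing in the batch acts as a negative. The abstract does not name the exact loss, so InfoNCE and the temperature of 0.07 are assumptions. This NumPy sketch takes already-computed embeddings as input.

```python
import numpy as np


def info_nce(img_emb, mat_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each matrix is a matching pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    mat = mat_emb / np.linalg.norm(mat_emb, axis=1, keepdims=True)
    logits = img @ mat.T / temperature            # (N, N) cosine similarities

    def cross_entropy_on_diagonal(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # image-to-material and material-to-image directions, averaged
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


emb = np.eye(4)
print(info_nce(emb, emb), info_nce(emb, emb[[1, 2, 3, 0]]))
```

Minimizing this loss pulls matching image and material embeddings together and pushes mismatched ones apart. The shuffled pairing above is therefore heavily penalized.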
TR · Jul 13, 2025
MountainLion: A Multi-Modal LLM-Based Agent System for Interpretable and Adaptive Financial Trading
Siyi Wu, Junqiao Wang, Zhaoyang Guan et al.
Cryptocurrency trading is a challenging task requiring the integration of heterogeneous data from multiple modalities. Traditional deep learning and reinforcement learning approaches typically demand large training datasets and encode diverse inputs into numerical representations, often at the cost of interpretability. Recent progress in large language model (LLM)-based agents has demonstrated the capacity to process multi-modal data and support complex investment decision-making. Building on these advances, we present MountainLion, a multi-modal, multi-agent system for financial trading that coordinates specialized LLM-based agents to interpret financial data and generate investment strategies. MountainLion processes textual news, candlestick charts, and trading signal charts to produce high-quality financial reports, while also enabling modification of reports and investment recommendations through data-driven user interaction and question answering. A central reflection module analyzes historical trading signals and outcomes to continuously refine decision processes, and the system is capable of real-time report analysis, summarization, and dynamic adjustment of investment strategies. Empirical results confirm that MountainLion systematically enriches technical price triggers with contextual macroeconomic and capital flow signals, providing a more interpretable, robust, and actionable investment framework that improves returns and strengthens investor confidence.
CL · Mar 25, 2025
MARS: Memory-Enhanced Agents with Reflective Self-improvement
Xuechen Liang, Meiling Tao, Yinghui Xia et al.
Large language models (LLMs) have made significant advances in natural language processing, but they still face challenges such as continuous decision-making, lack of long-term memory, and limited context windows in dynamic environments. To address these issues, this paper proposes an innovative framework, Memory-Enhanced Agents with Reflective Self-improvement (MARS). The MARS framework comprises three agents: the User, the Assistant, and the Checker. By integrating iterative feedback, reflective mechanisms, and a memory optimization mechanism based on the Ebbinghaus forgetting curve, it significantly enhances the agents' capabilities in handling multitasking and long-span information.
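The Ebbinghaus-based memory optimization can be pictured as scoring each stored item with an exponential retention curve R = exp(-t / S). Here t is the time since the item was last used and S is its stability. This is a hypothetical memory store, not MARS's implementation. Recalling an item resets its clock and multiplies S by an assumed `boost` factor. Items whose retention falls below an assumed `threshold` are pruned.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    last_access: float   # time of last use (hypothetical unit: hours)
    stability: float     # S in the retention curve R = exp(-t / S)

    def retention(self, now):
        return math.exp(-(now - self.last_access) / self.stability)


def recall(item, now, boost=2.0):
    """Using an item resets its clock and strengthens it (assumed boost factor)."""
    item.last_access = now
    item.stability *= boost


def prune(memory, now, threshold=0.3):
    """Keep only items whose estimated retention is still above the threshold."""
    return [m for m in memory if m.retention(now) >= threshold]


memory = [MemoryItem("user prefers Python", 0.0, 10.0),
          MemoryItem("weather small talk", 0.0, 10.0)]
recall(memory[0], now=5.0)                    # the first item is reused at t = 5
print([m.text for m in prune(memory, now=20.0)])   # ['user prefers Python']
```

At t = 20 the reused item has retention exp(-15/20), about 0.47, so it is kept. The untouched item has decayed to exp(-2), about 0.14, and is dropped. This is how frequently used information survives in a bounded memory.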
LG · Oct 16, 2025
Coder as Editor: Code-driven Interpretable Molecular Optimization
Wenyu Zhu, Chengzhu Li, Xiaohe Tian et al.
Molecular optimization is a central task in drug discovery that requires precise structural reasoning and domain knowledge. While large language models (LLMs) have shown promise in generating high-level editing intentions in natural language, they often struggle to faithfully execute these modifications, particularly when operating on non-intuitive representations like SMILES. We introduce MECo, a framework that bridges reasoning and execution by translating editing actions into executable code. MECo reformulates molecular optimization for LLMs as a cascaded framework: it generates human-interpretable editing intentions from a molecule and a property goal, then translates those intentions into executable structural edits via code generation. Our approach achieves over 98% accuracy in reproducing held-out realistic edits derived from chemical reactions and target-specific compound pairs. On downstream optimization benchmarks spanning physicochemical properties and target activities, MECo substantially improves consistency, by 38 to 86 percentage points, to over 90%. It also achieves higher success rates than SMILES-based baselines while preserving structural similarity. By aligning intention with execution, MECo enables consistent, controllable, and interpretable molecular design, laying the foundation for high-fidelity feedback loops and collaborative human-AI workflows in drug discovery.
LG · Aug 21, 2025
Learning Protein-Ligand Binding in Hyperbolic Space
Jianhui Wang, Wenyu Zhu, Bowen Gao et al.
Protein-ligand binding prediction is central to virtual screening and affinity ranking, two fundamental tasks in drug discovery. Recent retrieval-based methods embed ligands and protein pockets into Euclidean space for similarity-based search, but the geometry of Euclidean embeddings often fails to capture the hierarchical structure and fine-grained affinity variations intrinsic to molecular interactions. In this work, we propose HypSeek, a hyperbolic representation learning framework that embeds ligands, protein pockets, and sequences into Lorentz-model hyperbolic space. By leveraging the exponential geometry and negative curvature of hyperbolic space, HypSeek enables expressive, affinity-sensitive embeddings that can effectively model both global activity and subtle functional differences. This matters particularly in challenging cases such as activity cliffs, where structurally similar ligands exhibit large affinity gaps. Our model unifies virtual screening and affinity ranking in a single framework, introducing a protein-guided three-tower architecture to enhance representational structure. HypSeek improves early enrichment in virtual screening on DUD-E from 42.63 to 51.44 (+20.7%) and affinity ranking correlation on JACS from 0.5774 to 0.7239 (+25.4%). These results demonstrate the benefits of hyperbolic geometry across both tasks and highlight its potential as a powerful inductive bias for protein-ligand modeling.
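In the Lorentz model, points live on the hyperboloid -x0^2 + |x_rest|^2 = -1, and the geodesic distance is arccosh of the negated Lorentzian inner product. This sketch assumes curvature -1. It lifts Euclidean vectors onto the hyperboloid by solving for x0, which is a simple parameterization chosen for illustration. HypSeek may map encoder outputs differently, for example via an exponential map, and its three-tower architecture is not shown.

```python
import math


def lift_to_hyperboloid(v):
    """Map a Euclidean vector v to (x0, v) on the hyperboloid -x0^2 + |v|^2 = -1."""
    return [math.sqrt(1.0 + sum(c * c for c in v))] + list(v)


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x, y):
    """Geodesic distance on the unit hyperboloid (curvature -1)."""
    # Clamp at 1.0 so round-off cannot push the argument below acosh's domain
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


origin = lift_to_hyperboloid([0.0, 0.0])
y = lift_to_hyperboloid([2.0, 0.0])
print(lorentz_distance(origin, y), math.asinh(2.0))
```

From the origin, the distance to a lifted point equals asinh of its Euclidean norm, so distances grow only logarithmically in the norm while the available volume grows exponentially. This is the property that makes hyperbolic space suited to hierarchical structure.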
CV · Jul 29, 2025
Low-Cost Test-Time Adaptation for Robust Video Editing
Jianhui Wang, Yinda Chen, Yangfan He et al.
Video editing is a critical component of content creation that transforms raw footage into coherent works aligned with specific visual and narrative objectives. Existing approaches face two major challenges: temporal inconsistencies due to failure in capturing complex motion patterns, and overfitting to simple prompts arising from limitations in UNet backbone architectures. While learning-based methods can enhance editing quality, they typically demand substantial computational resources and are constrained by the scarcity of high-quality annotated data. In this paper, we present Vid-TTA, a lightweight test-time adaptation framework that personalizes optimization for each test video during inference through self-supervised auxiliary tasks. Our approach incorporates a motion-aware frame reconstruction mechanism that identifies and preserves crucial movement regions, alongside a prompt perturbation and reconstruction strategy that strengthens model robustness to diverse textual descriptions. These innovations are orchestrated by a meta-learning driven dynamic loss balancing mechanism that adaptively adjusts the optimization process based on video characteristics. Extensive experiments demonstrate that Vid-TTA significantly improves video temporal consistency and mitigates prompt overfitting while maintaining low computational overhead, offering a plug-and-play performance boost for existing video editing models.
CVApr 25, 2025
Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding
Kun Li, Jianhui Wang, Yangfan He et al.
Generative AI has significantly changed industries by enabling text-driven image generation, yet challenges remain in achieving high-resolution outputs that align with fine-grained user preferences. Consequently, multi-round interactions are necessary to ensure the generated images meet expectations. Previous methods enhanced prompts via reward feedback but did not optimize over a multi-round dialogue dataset. In this work, we present a Visual Co-Adaptation (VCA) framework incorporating human-in-the-loop feedback, leveraging a well-trained reward model aligned with human preferences. Using a diverse multi-turn dialogue dataset, our framework applies multiple reward functions, such as diversity, consistency, and preference feedback, while fine-tuning the diffusion model through LoRA, thus optimizing image generation based on user input. We also construct multi-round dialogue datasets of prompts and image pairs aligned with user intent. Experiments demonstrate that our method outperforms state-of-the-art baselines, significantly improving image consistency and alignment with user intent. Our approach consistently surpasses competing models in user satisfaction, especially in multi-turn dialogue scenarios.
CVApr 22, 2025
Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework
Xinyuan Song, Yangfan He, Sida Li et al.
Adapter-based methods are commonly used to enhance model performance with minimal additional complexity, especially in video editing tasks that require frame-to-frame consistency. By inserting small, learnable modules into pretrained diffusion models, these adapters can maintain temporal coherence without extensive retraining. Approaches that incorporate prompt learning with both shared and frame-specific tokens are particularly effective in preserving continuity across frames at low training cost. In this work, we provide a general theoretical framework for adapters that maintain frame consistency in DDIM-based models under a temporal consistency loss. First, we prove that the temporal consistency objective is differentiable under bounded feature norms, and we establish a Lipschitz bound on its gradient. Second, we show that gradient descent on this objective decreases the loss monotonically and converges to a local minimum if the learning rate is within an appropriate range. Finally, we analyze the stability of modules in the DDIM inversion procedure, showing that the associated error remains controlled. These theoretical findings reinforce the reliability of diffusion-based video editing methods that rely on adapter strategies and provide theoretical insights into video generation tasks.
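The link between a Lipschitz gradient and a safe learning rate can be illustrated on a toy problem. The sketch below assumes a simplified temporal consistency loss, the sum of squared differences between consecutive per-frame feature vectors, optimized directly over the features (not the paper's DDIM adapter setting). Its Hessian is twice the path-graph Laplacian, whose eigenvalues are below 4, so the gradient is 8-Lipschitz and any step size below 2/8 makes the loss non-increasing. On its own this toy loss simply pulls all frames together; in practice it would be balanced against an editing objective.

```python
def temporal_loss(frames):
    """Sum of squared differences between consecutive per-frame feature vectors."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(frames[t + 1], frames[t]))
        for t in range(len(frames) - 1)
    )


def temporal_grad(frames):
    """Analytic gradient of temporal_loss with respect to every frame feature."""
    grads = [[0.0] * len(f) for f in frames]
    for t in range(len(frames) - 1):
        for k, (a, b) in enumerate(zip(frames[t + 1], frames[t])):
            d = 2.0 * (a - b)
            grads[t + 1][k] += d  # d/d(frame t+1) of (a - b)^2
            grads[t][k] -= d      # d/d(frame t) of (a - b)^2
    return grads


def gradient_descent(frames, lr, steps):
    """Plain gradient descent; returns final frames and the loss history."""
    losses = [temporal_loss(frames)]
    for _ in range(steps):
        g = temporal_grad(frames)
        frames = [[x - lr * gx for x, gx in zip(f, gf)] for f, gf in zip(frames, g)]
        losses.append(temporal_loss(frames))
    return frames, losses


LIPSCHITZ_BOUND = 8.0  # 2 * (max eigenvalue of the path-graph Laplacian, < 4)
frames0 = [[0.0, 1.0], [2.0, -1.0], [0.5, 0.5], [3.0, 0.0]]  # hypothetical features
_, losses = gradient_descent(frames0, lr=1.0 / LIPSCHITZ_BOUND, steps=50)
```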
CVMar 22, 2025
TDRI: Two-Phase Dialogue Refinement and Co-Adaptation for Interactive Image Generation
Yuheng Feng, Jianhui Wang, Kun Li et al.
Although text-to-image generation technologies have made significant advancements, they still face challenges when dealing with ambiguous prompts and aligning outputs with user intent. Our proposed framework, TDRI (Two-Phase Dialogue Refinement and Co-Adaptation), addresses these issues by enhancing image generation through iterative user interaction. It consists of two phases: the Initial Generation Phase, which creates base images from user prompts, and the Interactive Refinement Phase, which integrates user feedback through three key modules. The Dialogue-to-Prompt (D2P) module transforms user feedback into actionable prompts, which improves the alignment between user intent and model input. By evaluating generated outputs against user expectations, the Feedback-Reflection (FR) module identifies discrepancies and facilitates improvements. To ensure consistently high-quality results, the Adaptive Optimization (AO) module fine-tunes the generation process by balancing user preferences and maintaining prompt fidelity. Experimental results show that TDRI outperforms existing methods, achieving 33.6% human preference compared to 6.2% for GPT-4 augmentation, and the highest CLIP and BLIP alignment scores (0.338 and 0.336, respectively). In iterative feedback tasks, user satisfaction increased to 88% after 8 rounds, with diminishing returns beyond 6 rounds. Furthermore, TDRI reduces the number of iterations and improves personalization in the creation of fashion products. TDRI exhibits strong potential for a wide range of applications in the creative and industrial domains, as it streamlines the creative process and improves alignment with user preferences.
CVMar 22, 2025
OMR-Diffusion: Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Intent Understanding
Kun Li, Jianhui Wang, Miao Zhang et al.
Generative AI has significantly advanced text-driven image generation, but it still faces challenges in producing outputs that consistently align with evolving user preferences and intents, particularly in multi-turn dialogue scenarios. In this research, we present a Visual Co-Adaptation (VCA) framework that incorporates human-in-the-loop feedback, utilizing a well-trained reward model specifically designed to closely align with human preferences. Using a diverse multi-turn dialogue dataset, the framework applies multiple reward functions (such as diversity, consistency, and preference feedback) to refine the diffusion model through LoRA, effectively optimizing image generation based on user input. We also constructed multi-round dialogue datasets with prompts and image pairs that fit user intent well. Experiments show that the model achieves 508 wins in human evaluation, outperforming DALL-E 3 (463 wins) and others. It also achieves a dialogue efficiency of 3.4 rounds (vs. 13.7 for DALL-E 3) and excels in metrics such as LPIPS (0.15) and BLIP (0.59). Various experiments demonstrate the effectiveness of the proposed method over state-of-the-art baselines, with significant improvements in image consistency and alignment with user intent.
CVNov 4, 2024
Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing
Bo Gao, Jianhui Wang, Xinyuan Song et al.
Current semantic segmentation models typically require a substantial amount of manually annotated data, a process that is both time-consuming and resource-intensive. Alternatively, leveraging advanced text-to-image models such as Midjourney and Stable Diffusion has emerged as an efficient strategy, enabling the automatic generation of synthetic data in place of manual annotations. However, previous methods have been limited to generating single-instance images, as the generation of multiple instances with Stable Diffusion has proven unstable. To address this limitation and expand the scope and diversity of synthetic datasets, we propose a framework, Free-Mask, that combines a diffusion model for segmentation with advanced image editing capabilities, allowing multiple objects to be integrated into images via text-to-image models. Our method facilitates the creation of highly realistic datasets that closely emulate open-world environments while generating accurate segmentation masks. It reduces the labor associated with manual annotation and also ensures precise mask generation. Experimental results demonstrate that synthetic data generated by Free-Mask enables segmentation models to outperform those trained on real data, especially in zero-shot settings. Notably, Free-Mask achieves new state-of-the-art results on previously unseen classes in the VOC 2012 benchmark.
SYNov 8, 2019
Two-stage WECC Composite Load Modeling: A Double Deep Q-Learning Networks Approach
Xinan Wang, Yishen Wang, Di Shi et al.
With the increasing complexity of modern power systems, conventional dynamic load modeling with ZIP and induction motors (ZIP + IM) is no longer adequate to address the current load characteristic transitions. In recent years, the WECC composite load model (WECC CLM) has been shown to capture dynamic load responses more effectively than traditional load models in various stability studies and contingency analyses. However, a detailed WECC CLM typically has a high degree of complexity, with over one hundred parameters, and there is no systematic approach to identifying and calibrating these parameters. Enabled by the wide deployment of PMUs and advanced deep learning algorithms, this paper proposes a double deep Q-learning network (DDQN)-based, two-stage load modeling framework for the WECC CLM. This two-stage method decomposes the complicated WECC CLM for more efficient identification and does not require explicit model details. In the first stage, the DDQN agent determines an accurate load composition. In the second stage, the parameters of the WECC CLM are selected from a group of Monte Carlo simulations. The set of selected load parameters is expected to best approximate the true transient responses. The proposed framework is verified using an IEEE 39-bus test system on commercial simulation platforms.
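The abstract does not describe how load compositions are encoded as states, actions, and rewards, so this is only a minimal sketch of the generic double DQN target that distinguishes DDQN from vanilla DQN: the online network selects the next action and the target network evaluates it. The function name and inputs (lists of next-state Q-values per sample) are assumptions for illustration.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    rewards:       list of scalar rewards, one per transition
    next_q_online: per-transition lists of online-network Q-values at s'
    next_q_target: per-transition lists of target-network Q-values at s'
    dones:         per-transition flags marking terminal transitions
    """
    targets = []
    for r, q_on, q_tg, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            targets.append(r)  # no bootstrapping past a terminal state
            continue
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])  # online net selects
        targets.append(r + gamma * q_tg[best_action])                # target net evaluates
    return targets


# Two hypothetical transitions with two candidate actions each.
y = double_dqn_targets(
    rewards=[1.0, 0.5],
    next_q_online=[[0.2, 0.9], [0.4, 0.1]],
    next_q_target=[[5.0, 2.0], [1.0, 3.0]],
    dones=[False, True],
    gamma=0.9,
)
```

For the first transition a vanilla DQN target would use the target network's maximum (5.0); decoupling selection from evaluation yields 1.0 + 0.9 * 2.0 = 2.8, which is how double Q-learning reduces overestimation bias.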
SIJan 17, 2019
Deep Generative Graph Distribution Learning for Synthetic Power Grids
Mahdi Khodayar, Jianhui Wang, Zhaoyu Wang
Power system studies require the topological structures of real-world power networks; however, such data is confidential due to security concerns. Thus, power grid synthesis (PGS), i.e., creating realistic power grids that imitate actual power networks, has gained significant attention. In this letter, we cast PGS as a graph distribution learning (GDL) problem in which the probability distribution functions (PDFs) of the nodes (buses) and edges (lines) are captured. A novel deep GDL (DeepGDL) model is proposed to learn the topological patterns of buses/lines together with their physical features (e.g., power injection and line impedance). With a deep nonlinear recurrent structure, DeepGDL learns complex nonlinear topological properties and captures the graph PDF. By sampling from the obtained PDF, we can create a large set of realistic networks that all resemble the original power grid. Simulation results show that the synthetic power grids we create are highly accurate in terms of various topological metrics and power flow measurements.
LGSep 10, 2018
Convolutional Graph Auto-encoder: A Deep Generative Neural Architecture for Probabilistic Spatio-temporal Solar Irradiance Forecasting
Mahdi Khodayar, Saeed Mohammadi, Mohammad Khodayar et al.
Machine learning on graph-structured data is an important and ubiquitous task for a wide variety of applications, including anomaly detection and dynamic network analysis. In this paper, a deep generative model is introduced to capture continuous probability densities corresponding to the nodes of an arbitrary graph. In contrast to discriminative pattern-recognition formulations, we propose a scalable generative optimization algorithm that is theoretically proven to capture distributions at the nodes of a graph. Our model can generate samples from the probability densities learned at each node. This probabilistic data generation model, i.e., the convolutional graph auto-encoder (CGAE), is built on the localized first-order approximation of spectral graph convolutions, deep learning, and variational Bayesian inference. We apply our CGAE to a new problem, spatio-temporal probabilistic solar irradiance prediction. Multiple solar radiation measurement sites over a wide area in the northern US states are modeled as an undirected graph. Using the proposed model, the distribution of future irradiance given historical radiation observations is estimated for every site/node. Numerical results on the National Solar Radiation Database show state-of-the-art performance for probabilistic radiation prediction on geographically distributed irradiance data in terms of reliability, sharpness, and continuous ranked probability score.
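The "localized first-order approximation of spectral graph convolutions" is commonly implemented with the renormalized propagation rule popularized by graph convolutional networks, H' = ReLU(D̃^(-1/2) (A + I) D̃^(-1/2) H W). The sketch below shows only that propagation step, assuming NumPy and a hypothetical three-site graph; CGAE's variational encoder/decoder and training are not reproduced.

```python
import numpy as np


def normalized_adjacency(adj):
    """Renormalized propagation matrix D~^{-1/2} (A + I) D~^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]


def graph_conv(a_hat, features, weights):
    """One first-order graph convolution layer with ReLU: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ features @ weights, 0.0)


# Hypothetical layout: three measurement sites on a line, 0 - 1 - 2.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
history = np.array([[0.2, 0.4], [0.5, 0.6], [0.9, 0.7]])  # 2 past irradiance values per site
rng = np.random.default_rng(0)
w = rng.normal(size=(2, 4))                                # untrained layer weights
a_hat = normalized_adjacency(adj)
hidden = graph_conv(a_hat, history, w)                     # shape (3 sites, 4 hidden features)
```

Each site's hidden features mix its own history with its neighbors' histories, which is how spatial correlation between irradiance sites enters the model.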
LGSep 10, 2018
Energy Disaggregation via Deep Temporal Dictionary Learning
Mahdi Khodayar, Jianhui Wang, Zhaoyu Wang
This paper addresses the energy disaggregation problem, i.e., decomposing the electricity signal of a whole home into the signals of its operating devices. First, we cast the problem as a dictionary learning (DL) problem in which the key electricity patterns representing consumption behaviors are extracted for each device and stored in a dictionary matrix. The electricity signal of each device is then modeled by a linear combination of these patterns with sparse coefficients that determine the contribution of each device to the total electricity. Although popular, the classic DL approach is prone to high error in real-world applications, including energy disaggregation, as it only finds linear dictionaries. Moreover, this method lacks a recurrent structure; thus, it cannot leverage the temporal structure of energy signals. Motivated by these shortcomings, we propose a novel optimization program in which the dictionary and its sparse coefficients are optimized simultaneously with a deep neural model that extracts powerful nonlinear features from the energy signals. A long short-term memory auto-encoder (LSTM-AE) with tunable time-dependent states is proposed to capture the temporal behavior of energy signals for each device. We learn the dictionary in the space of temporal features captured by the LSTM-AE rather than in the original space of the energy signals; hence, unlike traditional DL, a nonlinear dictionary is learned from the temporal features extracted by our deep model. Experiments on the publicly available Reference Energy Disaggregation Dataset (REDD) show significant improvement over state-of-the-art methods in terms of disaggregation accuracy and F-score.
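For reference, the sparse-coefficient step of the classic linear DL formulation described above can be sketched with ISTA (iterative shrinkage-thresholding), which solves min_a 0.5 ||x - D a||^2 + λ ||a||_1 for a fixed dictionary D. This is a swapped-in standard solver, not the paper's LSTM-AE method; the dictionary and signal below are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def sparse_code_ista(dictionary, signal, lam, n_iter=500):
    """ISTA for min_a 0.5 * ||signal - D a||^2 + lam * ||a||_1."""
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2  # 1 / Lipschitz constant of the gradient
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        grad = dictionary.T @ (dictionary @ coeffs - signal)
        coeffs = soft_threshold(coeffs - step * grad, step * lam)
    return coeffs


# Hypothetical dictionary: each column is a short consumption pattern over 4 time steps.
d = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
aggregate = 2.0 * d[:, 0] + 0.5 * d[:, 2]  # mixture of patterns 0 and 2 only
a = sparse_code_ista(d, aggregate, lam=0.01)
```

The recovered coefficients are close to (2, 0, 0.5), slightly shrunk toward zero by the L1 penalty; the paper's point is that such a linear dictionary on raw signals is limiting, which motivates learning it in an LSTM-AE feature space instead.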
SYSep 20, 2018
A Survey on State Estimation Techniques and Challenges in Smart Distribution Systems
Kaveh Dehghanpour, Zhaoyu Wang, Jianhui Wang et al.
This paper presents a review of the literature on State Estimation (SE) in power systems. While covering some works related to SE in transmission systems, the main focus of this paper is Distribution System State Estimation (DSSE). The paper discusses several critical topics of DSSE, including mathematical problem formulation, application of pseudo-measurements, metering instrument placement, network topology issues, impacts of renewable penetration, and cyber-security. Both conventional and modern data-driven and probabilistic techniques are reviewed. This paper can provide researchers and utility engineers with insights into the technical achievements, barriers, and future research directions of DSSE.
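As a concrete anchor for the "mathematical problem formulation" topic, the textbook weighted least squares (WLS) estimator for a linear measurement model z = H x + e is x̂ = (Hᵀ W H)⁻¹ Hᵀ W z with W = diag(1/σᵢ²); nonlinear AC estimation iterates the same normal equations in Gauss-Newton fashion. The sketch below assumes NumPy and a hypothetical noise-free three-bus DC example; it is not taken from the survey.

```python
import numpy as np


def wls_estimate(h, z, sigmas):
    """Linear WLS state estimate x_hat = (H^T W H)^{-1} H^T W z, W = diag(1 / sigma^2)."""
    w = np.diag(1.0 / np.asarray(sigmas, dtype=float) ** 2)
    gain = h.T @ w @ h  # gain matrix
    return np.linalg.solve(gain, h.T @ w @ z)


# Hypothetical DC example: bus 1 is the angle reference; states are theta2, theta3 (rad);
# all line reactances are 0.1 p.u., so P_ij = (theta_i - theta_j) / 0.1.
h = np.array([
    [-10.0, 0.0],    # P12 = -10 * theta2
    [0.0, -10.0],    # P13 = -10 * theta3
    [10.0, -10.0],   # P23 = 10 * (theta2 - theta3)
])
true_theta = np.array([-0.05, -0.1])
z = h @ true_theta  # noise-free flow measurements (p.u.)
theta_hat = wls_estimate(h, z, sigmas=[0.01, 0.01, 0.02])
```

With noise-free measurements the estimate reproduces the true angles; with noisy or redundant measurements, the weights let more accurate meters dominate the estimate.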
LGFeb 13, 2017
A Multi-model Combination Approach for Probabilistic Wind Power Forecasting
You Lin, Ming Yang, Can Wan et al.
Short-term probabilistic wind power forecasting can provide critical quantified uncertainty information about wind generation for power system operation and control. Because wind power prediction errors have complicated characteristics, it is difficult to develop a universal forecasting model that dominates all alternative models. Therefore, this paper proposes a novel multi-model combination (MMC) approach for short-term probabilistic wind generation forecasting that exploits the advantages of different forecasting models. The proposed approach combines forecasting models that provide different kinds of probability density functions to improve probabilistic forecast accuracy. Three probabilistic forecasting models based on sparse Bayesian learning, kernel density estimation, and beta distribution fitting are used to form the combined model. The parameters of the MMC model are solved within a Bayesian framework. Numerical tests illustrate the effectiveness of the proposed MMC approach.
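One simple way to combine heterogeneous predictive densities is a linear opinion pool, p(x) = Σₖ wₖ pₖ(x) with non-negative weights summing to one. The sketch below assumes this pooling form with fixed, hypothetical weights (the paper estimates its MMC parameters in a Bayesian framework, which is not reproduced) and uses a Gaussian as a stand-in for the sparse Bayesian learning model's predictive density, a Gaussian KDE, and a beta density on normalized wind power in [0, 1].

```python
import math


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x))


def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate from historical samples."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)


def combined_pdf(x, components, weights):
    """Linear opinion pool: sum_k w_k * p_k(x), assuming w_k >= 0 and sum w_k = 1."""
    return sum(w * pdf(x) for w, pdf in zip(weights, components))


# Hypothetical component forecasts for normalized wind power.
components = [
    lambda x: gaussian_pdf(x, 0.40, 0.10),                   # stand-in for the SBL model
    lambda x: kde_pdf(x, [0.35, 0.42, 0.50, 0.38], 0.05),    # KDE on hypothetical samples
    lambda x: beta_pdf(x, 4.0, 6.0),                         # fitted beta distribution
]
weights = [0.3, 0.3, 0.4]  # hypothetical combination weights
density_at_04 = combined_pdf(0.4, components, weights)
```

Because each component integrates to one and the weights sum to one, the pooled density is itself a valid density, which is what makes this form convenient for combining different PDF families.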
SYAug 16, 2016
Multi-Period Do-Not-Exceed Limit for Variable Renewable Generation Dispatch Considering Discrete Recourse Controls
Zhigang Li, Feng Qiu, Jianhui Wang
The do-not-exceed (DNE) limit method was proposed to accommodate more variable renewable generation (VRG) securely. However, because it does not involve discrete recourse controls, this method cannot gain the additional flexibility needed for better VRG integration. This letter formulates a multi-period DNE limit model considering continuous and discrete recourse controls. This model belongs to two-stage robust optimization with mixed-integer recourse. A nested column-and-constraint generation approach is employed to solve this model. Case studies show the effectiveness of the proposed method.
SYAug 1, 2016
Nonlinear Model Reduction in Power Systems by Balancing of Empirical Controllability and Observability Covariances
Junjian Qi, Jianhui Wang, Hui Liu et al.
In this paper, nonlinear model reduction for power systems is performed by balancing empirical controllability and observability covariances that are calculated around the operating region. Unlike existing model reduction methods, the external system does not need to be linearized but is directly treated as a nonlinear system. A transformation is found that balances the controllability and observability covariances in order to determine which states contribute most to the input-output behavior. The original system model is then reduced by Galerkin projection based on this transformation. The proposed method is tested and validated on a system composed of a 16-machine 68-bus system and an IEEE 50-machine 145-bus system. The results show that the proposed model reduction greatly improves calculation efficiency, while the obtained state trajectories remain close to those obtained by directly simulating the whole system or by partitioning the system without performing reduction. Compared with the balanced truncation method based on a linearized model, the proposed nonlinear model reduction method achieves higher accuracy with similar calculation efficiency. It is also shown that the proposed method is not sensitive to the choice of the matrices used to calculate the empirical covariances.
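Once the empirical covariances W_c and W_o have been computed from simulations around the operating point (not shown here), the balancing transformation can be obtained with the standard square-root method: factor W_c = L Lᵀ, eigendecompose Lᵀ W_o L = U Σ² Uᵀ, and set T = L U Σ^(-1/2). Then T⁻¹ W_c T⁻ᵀ = Tᵀ W_o T = Σ, whose diagonal ranks the balanced states by input-output importance. This is a minimal NumPy sketch under those assumptions, with hypothetical 2x2 covariances, not the paper's code.

```python
import numpy as np


def balancing_transform(w_c, w_o):
    """Square-root balancing of symmetric positive definite covariances.

    Returns T and the Hankel-singular-value-like diagonal hsv (descending) such that
    inv(T) @ w_c @ inv(T).T == T.T @ w_o @ T == diag(hsv).
    """
    l = np.linalg.cholesky(w_c)                 # w_c = L L^T
    evals, u = np.linalg.eigh(l.T @ w_o @ l)    # L^T w_o L = U diag(evals) U^T
    order = np.argsort(evals)[::-1]             # most important balanced states first
    evals, u = evals[order], u[:, order]
    hsv = np.sqrt(evals)
    t = l @ u @ np.diag(hsv ** -0.5)
    return t, hsv


# Hypothetical empirical covariances for a 2-state system.
wc = np.array([[2.0, 0.5], [0.5, 1.0]])
wo = np.array([[1.0, 0.2], [0.2, 0.5]])
t, hsv = balancing_transform(wc, wo)
# Balanced coordinates are z = inv(T) @ x; a reduced model keeps the leading entries of z.
```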
SYAug 28, 2015
Risk Mitigation for Dynamic State Estimation Against Cyber Attacks and Unknown Inputs
Ahmad F. Taha, Junjian Qi, Jianhui Wang et al.
Phasor measurement units (PMUs) can be effectively utilized for monitoring and controlling the power grid. As the cyber world becomes increasingly embedded in power grids, the risks of this inevitable evolution become serious. In this paper, we present a risk mitigation strategy, based on dynamic state estimation, to eliminate threats from the grid's unknown inputs and potential cyber-attacks. The strategy requires (a) potentially incomplete knowledge of power system models and parameters and (b) real-time PMU measurements. First, we utilize a dynamic state estimator for higher-order depictions of power system dynamics to estimate states and unknown inputs simultaneously. Second, estimates of cyber-attacks are obtained through an attack detection algorithm. Third, the estimation and detection components are integrated into an optimization framework to determine the most impacted PMU measurements. Finally, a risk mitigation strategy is proposed to guarantee the elimination of threats from attacks, ensuring the observability of the power system through available, safe measurements. Case studies are included to validate the proposed approach. Insightful suggestions, extensions, and open problems are also posed.
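The final step, restricting the estimator to "available, safe measurements", can be illustrated with a simple normalized-residual test that flags PMU channels whose measurement deviates from the estimator's prediction by more than a threshold number of standard deviations. This is a standard bad-data check swapped in for illustration, not the paper's attack detection algorithm or optimization framework; the function name, threshold, and values are hypothetical.

```python
def split_pmu_measurements(measured, predicted, sigmas, threshold=3.0):
    """Split channel indices into (flagged, safe) using the normalized residual
    |z_i - z_hat_i| / sigma_i compared against a threshold."""
    flagged, safe = [], []
    for i, (z, z_hat, s) in enumerate(zip(measured, predicted, sigmas)):
        (flagged if abs(z - z_hat) / s > threshold else safe).append(i)
    return flagged, safe


# Hypothetical readings: channel 2 deviates by 25 standard deviations.
flagged, safe = split_pmu_measurements(
    measured=[1.0, 2.0, 3.5],
    predicted=[1.01, 1.98, 3.0],
    sigmas=[0.02, 0.02, 0.02],
)
```

In a full scheme, the estimator would then be re-run on the safe channels only, provided those channels still keep the system observable.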