Chao Duan

CL
3 papers
85 citations
Novelty 53%
AI Score 48

3 Papers

SY · Apr 8
Knowledge-data fusion framework for frequency security assessment in low-inertia power systems

Yurun Zhang, Wei Yao, Yutian Lan et al.

The integration of renewable energy via power electronics is transforming power grids into low-inertia systems, heightening the risks of frequency insecurity and widespread outages. Frequency security assessment (FSA) methods are therefore urgently needed to ensure reliable system operation. Recent knowledge-data fusion models attempt to address the respective limitations of knowledge-driven FSA methods (accuracy) and data-driven FSA methods (generalization). However, current methods remain confined to shallow knowledge-data integration because of challenges in representing heterogeneous knowledge and establishing interactive mechanisms. Here, by classifying FSA domain knowledge into physics-guided and physics-constrained categories, we propose a guided learning-constrained network (GL-CN) framework, which deeply integrates domain knowledge across both the network architecture and the training process. In this framework, a data-driven model with dual input channels, combining graph convolutional networks (GCN) and multilayer perceptrons (MLP), is proposed to extract both nodal and system-level power system features. Furthermore, guided learning enhances model generalization through data augmentation in pre-training using physics-guided knowledge, while the constrained network encodes physics-constrained knowledge into the network architecture and loss function to ensure physics-consistent and robust predictions. Validated on the Yunnan Provincial Power Grid in China, our method reduces FSA time from days to seconds compared to traditional simulation, achieving 98% accuracy, robustness against 39.0% knowledge error, and generalization for 40%-60% renewable penetration. This provides a solid solution for mitigating blackouts caused by frequency insecurity and offers a generalizable paradigm for broader cross-domain problems.
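
The dual-channel idea in this abstract (a GCN for nodal features, an MLP for system-level features) can be illustrated with a minimal sketch. The abstract does not give the actual GL-CN architecture, so everything below is an assumption made for illustration: the standard graph-convolution rule ReLU(D^-1/2 (A + I) D^-1/2 H W), mean pooling over buses, a single MLP layer, fusion by concatenation, and all function names and toy values.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One standard graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each bus keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(node_feats):
    """Average node embeddings into one graph-level vector (assumed pooling)."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


def mlp_layer(vec, weight, bias):
    """Single dense layer with ReLU for system-level features."""
    out = matmul([vec], weight)[0]
    return [max(0.0, o + b) for o, b in zip(out, bias)]


def dual_channel_features(adj, node_feats, sys_feats, w_gcn, w_mlp, b_mlp):
    """Concatenate the graph channel and the system channel (assumed fusion)."""
    graph_vec = mean_pool(gcn_layer(adj, node_feats, w_gcn))
    sys_vec = mlp_layer(sys_feats, w_mlp, b_mlp)
    return graph_vec + sys_vec


if __name__ == "__main__":
    # Hypothetical 3-bus line network with 2 features per bus.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    node_feats = [[1.0, 0.2], [0.5, 0.1], [0.0, 0.9]]
    w_gcn = [[1.0, 0.0], [0.0, 1.0]]
    # Hypothetical system-level inputs, e.g. total inertia and renewable share.
    fused = dual_channel_features(adj, node_feats, [0.5, 0.4], w_gcn, [[1.0], [1.0]], [0.0])
    print(fused)
```

The fused vector would feed a prediction head; the physics-constrained loss terms mentioned in the abstract are not sketched here because the abstract does not state them.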

CL · Apr 19 · Code
Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization

Xiaoyong Mei, Tingting Zuo, Da Chen et al.

Multi-role dialogue summarization requires modeling complex interactions among multiple speakers while preserving role-specific information and factual consistency. However, most existing methods optimize for automatic metrics such as ROUGE and BERTScore, which favor surface-level imitation of references rather than genuine gains in faithfulness or alignment with human preferences. We propose a novel framework that couples explicit cognitive-style reasoning with reward-based optimization for multi-role dialogue summarization. Our method first distills structured reasoning traces (e.g., step-by-step inferences and intermediate reflections) from a large teacher model and uses them as auxiliary supervision to initialize a reasoning-aware summarizer via staged supervised fine-tuning. It then applies GRPO with a dual-principle reward that blends metric-based signals with human-aligned criteria targeting key information coverage, implicit inference, factual faithfulness, and conciseness. Experiments on multilingual multi-role dialogue benchmarks show that our method matches strong baselines on ROUGE and BERTScore. Specifically, results on CSDS confirm the framework's stability in semantic consistency, while in-depth analysis on SAMSum demonstrates clear gains in factual faithfulness and model-based preference alignment. These findings underscore the value of reasoning-aware and preference-aware training for reliable dialogue summarization. Checkpoints and datasets are available at https://huggingface.co/collections/NebulaPixel/summorchestra-multirole-summary.
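
The reward-optimization step can be sketched under stated assumptions. GRPO scores a group of sampled outputs for the same input and normalizes each reward against the group's mean and standard deviation. The abstract does not say how the metric-based and human-aligned signals are blended, so the weighted sum and the `alpha` parameter below are hypothetical, as are the function names and example scores.

```python
from statistics import mean, pstdev


def blended_reward(metric_score, preference_score, alpha=0.5):
    """Hypothetical blend: weighted sum of a metric-based and a preference-based score."""
    return alpha * metric_score + (1.0 - alpha) * preference_score


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four candidate summaries for one dialogue: (metric score, preference score).
    candidates = [(0.42, 0.80), (0.45, 0.30), (0.40, 0.60), (0.44, 0.90)]
    rewards = [blended_reward(m, p) for m, p in candidates]
    print(group_relative_advantages(rewards))
```

Summaries scoring above the group mean receive positive advantages and are reinforced; those below are discouraged. Because the normalization is relative to the group, no separate value model is needed.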

CV · Oct 30, 2021
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis

Haozhe Wu, Jia Jia, Haoyu Wang et al.

People talk with diversified styles. For one piece of speech, different talking styles exhibit significant differences in facial and head pose movements. For example, the "excited" style usually talks with the mouth wide open, while the "solemn" style is more standardized and seldom exhibits exaggerated motions. Because of such large differences between styles, it is necessary to incorporate the talking style into the audio-driven talking face synthesis framework. In this paper, we propose to inject style into the talking face synthesis framework by imitating the arbitrary talking style of a particular reference video. Specifically, we systematically investigate talking styles with our collected Ted-HD dataset and construct style codes as several statistics of 3D morphable model (3DMM) parameters. Afterwards, we devise a latent-style-fusion (LSF) model to synthesize stylized talking faces by imitating talking styles from the style codes. We emphasize the following novel characteristics of our framework: (1) It doesn't require any annotation of the style; the talking style is learned in an unsupervised manner from talking videos in the wild. (2) It can imitate arbitrary styles from arbitrary videos, and the style codes can also be interpolated to generate new styles. Extensive experiments demonstrate that the proposed framework can synthesize more natural and expressive talking styles than baseline methods.
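
The style-code construction and interpolation can be illustrated with a small sketch. The abstract says only that style codes are "several statistics" of 3DMM parameters, so the per-dimension mean and standard deviation used here are an assumption, as are the linear interpolation, the function names, and the toy parameter values.

```python
from statistics import mean, pstdev


def style_code(param_frames):
    """Summarize a clip's per-frame parameters as [means..., std devs...] (assumed statistics)."""
    dims = list(zip(*param_frames))  # one tuple of values per parameter dimension
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


def interpolate_styles(code_a, code_b, t):
    """Linearly blend two style codes; t=0 gives code_a, t=1 gives code_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(code_a, code_b)]


if __name__ == "__main__":
    # Hypothetical clips with 2 parameters per frame (e.g. mouth opening, head pitch).
    excited = [[0.9, 0.3], [0.7, 0.5], [1.0, 0.1]]
    solemn = [[0.2, 0.0], [0.3, 0.1], [0.2, 0.0]]
    blended = interpolate_styles(style_code(excited), style_code(solemn), 0.5)
    print(blended)
```

A code built this way needs no style labels, which is consistent with the unsupervised setting the abstract describes, and blending two codes yields an intermediate style vector.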