Kai Sun

CL
h-index: 40
109 papers
13,725 citations
Novelty: 42%
AI Score: 59

109 Papers

SY · Aug 23, 2014
An Interaction Model for Simulation and Mitigation of Cascading Failures

Junjian Qi, Kai Sun, Shengwei Mei

In this paper, the interactions between component failures are quantified, and the interaction matrix and interaction network are obtained. The quantified interactions can capture the general propagation patterns of cascades from utility data or simulations. This helps to better understand how cascading failures propagate and to identify the key links and key components that are crucial for cascading failure propagation. Using these interactions, a high-level probabilistic model, called the interaction model, is proposed to study the influence of interactions on cascading failure risk and to support online decision-making. It is much more time-efficient to first quantify the interactions between component failures from a smaller number of original cascades generated by a more detailed cascading failure model and then run the interaction model simulation than it is to directly simulate a large number of cascades with the detailed model. Interaction-based mitigation measures are suggested to mitigate cascading failure risk by weakening key links. In real systems, this can be achieved through wide-area protection such as blocking specific protective relays. The proposed interaction quantifying method and interaction model are validated with line outage data generated by AC OPA cascading simulations on the IEEE 118-bus system.
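
A minimal sketch of how interactions between component failures could be estimated from cascade data. It assumes each cascade is recorded as a list of generations, where a generation is the set of component IDs that failed in the same stage. The equal-credit rule below attributes every failure in generation k+1 equally to each failure in generation k. This is a simplifying assumption for illustration, not necessarily the paper's estimator, and `interaction_matrix` and `key_links` are hypothetical names.

```python
from collections import defaultdict

def interaction_matrix(cascades):
    """Estimate a_ij ~ Pr(j fails in generation k+1 | i failed in generation k).

    cascades: list of cascades; each cascade is a list of generations,
    each generation a set of component ids that failed in that stage.
    Simplifying assumption: each failure in generation k+1 is attributed
    equally to every failure in generation k.
    """
    caused = defaultdict(float)  # (i, j) -> attributed number of times i caused j
    failed = defaultdict(int)    # i -> number of times i failed
    for cascade in cascades:
        for k, gen in enumerate(cascade):
            for i in gen:
                failed[i] += 1
            if k + 1 < len(cascade) and gen:
                share = 1.0 / len(gen)
                for i in gen:
                    for j in cascade[k + 1]:
                        caused[(i, j)] += share
    return {(i, j): c / failed[i] for (i, j), c in caused.items()}

def key_links(A, top=3):
    """Links with the largest estimated interaction strength."""
    return sorted(A, key=A.get, reverse=True)[:top]
```

Sorting the estimated a_ij values is one simple way to shortlist candidate key links for interaction-based mitigation.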

SY · Oct 12, 2016
Risk Assessment of Multi-timescale Cascading Outages based on Markovian Tree Search

Rui Yao, Shaowei Huang, Kai Sun et al.

In the risk assessment of cascading outages, both the rationality of simulation and the efficiency of computation are of great significance. Existing practices have two drawbacks: sampling-based methods require huge computational resources, and initial contingency selection practices omit the dependencies in sequences of outages. To overcome them, this paper proposes a novel risk assessment approach based on searching a Markovian tree. The Markovian tree model is reformulated from a recently proposed quasi-dynamic multi-timescale simulation model to ensure reasonable modeling and simulation of cascading outages. A tree search scheme is then established to avoid duplicated simulations of the same cascade paths, significantly saving computation time. To accelerate the convergence of risk assessment, a risk estimation index is proposed to guide the search toward states with major contributions to the risk. The risk assessment is then realized with this index through a forward tree search and backward update algorithm. The effectiveness of the proposed method is illustrated on a 4-node power system, and its convergence profile and efficiency are demonstrated on the RTS-96 test system.
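
The risk of a cascade modeled as a Markov chain can be accumulated by a depth-first search over the outage tree. The forward pass expands successor states, and the backward pass adds probability-weighted losses. This is a hedged sketch on a hypothetical toy model (`trip_prob`, `loss`, and `cascade_risk` are illustrative names). It enumerates the tree exhaustively instead of using the paper's risk estimation index. Caching on the outage set assumes that next-stage probabilities depend only on which components are already out.

```python
from functools import lru_cache

def cascade_risk(trip_prob, loss, max_depth):
    """Expected loss of a cascade modeled as a Markov chain on outage sets.

    trip_prob(state) -> dict {component: probability it trips next}; the
        probabilities must sum to <= 1 and exclude components already out.
        The remainder is the probability that the cascade stops in `state`.
    loss(state)      -> consequence if the cascade stops in `state`.
    """
    @lru_cache(maxsize=None)  # each (state, depth) is expanded only once
    def expected(state, depth):
        probs = trip_prob(state)
        if depth == max_depth or not probs:
            return loss(state)
        p_stop = 1.0 - sum(probs.values())
        risk = p_stop * loss(state)
        for comp, p in probs.items():
            # Forward: expand the child state; backward: add its weighted risk.
            risk += p * expected(state | {comp}, depth + 1)
        return risk

    return expected(frozenset(), 0)
```

For a chain where component 0 trips with probability 0.5 and then component 1 trips with probability 0.2, with loss equal to the number of outaged components, the expected risk is 0.5·(0.8·1 + 0.2·2) = 0.6.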

SY · Nov 26, 2018
Optimization of Battery Energy Storage to Improve Power System Oscillation Damping

Yongli Zhu, Chengxi Liu, Kai Sun et al.

In this paper, a placement problem for multiple Battery Energy Storage System (BESS) units is formulated to enhance power system transient voltage stability. The problem is solved by the Cross-Entropy (CE) optimization method. A simulation-based approach is adopted to incorporate higher-order dynamics and nonlinearities of generators and loads. The objective is to maximize a voltage stability index that is set up based on certain grid codes. Formulations of the optimization problem are then discussed. Finally, the proposed approach is implemented in MATLAB/DIgSILENT and tested on the New England 39-bus system. Results indicate that installing BESS units at the optimized locations can alleviate the transient voltage instability issue compared with the original system without BESS. The CE placement algorithm is also compared with the classic Particle Swarm Optimization (PSO) method, and its superiority is demonstrated in terms of a faster convergence rate with comparable solution quality.
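
A minimal sketch of the cross-entropy method for a discrete placement problem. It assumes each BESS unit's bus is drawn from its own categorical distribution, which is refitted to the elite samples at each iteration. The `score` callable is a hypothetical stand-in for the simulation-based voltage stability index. The sample size, elite fraction, and smoothing factor are illustrative defaults, not values from the paper.

```python
import numpy as np

def ce_placement(score, n_buses, n_units, n_samples=200, elite_frac=0.1,
                 smoothing=0.7, iters=30, seed=0):
    """Cross-entropy search; a placement is an array of bus indices, one per unit.

    score(placement) -> float to maximize. Units are allowed to share a bus.
    """
    rng = np.random.default_rng(seed)
    P = np.full((n_units, n_buses), 1.0 / n_buses)  # P[u, b] = Pr(unit u at bus b)
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, -np.inf
    for _ in range(iters):
        # Sample candidate placements, one column per unit.
        samples = np.stack([rng.choice(n_buses, size=n_samples, p=P[u])
                            for u in range(n_units)], axis=1)
        scores = np.array([score(s) for s in samples])
        if scores.max() > best_score:
            best_score, best = scores.max(), samples[scores.argmax()].copy()
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit: elite bus frequencies per unit, smoothed with the old distribution.
        freq = np.stack([np.bincount(elite[:, u], minlength=n_buses) / n_elite
                         for u in range(n_units)])
        P = smoothing * freq + (1.0 - smoothing) * P
        P /= P.sum(axis=1, keepdims=True)  # guard against round-off drift
    return best, best_score
```

The smoothing factor trades convergence speed against the risk of collapsing onto a suboptimal bus too early.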

CL · Dec 25, 2025 · Code
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables

Zhaojiang Lin, Yong Xu, Kai Sun et al.

Wearable devices such as AI glasses are transforming voice assistants into always-available, hands-free collaborators that integrate seamlessly with daily life, but they also introduce challenges like egocentric audio affected by motion and noise, rapid micro-interactions, and the need to distinguish device-directed speech from background conversations. Existing benchmarks largely overlook these complexities, focusing instead on clean or generic conversational audio. To bridge this gap, we present WearVox, the first benchmark designed to rigorously evaluate voice assistants in realistic wearable scenarios. WearVox comprises 3,842 multi-channel, egocentric audio recordings collected via AI glasses across five diverse tasks including Search-Grounded QA, Closed-Book QA, Side-Talk Rejection, Tool Calling, and Speech Translation, spanning a wide range of indoor and outdoor environments and acoustic conditions. Each recording is accompanied by rich metadata, enabling nuanced analysis of model performance under real-world constraints. We benchmark leading proprietary and open-source speech Large Language Models (SLLMs) and find that most real-time SLLMs achieve accuracies on WearVox ranging from 29% to 59%, with substantial performance degradation on noisy outdoor audio, underscoring the difficulty and realism of the benchmark. Additionally, we conduct a case study with two new SLLMs that perform inference with single-channel and multi-channel audio, demonstrating that multi-channel audio inputs significantly enhance model robustness to environmental noise and improve discrimination between device-directed and background speech. Our results highlight the critical importance of spatial audio cues for context-aware voice assistants and establish WearVox as a comprehensive testbed for advancing wearable voice AI research.

SY · Nov 7, 2017
Adaptive Nonlinear Model Reduction for Fast Power System Simulation

Denis Osipov, Kai Sun

The paper proposes a new adaptive approach to power system model reduction for fast and accurate time-domain simulation. This new approach is a compromise between linear model reduction, for faster simulation, and nonlinear model reduction, for better accuracy. During the simulation period, the approach adaptively switches among detailed, linearly reduced, and nonlinearly reduced models based on variations of the system state. For the fault-on period, it employs unreduced models. For large changes of the state, it uses weighted column norms of the admittance matrix to decide which functions in the power system differential-algebraic equations are to be linearized. For small changes of the state, it adopts a linearly reduced model. Two versions of the adaptive model reduction approach are introduced. The first version uses traditional power system partitioning, in which model reduction is applied to a defined large external area of the power system, while the remaining area, defined as the study area, keeps fully detailed models. The second version applies adaptive model reduction to the whole system. The paper also conducts comprehensive case studies on the Northeast Power Coordinating Council 140-bus 48-machine system. These studies compare simulation results using the proposed adaptively reduced models with those using the linearly reduced model.

SY · Sep 18, 2017
Approximate Analytical Solutions of Power Flow Equations Based on Multi-Dimensional Holomorphic Embedding Method

Chengxi Liu, Bin Wang, Xin Xu et al.

It is well known that closed-form analytical solutions for AC power flow equations do not exist in general. This paper proposes a multi-dimensional holomorphic embedding method (MDHEM) to obtain an explicit approximate analytical AC power-flow solution. It does so by finding a physical germ solution and arbitrarily embedding each power, each load, or groups of loads with respective scales. Based on the MDHEM, complete approximate analytical solutions to the power flow equations in the high-dimensional space become achievable, since the voltage vector of each bus can be explicitly expressed by a convergent multivariate power series of all the loads. This differs from traditional iterative methods for power flow calculation and from inaccurate sensitivity analysis methods for voltage control. Here, the algebraic variables of a power system in all operating conditions can be prepared offline and evaluated online simply by plugging the values of any operating condition into the scales of the nonlinear multivariate power series. Case studies on a 4-bus test system and the IEEE 14-bus standard system confirm the effectiveness of the proposed method.
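
The core holomorphic embedding idea can be illustrated on a 2-bus system with a single embedding parameter s that scales the load. This is a one-dimensional simplification of the MDHEM; the multi-dimensional version gives each load its own scale and yields a multivariate series. The sketch makes three assumptions:

- Bus 1 is a slack bus with voltage V0.
- Bus 2 carries a constant-power load S = P + jQ.
- The two buses are joined by a series admittance y.

Write V(s) = Σ c_n s^n and 1/V(s) = Σ w_n s^n. The embedded equation V(s) = V0 - s·conj(S)/(y·conj(V(conj(s)))) then gives c_n = -(conj(S)/y)·conj(w_{n-1}), and the identity V·(1/V) = 1 gives w_n from lower-order terms. The function names are illustrative.

```python
import numpy as np

def hem_coefficients(S, y, V0=1.0 + 0j, n_terms=30):
    """Power-series coefficients c_n of V(s) for a 2-bus system.

    Embedding: V(s) = V0 - s * conj(S) / (y * conj(V(conj(s)))).
    s = 1 is the original power-flow problem; s = 0 is the no-load germ V = V0.
    """
    c = np.zeros(n_terms, dtype=complex)  # coefficients of V(s)
    w = np.zeros(n_terms, dtype=complex)  # coefficients of W(s) = 1 / V(s)
    c[0], w[0] = V0, 1.0 / V0
    k = -np.conj(S) / y
    for n in range(1, n_terms):
        c[n] = k * np.conj(w[n - 1])
        # From V(s) * W(s) = 1: sum_{m=0}^{n} c_m w_{n-m} = 0 for n >= 1.
        w[n] = -np.dot(c[1:n + 1], w[n - 1::-1]) / c[0]
    return c

def voltage_at_scale(c, lam):
    """Evaluate the truncated series at load scale lam (lam = 1 is the base load)."""
    return np.polyval(c[::-1], lam)
```

Because the coefficients are computed once, `voltage_at_scale(c, lam)` returns the bus voltage for any load scale inside the series' radius of convergence without re-solving. Direct summation is used here for simplicity. HEM implementations commonly use Padé approximants to extend convergence toward heavily loaded conditions.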

SY · Apr 19, 2018
A Time-Power Series Based Semi-Analytical Approach for Power System Simulation

Bin Wang, Nan Duan, Kai Sun

Time-domain simulation is the basis of dynamic security assessment for power systems. Traditionally, simulation software adopts numerical integration methods to solve nonlinear power system differential-algebraic equations (DAEs) for any given contingency under a specific operating condition. An alternative approach that is promising for online simulation is to derive a semi-analytical solution (SAS) offline. The SAS is then evaluated online over consecutive time windows for the given operating condition and contingency until the simulation result over a desired period is obtained. This paper proposes a general semi-analytical approach that derives and evaluates an SAS in the form of a power series in time to approximate the solutions of power system differential equations. An error-rate upper bound of the SAS is also proposed to guarantee reliable use of adaptive time windows when evaluating the SAS. A dynamic bus method is proposed to extend the semi-analytical approach to general power system DAEs. It efficiently links the SASs of dynamic components through the numerical solution of the network algebraic equations. Case studies on the New England 39-bus system and the Polish 2383-bus system test the performance of the proposed semi-analytical approach and compare it with existing methods. The results show that the SAS-based approach has potential for online simulation.

SY · May 4, 2017
Multi-Stage Holomorphic Embedding Method for Calculating the Power-Voltage Curve

Bin Wang, Chengxi Liu, Kai Sun

The recently proposed non-iterative load flow method, called the holomorphic embedding method, may encounter a precision issue when calculating the power-voltage (P-V) curve for a heavily loaded power system, i.e., nontrivial round-off errors caused by the limited number of digits used in computation. This letter proposes a multi-stage scheme to solve this precision issue and calculate an accurate P-V curve. The scheme is verified on the New England 39-bus power system and benchmarked against the result from the traditional continuation power flow method.

SY · May 20, 2021
Power System Differential-Algebraic Equations

Bin Wang, Yang Liu, Kai Sun

This document presents an introduction to two commonly used power system differential-algebraic equation models for studying electromechanical oscillation and transient stability. The two models are formulated with two types of generator models, respectively: the second-order classical model and the fourth-order generator model. An example is provided on the IEEE 9-bus system.
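
A minimal sketch of the second-order classical model in one common per-unit convention. It assumes the network has already been Kron-reduced to the generator internal buses, with admittance matrix Y = G + jB, rotor speed ω in per unit, and a 60 Hz system. The document's IEEE 9-bus data are not reproduced here, and the function names are illustrative.

```python
import numpy as np

OMEGA_S = 2 * np.pi * 60  # synchronous speed in rad/s (60 Hz system assumed)

def electrical_power(delta, E, Y):
    """Pe_i = sum_j E_i E_j (G_ij cos(d_i - d_j) + B_ij sin(d_i - d_j))."""
    G, B = Y.real, Y.imag
    d = delta[:, None] - delta[None, :]  # matrix of angle differences d_i - d_j
    return E * ((G * np.cos(d) + B * np.sin(d)) @ E)

def classical_rhs(delta, omega, Pm, E, Y, H, D):
    """Right-hand side of the classical multi-machine model:

        d(delta_i)/dt = OMEGA_S * (omega_i - 1)
        2 H_i d(omega_i)/dt = Pm_i - Pe_i - D_i * (omega_i - 1)
    """
    Pe = electrical_power(delta, E, Y)
    ddelta = OMEGA_S * (omega - 1.0)
    domega = (Pm - Pe - D * (omega - 1.0)) / (2.0 * H)
    return ddelta, domega
```

Setting Pm equal to the electrical power at the initial angles with ω = 1 reproduces an equilibrium, which is the usual way the mechanical inputs are initialized from a power-flow solution.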

CL · Aug 20, 2023
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?

Kai Sun, Yifan Ethan Xu, Hanwen Zha et al.

With the recent prosperity of Large Language Models (LLMs), there have been interleaved discussions on three questions: how to reduce hallucinations in LLM responses, how to increase the factuality of LLMs, and whether LLMs will replace Knowledge Graphs (KGs), which store world knowledge in a symbolic form. In this paper, we try to answer these questions from a new angle: How knowledgeable are LLMs? To answer this question, we constructed Head-to-Tail, a benchmark that consists of 18K question-answer (QA) pairs regarding head, torso, and tail facts in terms of popularity. We designed an automated evaluation method and a set of metrics that closely approximate the knowledge an LLM confidently internalizes. Through a comprehensive evaluation of 16 publicly available LLMs, we show that existing LLMs are still far from perfect in their grasp of factual knowledge, especially for facts about torso-to-tail entities.
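
A hedged sketch of splitting entities into head, torso, and tail buckets by popularity. It assumes the cut points are at one third and two thirds of the cumulative popularity mass. The exact bucketing rule and popularity signal used by Head-to-Tail may differ, and the entity counts in the usage are made up.

```python
def popularity_buckets(popularity):
    """Label each entity 'head', 'torso' or 'tail'.

    Assumption: entities are sorted by popularity and cut where the cumulative
    popularity of more popular entities reaches 1/3 and 2/3 of the total.
    """
    ranked = sorted(popularity.items(), key=lambda kv: kv[1], reverse=True)
    total = float(sum(p for _, p in ranked))
    labels, cumulative = {}, 0.0
    for name, p in ranked:
        share = cumulative / total  # popularity mass ranked above this entity
        labels[name] = "head" if share < 1 / 3 else "torso" if share < 2 / 3 else "tail"
        cumulative += p
    return labels
```

With made-up counts {"a": 60, "b": 20, "c": 10, "d": 5, "e": 5}, one very popular entity forms the head, and most entities land in the tail.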

SY · Nov 9, 2018
Nonlinear Modal Decoupling Based Power System Transient Stability Analysis

Bin Wang, Kai Sun, Xin Xu

Nonlinear modal decoupling (NMD) was recently proposed to nonlinearly transform a multi-oscillator system into a number of decoupled oscillators that together behave the same as the original system in an extended neighborhood of the equilibrium. Each oscillator has just one degree of freedom and hence can easily be analyzed to infer the stability of the original system associated with one electromechanical mode. As the first attempt to apply the NMD methodology to realistic power system models, this paper proposes an NMD-based transient stability analysis approach. For a multi-machine power system, the approach first derives decoupled nonlinear oscillators by a coordinate transformation. It then applies Lyapunov stability analysis to the oscillators to assess the stability of the original system. Nonlinear modal interaction is also considered. The approach can be efficiently applied to a large-scale power grid by conducting NMD only for selected modes. Case studies on a 3-machine 9-bus system and an NPCC 48-machine 140-bus system show the potential of the approach for transient stability analysis of multi-machine systems.

DS · Feb 8, 2017
Finding Semi-Analytic Solutions of Power System Differential-Algebraic Equations for Fast Transient Stability Simulation

Nan Duan, Kai Sun

This paper studies the semi-analytic solution (SAS) of a power system's differential-algebraic equations. An SAS is a closed-form function of symbolic variables, including time, the initial state, and the parameters of system operating conditions. It can therefore directly give trajectories of system state variables that are accurate for at least a certain time window. A two-stage SAS-based approach for fast transient stability simulation is proposed. It derives the SAS offline by the Adomian Decomposition Method and evaluates it online for each of a sequence of time windows until the desired simulation period is covered. When applied to fault simulation, the new approach employs numerical integration only for the fault-on period to determine the post-disturbance initial state of the SAS. The paper further analyzes the maximum length of a time window over which an SAS keeps its accuracy and, accordingly, introduces a divergence indicator for adaptive time windows. The proposed SAS-based approach is validated on the IEEE 10-machine, 39-bus system.
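
To illustrate the Adomian decomposition method without the full power system model, the sketch applies it to the toy ODE x' = -x², x(0) = x0, whose exact solution is x0/(1 + x0·t). Each term x_{n+1} = -∫₀ᵗ A_n dτ is built from the Adomian polynomial A_n = Σ_i x_i x_{n-i} of the nonlinearity x². Polynomials in t are represented with NumPy's `Polynomial` class. Unlike the paper's SAS, the initial state here is numeric rather than symbolic, so the series is re-derived for each time window.

```python
from numpy.polynomial import Polynomial

def adm_solution(x0, n_terms=8):
    """ADM series for x' = -x**2, x(0) = x0, as a polynomial in local time t.

    x_0 = x0,  x_{n+1}(t) = -integral_0^t A_n,  A_n = sum_i x_i * x_{n-i}
    (A_n are the Adomian polynomials of N(x) = x**2).
    """
    parts = [Polynomial([x0])]
    for n in range(n_terms - 1):
        A_n = sum((parts[i] * parts[n - i] for i in range(n + 1)), Polynomial([0.0]))
        parts.append(-A_n.integ())  # integ() integrates from t = 0
    return sum(parts, Polynomial([0.0]))

def simulate(x0, t_end, window, n_terms=8):
    """Evaluate the series window by window, restarting from each window's end state."""
    x = x0
    for _ in range(round(t_end / window)):
        x = adm_solution(x, n_terms)(window)
    return x
```

For x0 = 1, the series reproduces 1 - t + t² - ..., the Taylor expansion of 1/(1 + t). Evaluating it over short windows keeps the truncation error small, which mirrors the role of the time windows in the abstract.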

SY · Mar 26, 2021
Stochastic Power System Simulation Using the Adomian Decomposition Method

Nan Duan, Kai Sun

Considering the increasing number of distributed energy resources and responsive loads in smart grids, this paper proposes a stochastic simulation approach for stability analysis of a power system with stochastic loads. The proposed approach solves a stochastic, nonlinear differential equation model of the system analytically by the Adomian decomposition method. It generates semi-analytical solutions that express both deterministic and stochastic state variables explicitly as symbolic variables. This embeds stochastic processes directly into the solutions for efficient stability analysis under uncertainty. The proposed approach is tested on the New England 10-machine 39-bus system with different penetration levels of stochastic loads. The approach is also benchmarked against a traditional stochastic simulation approach based on the Euler-Maruyama method. The results show that the new approach has better time performance and comparable accuracy.
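
For reference, the Euler-Maruyama benchmark mentioned above can be sketched on a single-machine-infinite-bus swing equation. One assumption here is not from the paper: the stochastic load enters as additive white noise of intensity `sigma` on the speed equation. The parameter values in the usage are illustrative.

```python
import numpy as np

def euler_maruyama_smib(delta0, domega0, Pm, Pmax, H, D, sigma,
                        t_end=5.0, dt=1e-3, n_paths=100, seed=0, f0=60.0):
    """Simulate n_paths sample paths of
        d(delta)  = ws * domega dt
        d(domega) = (Pm - Pmax sin(delta) - D domega) / (2H) dt + sigma dW
    with the Euler-Maruyama scheme; returns the final states of all paths."""
    rng = np.random.default_rng(seed)
    ws = 2 * np.pi * f0
    delta = np.full(n_paths, float(delta0))
    domega = np.full(n_paths, float(domega0))
    for _ in range(int(round(t_end / dt))):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)  # Brownian increments
        drift = (Pm - Pmax * np.sin(delta) - D * domega) / (2 * H)
        # Both updates use the old state, as the explicit scheme requires.
        delta, domega = delta + ws * domega * dt, domega + drift * dt + sigma * dW
    return delta, domega
```

With `sigma=0` and the system started at its equilibrium angle arcsin(Pm/Pmax), every path stays put. With noise, the spread of the paths gives the statistics that a semi-analytical stochastic solution would be compared against.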

SY · Nov 6, 2017
Management of Cascading Outage Risk Based on Risk Gradient and Markovian Tree Search

Rui Yao, Kai Sun, Feng Liu et al.

Since cascading outages are major threats to power systems, it is important to reduce the risk of potential cascading outages. In this paper, a risk management method for cascading outages based on Markovian tree search is proposed. With the tree expansion of the cascading outage risk, the risk gradient is computed efficiently by a forward-backward tree search scheme with good convergence. It is then employed in an optimization model to minimize control cost while effectively reducing the cascading outage risk. To overcome the limitation of linearization in computing the risk gradient, an iterative risk management (IRM) approach is further developed. Tests on the RTS-96 3-area system verify the accuracy of the computed risk gradient and its effectiveness for risk reduction. The time performance of the proposed IRM approach is tested on the RTS-96 system, a 410-bus US-Canada northeast system, and a 1354-bus Mid-European system. The results demonstrate its potential for decision support on practical power systems, online or on an hourly basis.

CL · Jul 16, 2023
The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant

Jingqing Zhang, Kai Sun, Akshay Jagadeesh et al.

Recent studies have demonstrated promising performance of ChatGPT and GPT-4 on several medical domain tasks. However, none has assessed their performance using a large-scale real-world electronic health record database or evaluated their utility in providing clinical diagnostic assistance for patients across a full range of disease presentations. We performed two analyses using ChatGPT and GPT-4. The first identified patients with specific medical diagnoses using a large real-world electronic health record database. The second provided diagnostic assistance to healthcare workers in the prospective evaluation of hypothetical patients. Our results show that, with chain-of-thought and few-shot prompting, GPT-4 can achieve F1 scores as high as 96% across disease classification tasks. For patient assessment, GPT-4 gave an accurate diagnosis three out of four times. However, its responses included factually incorrect statements, overlooked crucial medical findings, and recommended unnecessary investigations and overtreatment. These issues, coupled with privacy concerns, make these models currently inadequate for real-world clinical use. Nevertheless, prompt engineering needs less data and time than configuring conventional machine learning workflows, which highlights the models' potential for scalability across healthcare applications.

SY · Nov 1, 2018
Power System Transient Stability Analysis Using Truncated Taylor Expansion Systems

Bin Wang, Xin Xu, Kai Sun

Small-signal analysis is a special case of analytical approaches that use Taylor expansions of power system differential equations, with the truncation performed at order one. Truncated Taylor expansions (TTEs) at higher orders can lead to better approaches for stability analysis by considering higher-order nonlinearities, e.g., normal form, modal series, and nonlinear modal decoupling. This paper presents fundamental studies on how accurate the transient stability analysis results obtained from the TTE systems are compared with those for the original system. The analytical investigation is conducted on single-machine-infinite-bus power systems. The conclusions drawn there are verified on two multi-machine power systems by extensive numerical simulations.

SY · Mar 9, 2022
Machine Learning based Optimal Feedback Control for Microgrid Stabilization

Tianwei Xia, Kai Sun, Wei Kang

Microgrids have more operational flexibility, as well as more uncertainty, than conventional power grids, especially when renewable energy resources are utilized. An energy storage based feedback controller can compensate for undesired dynamics of a microgrid to improve its stability. However, optimal feedback control of a microgrid subject to a large disturbance requires solving a Hamilton-Jacobi-Bellman problem. This paper proposes a machine learning-based optimal feedback control scheme. Its training dataset is generated by a linear-quadratic regulator for small disturbances and by a brute-force method for large disturbances. A three-layer neural network is then constructed from the data for the purpose of optimal feedback control. A case study is carried out on a microgrid model based on a modified Kundur two-area system to test the real-time performance of the proposed control scheme.
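
The small-disturbance part of such a training set can be sketched with a discrete-time LQR, computed here by fixed-point iteration of the Riccati equation in plain NumPy. The matrices `A` and `B` stand for a hypothetical linearized microgrid model, and the paper's controller may be continuous-time. The brute-force samples for large disturbances and the neural network itself are not shown.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=10000, tol=1e-10):
    """Discrete-time LQR gain K (u = -K x) via Riccati value iteration:
    P <- Q + A^T P (A - B K),  K = (R + B^T P B)^{-1} B^T P A."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

def lqr_dataset(A, B, Q, R, n_samples=1000, scale=0.1, seed=0):
    """(state, control) training pairs u = -K x for small random disturbances."""
    K, _ = dlqr(A, B, Q, R)
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, scale, size=(n_samples, A.shape[0]))
    U = -X @ K.T
    return X, U
```

For a stabilizable and detectable model, the iteration converges to the stabilizing Riccati solution, so the closed-loop matrix A - BK has all eigenvalues inside the unit circle.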

DS · Feb 12, 2018
Power System Simulation Using the Differential Transformation Method

Yang Liu, Kai Sun

This paper proposes a new semi-analytical approach for online time-domain power system simulation. The approach applies the differential transformation method (DTM) to the power system differential equation model to derive offline a semi-analytical solution (SAS) with symbolic variables for time, the initial state, and system conditions. When online simulation is needed for a contingency under the current system condition, the SAS can be evaluated in real time to generate simulation results. The Adomian decomposition method can also produce a power system SAS. In contrast, an SAS derived by the DTM adopts a recursive form that avoids generating and storing its complete symbolic expression. This makes both derivation and evaluation of the SAS more efficient, especially for multi-machine power systems. The optimal order of a DTM-based SAS is studied for the best time performance of simulation. The paper also designs a parallel computing strategy for power system simulation using the DTM-based SAS. Tests on the IEEE 10-machine 39-bus system demonstrate significant speedup of simulation using the proposed approach compared with the Runge-Kutta method.
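
A minimal DTM sketch for a single-machine-infinite-bus classical model, not the 39-bus system from the abstract. The model is δ' = ω_s·Δω and 2H·Δω' = Pm - Pmax·sin δ - D·Δω. The sketch uses the standard recursive DTM rules for sin and cos, so the coefficients of sin δ are built from those of δ without forming a symbolic expression. Each window is evaluated as a power series in its local time and restarted from the end state. The window length `h` and the order are illustrative, not the paper's optimal values.

```python
import numpy as np

def dtm_coefficients(delta0, domega0, Pm, Pmax, H, D, ws, order):
    """DTM coefficients of delta(t) and domega(t) around the window start."""
    d = np.zeros(order + 1)
    w = np.zeros(order + 1)
    s = np.zeros(order + 1)  # coefficients of sin(delta)
    c = np.zeros(order + 1)  # coefficients of cos(delta)
    d[0], w[0] = delta0, domega0
    s[0], c[0] = np.sin(delta0), np.cos(delta0)
    for k in range(order):
        d[k + 1] = ws * w[k] / (k + 1)
        w[k + 1] = ((Pm if k == 0 else 0.0) - Pmax * s[k] - D * w[k]) / (2 * H * (k + 1))
        m = np.arange(k + 1)
        # Recursive DTM rules for sin/cos, from d(sin x)/dt = cos(x) x', d(cos x)/dt = -sin(x) x'.
        s[k + 1] = np.sum((m + 1) * d[m + 1] * c[k - m]) / (k + 1)
        c[k + 1] = -np.sum((m + 1) * d[m + 1] * s[k - m]) / (k + 1)
    return d, w

def simulate_dtm(delta0, domega0, Pm, Pmax, H, D, t_end, h=0.02, order=10, f0=60.0):
    """Evaluate the DTM series over consecutive windows of length h."""
    ws = 2 * np.pi * f0
    delta, domega = delta0, domega0
    powers = h ** np.arange(order + 1)
    for _ in range(int(round(t_end / h))):
        d, w = dtm_coefficients(delta, domega, Pm, Pmax, H, D, ws, order)
        delta, domega = d @ powers, w @ powers
    return delta, domega
```

With D = 0, the quantity H·ω_s·Δω² - Pm·δ - Pmax·cos δ is conserved, which gives a quick accuracy check on the DTM trajectory.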

SY · Nov 8, 2016
Emulated Inertia and Damping of Converter-Interfaced Power Source

Bin Wang, Yichen Zhang, Kai Sun et al.

Converter-interfaced power sources (CIPSs), such as wind turbines and energy storage, can be switched to the inertia emulation mode when the detected frequency deviation exceeds a pre-designed threshold, i.e., a dead band, to support the frequency response of a power grid. This letter proposes an approach to derive the emulated inertia and damping provided by a CIPS based on the linearized model of the CIPS and the power grid, where the grid is represented by an equivalent single machine. The emulated inertia and damping can be explicitly expressed in time and turn out to be time-dependent.

89.9 · NE · Mar 15 · Code
MorphSNN: Adaptive Graph Diffusion and Structural Plasticity for Spiking Neural Networks

Yongsheng Huang, Peibo Duan, Yujie Wu et al.

Spiking Neural Networks (SNNs) currently face a critical bottleneck: while individual neurons exhibit dynamic biological properties, their macroscopic architectures remain confined to conventional connectivity patterns that are static and hierarchical. This discrepancy between neuron-level dynamics and network-level fixed connectivity eliminates critical brain-like lateral interactions, limiting adaptability in changing environments. To address this, we propose MorphSNN, a backbone framework inspired by biological non-synaptic diffusion and structural plasticity. Specifically, we introduce a Graph Diffusion (GD) mechanism to facilitate efficient undirected signal propagation, complementing the feedforward hierarchy. Furthermore, MorphSNN incorporates a Spatio-Temporal Structural Plasticity (STSP) mechanism, which enables instance-specific, dynamic topological reorganization and thereby overcomes the limitations of fixed topologies. Experiments demonstrate that MorphSNN achieves state-of-the-art accuracy on static and neuromorphic datasets; for instance, it reaches 83.35% accuracy on N-Caltech101 with only 5 timesteps. More importantly, its self-evolving topology functions as an intrinsic distribution fingerprint, enabling superior Out-of-Distribution (OOD) detection without auxiliary training. The code is available at anonymous.4open.science/r/MorphSNN-B0BC.

SY · Mar 8, 2015
An Analytical Formulation of Power System Oscillation Frequency

Bin Wang, Kai Sun

This letter proposes an analytical approach to formulate the power system oscillation frequency under a large disturbance. It reveals that, when the system's model and operating condition are fixed, the oscillation frequency is a function only of the oscillation amplitude. Case studies also show that this function is damping-insensitive and could be applied to an inter-area model of a multi-machine power system.
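
The amplitude dependence can be checked numerically on an undamped single-machine-infinite-bus model. The sketch integrates the swing equation with RK4 and measures the period between successive upward crossings of the stable equilibrium angle. This is a numerical illustration under assumed parameters, not the letter's analytical formulation.

```python
import numpy as np

def oscillation_frequency(amplitude, Pm=0.8, Pmax=1.5, H=5.0, f0=60.0,
                          dt=2e-3, t_end=10.0):
    """Swing frequency (Hz) of an undamped SMIB system released at rest with an
    initial rotor-angle deviation `amplitude` (rad) from the stable equilibrium."""
    ws = 2 * np.pi * f0
    ds = np.arcsin(Pm / Pmax)  # stable equilibrium angle

    def f(x):  # x = [delta, domega]
        return np.array([ws * x[1], (Pm - Pmax * np.sin(x[0])) / (2 * H)])

    x, t, crossings = np.array([ds + amplitude, 0.0]), 0.0, []
    while t < t_end:
        k1 = f(x)
        k2 = f(x + dt / 2 * k1)
        k3 = f(x + dt / 2 * k2)
        k4 = f(x + dt * k3)
        x_new = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        if x[0] < ds <= x_new[0]:  # upward crossing: interpolate its time
            crossings.append(t + dt * (ds - x[0]) / (x_new[0] - x[0]))
        x, t = x_new, t + dt
    return 1.0 / np.mean(np.diff(crossings))
```

For small amplitudes, the result approaches the linearized frequency sqrt(ω_s·Pmax·cos δ_s/(2H))/(2π), and it decreases as the amplitude grows.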

59.0 · LG · May 14 · Code
Not All Timesteps Matter Equally: Selective Alignment Knowledge Distillation for Spiking Neural Networks

Kai Sun, Peibo Duan, Yongsheng Huang et al.

Spiking neural networks (SNNs), which are brain-inspired and spike-driven, achieve high energy efficiency. However, a performance gap between SNNs and artificial neural networks (ANNs) still remains. Knowledge distillation (KD) is commonly adopted to improve SNN performance, but existing methods typically enforce uniform alignment across all timesteps, either from a teacher network or through inter-temporal self-distillation, implicitly assuming that per-timestep predictions should be treated equally. In practice, SNN predictions vary and evolve over time, and intermediate timesteps need not all be individually correct even when the final aggregated output is correct. Under such conditions, effective distillation should not force every timestep toward the same supervision target, but instead provide corrective guidance to erroneous timesteps while preserving useful temporal dynamics. To address this issue, we propose Selective Alignment Knowledge Distillation (SeAl-KD), which selectively aligns class-level and temporal knowledge by equalizing competing logits at erroneous timesteps and reweighting temporal alignment based on confidence and inter-timestep similarity. Extensive experiments on static image and neuromorphic event-based datasets demonstrate consistent improvements over existing distillation methods. The code is available at https://github.com/KaiSUN1/SeAl

CL · Feb 28, 2023
Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction

Kai Sun, Richong Zhang, Samuel Mensah et al.

Opinion target extraction (OTE), or aspect extraction (AE), is a fundamental task in opinion mining that aims to extract the targets (or aspects) on which opinions have been expressed. Recent work focuses on cross-domain OTE, which is typically encountered in real-world scenarios where the testing and training distributions differ. Most methods use domain adversarial neural networks that aim to reduce the domain gap between the labelled source and unlabelled target domains to improve target domain performance. However, this approach only aligns feature distributions and does not account for class-wise feature alignment, leading to suboptimal results. Semi-supervised learning (SSL) has been explored as a solution but is limited by the quality of the pseudo-labels generated by the model. Inspired by the theoretical foundations in domain adaptation [2], we propose a new SSL approach to boost target domain performance. It selects unlabelled target samples on which the outputs of a domain-specific teacher network and a student network disagree. Extensive experiments on benchmark cross-domain OTE datasets show that this approach is effective and performs consistently well in settings with large domain shifts.
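
The sample-selection step can be sketched as follows, assuming both networks produce one tag sequence per unlabelled target sentence (e.g., BIO tags for opinion targets). Two rules here are illustrative assumptions: any token-level mismatch counts as disagreement, and the teacher's tags are kept as pseudo-labels. The paper's exact selection and labeling rules may differ, and `select_by_disagreement` is a hypothetical name.

```python
def select_by_disagreement(teacher_tags, student_tags):
    """Return indices of sentences whose predicted tag sequences differ,
    together with the teacher's tags for them as (assumed) pseudo-labels."""
    chosen = [i for i, (t, s) in enumerate(zip(teacher_tags, student_tags))
              if list(t) != list(s)]
    return chosen, [list(teacher_tags[i]) for i in chosen]
```

Sentences on which the two networks already agree are skipped, so the extra training signal concentrates on the target samples where they disagree.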

CL · Dec 9, 2025
An Agentic AI System for Multi-Framework Communication Coding

Bohao Yang, Rui Yang, Joshua M. Biro et al.

Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversations remains labor-intensive, inconsistent, and difficult to scale. Existing approaches based on large language models typically rely on single-task models that lack adaptability, interpretability, and reliability, especially when applied across various communication frameworks and clinical domains. In this study, we developed a Multi-framework Structured Agentic AI system for Clinical Communication (MOSAIC). It is built on a LangGraph-based architecture that orchestrates four core agents:

- a Plan Agent for codebook selection and workflow planning;
- an Update Agent for maintaining up-to-date retrieval databases;
- a set of Annotation Agents that apply codebook-guided retrieval-augmented generation (RAG) with dynamic few-shot prompting;
- a Verification Agent that provides consistency checks and feedback.

To evaluate performance, we compared MOSAIC outputs against gold-standard annotations created by trained human coders. We developed and evaluated MOSAIC using 26 gold-standard annotated transcripts for training and 50 transcripts for testing, spanning the rheumatology and OB/GYN domains. On the test set, MOSAIC achieved an overall F1 score of 0.928. Performance was highest in the Rheumatology subset (F1 = 0.962) and strongest for Patient Behavior (e.g., patients asking questions, expressing preferences, or showing assertiveness). Ablation results showed that MOSAIC outperforms the baselines.

CL · Oct 28, 2023
Anaphor Assisted Document-Level Relation Extraction

Chonggang Lu, Richong Zhang, Kai Sun et al.

Document-level relation extraction (DocRE) involves identifying relations between entities distributed across multiple sentences within a document. Existing methods focus on building a heterogeneous document graph to model the internal structure of an entity and the external interactions between entities. However, existing methods have two drawbacks. First, anaphors play an important role in reasoning about relations between entities but are ignored by these methods. Second, these methods achieve cross-sentence entity interactions implicitly by using a document or sentences as intermediate nodes. Such an approach has difficulty learning fine-grained interactions between entities across different sentences, resulting in sub-optimal performance. To address these issues, we propose an Anaphor-Assisted (AA) framework for DocRE tasks. Experimental results on widely used datasets demonstrate that our model achieves new state-of-the-art performance.

97.0 · CV · Apr 16
AnimationBench: Are Video Models Good at Character-Centric Animation?

Leyi Wu, Pengjun Fang, Kai Sun et al.

Video generation has advanced rapidly, with recent methods producing increasingly convincing animated results. However, existing benchmarks, largely designed for realistic videos, struggle to evaluate animation-style generation with its stylized appearance, exaggerated motion, and character-centric consistency. Moreover, they rely on fixed prompt sets and rigid pipelines, offering limited flexibility for open-domain content and custom evaluation needs. To address this gap, we introduce AnimationBench, the first systematic benchmark for evaluating animation image-to-video (I2V) generation. AnimationBench operationalizes the Twelve Basic Principles of Animation and IP Preservation into measurable evaluation dimensions. These are complemented by Broader Quality Dimensions, including semantic consistency, motion rationality, and camera motion consistency. The benchmark supports both a standardized closed-set evaluation for reproducible comparison and a flexible open-set evaluation for diagnostic analysis. It also leverages vision-language models for scalable assessment. Extensive experiments show that AnimationBench aligns well with human judgment and exposes animation-specific quality differences overlooked by realism-oriented benchmarks. This leads to more informative and discriminative evaluation of state-of-the-art I2V models.

CLFeb 25, 2025Code
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble

Zhijun Chen, Jingzheng Li, Pengpeng Chen et al.

LLM Ensemble -- which involves the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference, to benefit from their individual strengths -- has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference", "ensemble-during-inference", and "ensemble-after-inference", and review all relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions. A curated list of papers on LLM Ensemble is available at https://github.com/junchenzhi/Awesome-LLM-Ensemble.

DCMar 13
NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL

Amos Goldman, Nimrod Boker, Maayan Sheraizin et al.

Mixture-of-Experts (MoE) architectures have become essential for scaling large language models, driving the development of specialized device-initiated communication libraries such as DeepEP, Hybrid-EP, and others. These libraries demonstrate the performance benefits of GPU-initiated RDMA for MoE dispatch and combine operations. This paper presents NCCL EP (Expert Parallelism), an MoE communication library built from the ground up entirely on NCCL's Device API. NCCL EP provides unified ncclEpDispatch and ncclEpCombine primitives with both C and Python interfaces. It supports a Low-Latency (LL) mode for inference decoding and a High-Throughput (HT) mode for training and inference prefill. LL targets small batch sizes (1-128 tokens) using direct all-to-all RDMA+NVLink mesh connectivity, with double-buffered communication to overlap the dispatch and combine phases. HT targets large batches (4096+ tokens) using hierarchical communication that aggregates tokens within NVLink domains before inter-node RDMA transmission. Both modes use the Device API for intra- and inter-node communication, taking advantage of its topology awareness and optimized GPU-initiated implementation. We evaluate NCCL EP on an H100-based cluster across multi-node configurations, demonstrating competitive LL kernel performance and presenting end-to-end results with vLLM integration. By building MoE communication natively within NCCL, NCCL EP provides a supported path for expert parallelism on current and emerging NVIDIA platforms.
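The HT mode aggregates tokens inside an NVLink domain before sending them over inter-node RDMA. The sketch below only counts inter-node token copies, comparing a flat dispatch (one copy per remote destination GPU) with a node-aggregated dispatch (one copy per remote destination node). It is not NCCL EP's `ncclEpDispatch` API. The function name, the rank-to-node mapping `rank // gpus_per_node`, and the inputs are hypothetical.

```python
def internode_messages(token_dests, src_node, gpus_per_node):
    """Count inter-node token copies for flat vs. node-aggregated dispatch.

    token_dests: for each token, the set of destination GPU ranks (owners of its experts).
    Flat: one copy per remote destination GPU.
    Hierarchical: one copy per remote destination node, which is assumed to be
    fanned out over NVLink inside that node.
    """
    flat = hierarchical = 0
    for dests in token_dests:
        remote = [g for g in dests if g // gpus_per_node != src_node]
        flat += len(remote)
        hierarchical += len({g // gpus_per_node for g in remote})
    return flat, hierarchical
```

With 8 GPUs per node, a token on node 0 routed to ranks 9 and 10 crosses the network once instead of twice. The saving grows with the number of experts a token hits on the same remote node.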

CVOct 30, 2025
CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark

Jiaqi Wang, Xiao Yang, Kai Sun et al.

Wearable devices such as smart glasses are transforming the way people interact with their surroundings, enabling users to seek information about entities in their view. Multi-Modal Retrieval-Augmented Generation (MM-RAG) plays a key role in supporting such questions, yet there is still no comprehensive benchmark for this task, especially for wearables scenarios. To fill this gap, we present CRAG-MM -- a Comprehensive RAG benchmark for Multi-modal Multi-turn conversations. CRAG-MM contains a diverse set of 6.5K (image, question, answer) triplets and 2K visual-based multi-turn conversations across 13 domains, including 6.2K egocentric images designed to mimic captures from wearable devices. We carefully constructed the questions to reflect real-world scenarios and challenges, including five types of image-quality issues, six question types, varying entity popularity, differing information dynamism, and different conversation turns. We design three tasks: single-source augmentation, multi-source augmentation, and multi-turn conversations -- each paired with an associated retrieval corpus and APIs for both image-KG retrieval and webpage retrieval. Our evaluation shows that straightforward RAG approaches achieve only 32% and 43% truthfulness on CRAG-MM single- and multi-turn QA, respectively, and state-of-the-art industry solutions achieve similar quality (32%/45%), underscoring ample room for improvement. The benchmark hosted KDD Cup 2025, attracting about 1K participants and 5K submissions, with winning solutions improving baseline performance by 28%, highlighting its early impact on advancing the field.

IROct 6, 2022
Digital Asset Valuation: A Study on Domain Names, Email Addresses, and NFTs

Kai Sun

Existing works on valuing digital assets on the Internet typically focus on a single asset class. To promote the development of automated valuation techniques, preferably those that are generally applicable to multiple asset classes, we construct DASH, the first Digital Asset Sales History dataset that features multiple digital asset classes spanning from classical to blockchain-based ones. Consisting of 280K transactions of domain names (DASH_DN), email addresses (DASH_EA), and non-fungible token (NFT)-based identifiers (DASH_NFT), such as Ethereum Name Service names, DASH advances the field in several aspects: the subsets DASH_DN, DASH_EA, and DASH_NFT are the largest freely accessible domain name transaction dataset, the only publicly available email address transaction dataset, and the first NFT transaction dataset that focuses on identifiers, respectively. We build strong conventional feature-based models as the baselines for DASH. We next explore deep learning models based on fine-tuning pre-trained language models, which have not yet been explored for digital asset valuation in the previous literature. We find that the vanilla fine-tuned model already performs reasonably well, outperforming all but the best-performing baselines. We further propose improvements to make the model more aware of the time sensitivity of transactions and the popularity of assets. Experimental results show that our improved model consistently outperforms all the other models across all asset classes on DASH.

CYSep 20, 2024
PyGRF: An improved Python Geographical Random Forest model and case studies in public health and natural disasters

Kai Sun, Ryan Zhenqi Zhou, Jiyeon Kim et al.

Geographical random forest (GRF) is a recently developed, spatially explicit machine learning model. Because it provides more accurate predictions and local interpretations, GRF has already been used in many studies. The current GRF model, however, has three limitations: how it determines the local model weight and bandwidth hyperparameters, potentially insufficient numbers of local training samples, and sometimes high local prediction errors. Also, because GRF is implemented as an R package and has no Python version, its adoption among machine learning practitioners who prefer Python is limited. This work addresses these limitations by introducing theory-informed hyperparameter determination, local training sample expansion, and spatially-weighted local prediction. We also develop a Python-based GRF model and package, PyGRF, to facilitate the use of the model. We evaluate the performance of PyGRF on an example dataset and further demonstrate its use in two case studies in public health and natural disasters.

CVMay 4Code
RAFNet: Region-Aware Fusion Network for Pansharpening

Jianing Zhang, Zijian Zhou, Kai Sun

Pansharpening aims to generate high-resolution multispectral (HRMS) images by fusing low-resolution multispectral (LRMS) and high-resolution panchromatic (PAN) images. Although deep learning has advanced this field, mainstream frequency-based methods that rely on standard scaled dot-product attention suffer from quadratic computational complexity and fail to exploit the inherent regional sparsity of remote sensing imagery. Furthermore, existing spatial enhancement strategies typically employ static convolution kernels, which struggle to adapt to the complex frequency and regional variations of PAN and MS images. To address these bottlenecks, we propose a Region-Aware Fusion Network (RAFNet) that synergistically models spatial and frequency information. Specifically, we design a Spatial Adaptive Refinement (SAR) module that leverages the discrete wavelet transform (DWT) for directional frequency separation and K-means clustering for regional partitioning. This enables the dynamic construction of region-specific adaptive convolution kernels, achieving spatially and frequency-adaptive feature enhancement. Moreover, we introduce a Clustered Frequency Aggregation (CFA) module based on a sparse attention mechanism guided by the semantic clusters. It executes a region-aware sparse attention strategy that drastically reduces computational redundancy while ensuring high-quality frequency feature extraction. In addition, we integrate these modules into a progressive, multi-level spatial-frequency network architecture to facilitate robust interaction and accurate image reconstruction. Extensive experiments on multiple benchmark datasets demonstrate that the proposed RAFNet significantly outperforms state-of-the-art pansharpening methods in both reduced- and full-resolution assessments. The code is available at https://github.com/PatrickNod/RAFNet.
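The CFA module restricts attention to tokens in the same semantic cluster. The pure-Python sketch below shows the general idea of cluster-restricted scaled dot-product attention and counts how many query-key scores it computes. It assumes the cluster labels are already given (for example by K-means) and is not the authors' CFA implementation. The function name and inputs are hypothetical.

```python
import math
from collections import defaultdict


def clustered_attention(queries, keys, values, labels):
    """Scaled dot-product attention restricted to tokens that share a cluster label.

    queries, keys, values: lists of float vectors, one per token.
    labels: cluster id per token (assumed precomputed, e.g. by K-means).
    Returns (output vectors, number of query-key scores computed).
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)

    dim = len(queries[0])
    outputs = [None] * len(queries)
    scores_computed = 0
    for idx, q in enumerate(queries):
        group = members[labels[idx]]  # attend only to keys in the same cluster
        logits = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim) for j in group]
        scores_computed += len(group)
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        outputs[idx] = [
            sum(w * values[j][d] for w, j in zip(weights, group)) / total
            for d in range(len(values[0]))
        ]
    return outputs, scores_computed
```

With 6 tokens split into 3 clusters of 2, the sketch computes 12 scores instead of the 36 that dense attention would need. This is the kind of redundancy reduction the abstract attributes to region-aware sparse attention.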

CVFeb 26
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation

Ling Wang, Hao-Xiang Guo, Xinzhou Wang et al.

We introduce SceneTransporter, an end-to-end framework for structured 3D scene generation from a single image. While existing methods generate part-level 3D objects, they often fail to organize these parts into distinct instances in open-world scenes. Through a debiased clustering probe, we reveal a critical insight: this failure stems from the lack of structural constraints within the model's internal assignment mechanism. Based on this finding, we reframe the task of structured 3D scene generation as a global correlation assignment problem. To solve this, SceneTransporter formulates and solves an entropic Optimal Transport (OT) objective within the denoising loop of the compositional DiT model. This formulation imposes two powerful structural constraints. First, the resulting transport plan gates cross-attention to enforce an exclusive, one-to-one routing of image patches to part-level 3D latents, preventing entanglement. Second, the competitive nature of the transport encourages the grouping of similar patches, a process that is further regularized by an edge-based cost, to form coherent objects and prevent fragmentation. Extensive experiments show that SceneTransporter outperforms existing methods on open-world scene generation, significantly improving instance-level coherence and geometric fidelity. Code and models will be publicly available at https://2019epwl.github.io/SceneTransporter/.
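A common solver for an entropic Optimal Transport objective is Sinkhorn-Knopp matrix scaling. The pure-Python sketch below treats image patches as rows and part-level latents as columns, with a given cost matrix. The abstract does not state the marginals, the entropy weight `epsilon`, or the iteration count SceneTransporter uses, so all three are assumptions here. The sketch is not the paper's implementation.

```python
import math


def sinkhorn(cost, row_mass, col_mass, epsilon=0.1, iters=200):
    """Entropic optimal transport via Sinkhorn-Knopp scaling.

    cost: n x m matrix (list of lists); row_mass, col_mass: marginals that each sum to 1.
    Returns the n x m plan P = diag(u) K diag(v) with K = exp(-cost / epsilon).
    Very small epsilon can underflow exp(); log-domain updates avoid that in practice.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows to match the row marginals, then columns to match the column marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

A smaller `epsilon` pushes the plan toward a hard, near one-to-one assignment. A row of the plan could then serve as a gate on which latent a patch attends to, in the spirit of the exclusive routing the abstract describes.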

ROMay 18
4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving

Kane Qian, Xin Zhao, Yining Shi et al.

We present 4DLidarOpen, a large-scale open multi-modal dataset for autonomous driving, centered on 4D frequency-modulated continuous-wave (FMCW) Lidar sensing. Unlike conventional time-of-flight Lidar datasets that mainly provide geometric measurements, 4DLidarOpen includes point-wise radial velocity measurements from a forward-facing 4D FMCW Lidar. It also includes multiple Lidars of different types (rotating, solid-state, and blind-spot variants), surround-view cameras, and 6-DOF ego-vehicle poses. The dataset was collected in complex urban environments in Beijing and covers dense pedestrian interactions, congested traffic, high-speed driving, and unprotected maneuvers. 4DLidarOpen provides synchronized multi-sensor data and 3D bounding-box annotations with persistent track IDs across five object categories. A hybrid annotation strategy is adopted: large-scale auto-labeled data support scalable training, and human experts refine annotations for the human-annotated training and validation sets. Based on this dataset, we establish benchmarks for 3D object detection, bird's-eye view (BEV) segmentation and flow prediction, and motion forecasting with planning. Extensive experiments show that direct velocity measurements from 4D FMCW Lidar provide complementary motion cues for dynamic-scene understanding. Compared with geometric-only sensing, the velocity-aware representation improves motion-related perception and downstream forecasting and planning, especially in scenarios involving vulnerable road users and fast-moving objects. These results indicate that 4D FMCW Lidar is a promising sensing modality for motion-aware autonomous driving. The dataset and evaluation toolkit are publicly released to support research on 4D scene understanding, multi-Lidar fusion, and velocity-aware perception and planning.
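A point-wise radial velocity measurement observes only the component of a target's motion along the sensor's line of sight. The sketch below computes that projection for a static sensor. It assumes positive values mean "moving away" (sign conventions vary between sensors) and ignores ego-motion compensation. The function name is hypothetical.

```python
import math


def radial_velocity(point, velocity, sensor=(0.0, 0.0, 0.0)):
    """Component of a target's velocity along the sensor-to-point line of sight.

    Positive values mean the point moves away from the sensor (assumed convention).
    The sensor is treated as stationary; ego-motion compensation is out of scope.
    """
    los = [p - s for p, s in zip(point, sensor)]
    norm = math.sqrt(sum(c * c for c in los))
    if norm == 0.0:
        raise ValueError("point coincides with the sensor origin")
    return sum(v * c for v, c in zip(velocity, los)) / norm
```

A pedestrian crossing directly in front of the sensor at 3 m/s, at `point=(10, 0, 0)` with `velocity=(0, 3, 0)`, yields a radial velocity of 0. This is why the abstract describes the velocity channel as complementary to geometry rather than a full motion estimate.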

SYMay 18
A Benchmark on LLM-Based Power Flow Computation: Do More Structured Prompts Help?

Tingwei Chen, Kaiyang Huang, Kai Sun

We present a controlled benchmark evaluating three LLMs -- Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-3.5 Turbo -- across four prompt formats (from concise narrative to structured JSON with explicit iteration trace) on Gauss-Seidel AC power flow computation for a three-bus system. Against 50 test cases with numerically computed reference solutions, Gemini 2.5 Pro with the simplest narrative prompt achieves the lowest mean absolute error (MAE = 0.257 MW/MVar, 54% of cases within 5% relative error), while the same model with a JSON-structured prompt raises MAE to 0.789 -- a 3.1× increase. Adding a worked example degrades accuracy for Gemini but provides a marginal gain for Claude. GPT-3.5 Turbo fails on at least 90% of cases under all prompt formats. An independent 100-case replication with related prompt-format families confirms the qualitative ordering (Gemini > Claude > GPT-3.5). The best 100-case configuration (Gemini with explicit iteration trace) achieves MAE = 0.402 and 53% within 5%. Both Claude Sonnet 4.5's near-flat accuracy profile (≈38% within 5% across formats) and GPT-3.5's near-total ineffectiveness (92-97% of cases above 20% error) replicate. In neither evaluation does any configuration achieve sufficient reliability for use as a direct numerical solver. These findings offer a diagnostic baseline for practitioners and researchers evaluating LLMs for smart-grid decision-support assistance.
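The benchmark's reference task is Gauss-Seidel AC power flow on a three-bus system. The pure-Python sketch below implements the standard Gauss-Seidel sweep with bus 0 as slack and the other buses as PQ buses, plus a helper that builds the bus admittance matrix. It is a hedged illustration, not the paper's harness. The line impedances, loads (per unit on an assumed 100 MVA base), slack voltage, and tolerance are illustrative assumptions, and shunt elements are omitted.

```python
def build_ybus(n, lines):
    """Bus admittance matrix from (from_bus, to_bus, series_impedance) tuples.

    Line charging and other shunt elements are omitted for brevity.
    """
    ybus = [[0j] * n for _ in range(n)]
    for a, b, z in lines:
        y = 1 / z
        ybus[a][a] += y
        ybus[b][b] += y
        ybus[a][b] -= y
        ybus[b][a] -= y
    return ybus


def gauss_seidel(ybus, s_inj, v_slack=1.0 + 0j, tol=1e-8, max_iter=500):
    """Gauss-Seidel AC power flow with bus 0 as slack and every other bus PQ.

    s_inj[i] is the net complex injection P_i + jQ_i in per unit (generation minus load);
    s_inj[0] is ignored because the slack bus absorbs the mismatch.
    Returns (complex bus voltages, sweeps used).
    """
    n = len(ybus)
    v = [complex(v_slack)] + [1.0 + 0j] * (n - 1)  # flat start
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n):
            coupling = sum(ybus[i][k] * v[k] for k in range(n) if k != i)
            # From S_i = V_i * conj(I_i): V_i = ((P_i - jQ_i) / conj(V_i) - sum_{k != i} Y_ik V_k) / Y_ii
            new_v = (s_inj[i].conjugate() / v[i].conjugate() - coupling) / ybus[i][i]
            max_change = max(max_change, abs(new_v - v[i]))
            v[i] = new_v  # reuse the fresh value immediately (Gauss-Seidel, not Jacobi)
        if max_change < tol:
            return v, sweep
    raise RuntimeError("Gauss-Seidel did not converge within max_iter sweeps")


# Illustrative three-bus case (assumed values, per unit).
lines = [(0, 1, 0.02 + 0.04j), (0, 2, 0.01 + 0.03j), (1, 2, 0.0125 + 0.025j)]
ybus = build_ybus(3, lines)
injections = [0j, -(2.566 + 1.102j), -(1.386 + 0.452j)]  # loads at buses 1 and 2
voltages, sweeps = gauss_seidel(ybus, injections, v_slack=1.05)
```

The sequential reuse of freshly updated voltages is what distinguishes Gauss-Seidel from Jacobi iteration. It is also the kind of step-by-step state an "explicit iteration trace" prompt asks a model to carry by hand.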

ROApr 20
Periodic Steady-State Control of a Handkerchief-Spinning Task Using a Parallel Anti-Parallelogram Tendon-driven Wrist

Lei Liu, Haonan Zhang, Huahang Xu et al.

Spinning flexible objects, as exemplified by traditional Chinese handkerchief performances, demands periodic steady-state motions under nonlinear dynamics with frictional contacts and boundary constraints. To address these challenges, we first design an intuitive dexterous wrist based on a parallel anti-parallelogram tendon-driven structure, which achieves 90-degree omnidirectional rotation with low inertia and decoupled roll-pitch sensing. On this wrist we implement a high-/low-level hierarchical control scheme. We then develop a particle-spring model of the handkerchief for control-oriented abstraction and strategy evaluation. Hardware experiments validate this framework, achieving an unfolding ratio of approximately 99% and a fingertip tracking error of RMSE = 2.88 mm during highly dynamic spinning. These results demonstrate that integrating control-oriented modeling with a task-tailored dexterous wrist enables robust rest-to-steady-state transitions and precise periodic manipulation of highly flexible objects. More visualizations: https://slowly1113.github.io/icra2026-handkerchief/
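A particle-spring cloth model represents the fabric as point masses joined by springs. The sketch below computes the net Hooke's-law spring force on each particle, which is the core of such a model. The abstract does not give the paper's connectivity, damping, or stiffness values. The linear springs, the single shared `stiffness`, and the function name are assumptions.

```python
import math


def spring_forces(positions, springs, stiffness):
    """Net Hooke's-law force on each particle of a particle-spring model.

    positions: list of (x, y, z) tuples; springs: (i, j, rest_length) tuples.
    Damping, gravity, and contact forces are deliberately left out.
    """
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j, rest in springs:
        delta = [b - a for a, b in zip(positions[i], positions[j])]
        length = math.sqrt(sum(d * d for d in delta))
        if length == 0.0:
            continue  # direction undefined; skip the degenerate spring
        magnitude = stiffness * (length - rest)  # positive when stretched
        for axis in range(3):
            f = magnitude * delta[axis] / length
            forces[i][axis] += f  # i is pulled toward j when stretched
            forces[j][axis] -= f  # equal and opposite force on j
    return forces
```

Feeding these forces, plus damping and gravity, into a time integrator would give the kind of simulator on which spinning strategies can be evaluated before running them on hardware.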

CVNov 3, 2024Code
Polar R-CNN: End-to-End Lane Detection with Fewer Anchors

Shengqi Wang, Junmin Liu, Xiangyong Cao et al.

Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes can be slender, lengthy, and often obscured by other vehicles, complicating detection efforts. Existing anchor-based methods typically rely on prior lane anchors to extract features and subsequently refine the location and shape of lanes. While these methods achieve high performance, manually setting prior anchors is cumbersome, and ensuring sufficient coverage across diverse datasets often requires a large number of dense anchors. Furthermore, the use of Non-Maximum Suppression (NMS) to eliminate redundant predictions complicates real-world deployment and may underperform in complex scenarios. In this paper, we propose Polar R-CNN, an end-to-end anchor-based method for lane detection. By incorporating both local and global polar coordinate systems, Polar R-CNN facilitates flexible anchor proposals and significantly reduces the number of anchors required without compromising performance. Additionally, we introduce a triplet head with a heuristic structure that supports an NMS-free paradigm, enhancing deployment efficiency and performance in scenarios with dense lanes. Our method achieves competitive results on five popular lane detection benchmarks -- Tusimple, CULane, LLAMAS, CurveLanes, and DL-Rai -- while maintaining a lightweight design and straightforward structure. Our source code is available at https://github.com/ShqWW/PolarRCNN.
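A straight line can be described in polar form by a radius `rho` and an angle `theta` relative to a chosen pole. The sketch below converts two points on a lane line into Hesse normal form. Passing a different `origin` shows how the same line receives different parameters under a local versus a global pole. This is a generic illustration and does not reproduce Polar R-CNN's exact anchor parameterization. The function name is hypothetical.

```python
import math


def line_to_polar(p1, p2, origin=(0.0, 0.0)):
    """Hesse normal form (rho, theta) of the line through p1 and p2, relative to origin.

    Points on the line satisfy (x - ox) * cos(theta) + (y - oy) * sin(theta) = rho,
    with rho >= 0.
    """
    (x1, y1), (x2, y2) = p1, p2
    ox, oy = origin
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        raise ValueError("p1 and p2 must differ")
    theta = math.atan2(dx, -dy)  # angle of the normal vector (-dy, dx)
    rho = (x1 - ox) * math.cos(theta) + (y1 - oy) * math.sin(theta)
    if rho < 0:  # flip the normal so that rho is non-negative
        rho = -rho
        theta = theta - math.pi if theta > 0 else theta + math.pi
    return rho, theta
```

For example, the vertical line x = 3 maps to `(3.0, 0.0)` with respect to the image origin. Two numbers per line is what makes a polar anchor compact compared with a dense set of sampled anchor points.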

IVMay 22, 2022
Preparing data for pathological artificial intelligence with clinical-grade performance

Yuanqing Yang, Kai Sun, Yanhua Gao et al.

[Purpose] Pathology is decisive for disease diagnosis but relies heavily on experienced pathologists. Recently, pathological artificial intelligence (PAI) has been considered capable of improving diagnostic accuracy and efficiency. However, the high performance that deep-learning-based PAI achieves in the laboratory generally cannot be reproduced in the clinic. [Methods] Because data preparation is important for PAI, this paper reviews PAI-related studies in the PubMed database published from January 2017 to February 2022; 118 studies were included. An in-depth analysis of data preparation methods is performed, covering how slides of pathological tissue are obtained, cleaned, screened, and then digitized. Expert review, image annotation, and dataset division for model training and validation are also discussed. We further discuss why the high performance of PAI is not reproducible in clinical practice and show some effective ways to improve the clinical performance of PAI. [Results] The robustness of PAI depends on randomized collection of representative disease slides, rigorous quality control and screening, correction of digital discrepancies, reasonable annotation, and the amount of data. Digital pathology is fundamental to clinical-grade PAI. Data standardization techniques and weakly supervised learning methods based on whole slide images (WSIs) are effective ways to overcome obstacles to performance reproduction. [Conclusion] Representative data, the amount of labeling, and consistency across multiple centers are the keys to performance reproduction. Digital pathology for clinical diagnosis, data standardization, and WSI-based weakly supervised learning are expected to help build clinical-grade PAI. Keywords: pathological artificial intelligence; data preparation; clinical-grade; deep learning

CLDec 29, 2025
Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process

Zhijun Chen, Zeyu Ji, Qianren Mao et al.

We propose LLM-PeerReview, an unsupervised LLM Ensemble method that selects the best response from multiple LLM-generated candidates for each query, harnessing the collective wisdom of multiple models with diverse strengths. LLM-PeerReview is built on a novel, peer-review-inspired framework that offers a transparent and interpretable mechanism while remaining fully unsupervised for flexible adaptability and generalization. Specifically, it operates in three stages. For scoring, we use the emerging LLM-as-a-Judge technique to evaluate each response by reusing the multiple LLMs at hand. For reasoning, we apply either a straightforward averaging strategy or a principled graphical-model-based truth inference algorithm to aggregate the multiple scores into a final score for each response. Finally, the highest-scoring response is selected as the ensemble output. LLM-PeerReview is conceptually simple and empirically powerful. Our results on four datasets spanning diverse task types, including factual recall QA, math reasoning, and instruction following, show that the two variants of the proposed approach outperform the advanced model Smoothie-Global by 6.9 and 7.3 percentage points, respectively.
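The averaging variant of the reasoning stage, followed by selection, reduces to a mean over judges and an argmax over candidates. The sketch below shows only that variant. It assumes the judge scores have already been collected as numbers; the graphical-model truth inference variant is not shown. The function name and score layout are hypothetical.

```python
def peer_review_select(candidates, judge_scores):
    """Pick the candidate whose judge scores have the highest mean.

    candidates: list of candidate responses.
    judge_scores: judge_scores[j][c] is judge j's score for candidate c.
    Returns (index of the selected candidate, list of mean scores).
    Ties go to the earliest candidate.
    """
    n_judges = len(judge_scores)
    means = [sum(judge[c] for judge in judge_scores) / n_judges for c in range(len(candidates))]
    best = max(range(len(candidates)), key=means.__getitem__)
    return best, means
```

With three judges scoring three responses as `[[3, 5, 4], [4, 4, 5], [2, 5, 3]]`, the second response is selected with a mean score of about 4.67.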

CLSep 29, 2025Code
Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?

Kai Sun, Yin Huang, Srishti Mehra et al.

The advent of Large Language Models (LLMs) has significantly advanced web-based Question Answering (QA) systems over semi-structured content, raising questions about the continued utility of knowledge extraction for question answering. This paper investigates the value of triple extraction in this new paradigm by extending an existing benchmark with knowledge extraction annotations and evaluating commercial and open-source LLMs of varying sizes. Our results show that web-scale knowledge extraction remains a challenging task for LLMs. Despite achieving high QA accuracy, LLMs can still benefit from knowledge extraction, through augmentation with extracted triples and multi-task learning. These findings provide insights into the evolving role of knowledge triple extraction in web-based QA and highlight strategies for maximizing LLM effectiveness across different model sizes and resource settings.

LGJul 7, 2024
Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

Xiaokang Pan, Xingyu Li, Jin Liu et al.

STOchastic Recursive Momentum (STORM)-based algorithms have been widely developed to solve one- to $K$-level ($K \geq 3$) stochastic optimization problems. Specifically, they use estimators to mitigate the biased gradient issue and achieve near-optimal convergence results. However, there is relatively little work on understanding their generalization performance, particularly during the transition from one- to $K$-level optimization contexts. Based on algorithmic stability, this paper provides a comprehensive generalization analysis of three representative STORM-based algorithms (STORM, COVER, and SVMR) for one-, two-, and $K$-level stochastic optimization under both convex and strongly convex settings. First, we define stability for $K$-level optimizations and link it to generalization. Then, we detail the stability results for the three STORM-based algorithms. Finally, we derive their excess risk bounds by balancing the stability results against optimization errors. Our theoretical results provide strong evidence for the following properties of STORM-based algorithms: (1) each estimator may reduce stability because of its variance with respect to its estimation target; (2) every additional level may increase the generalization error, depending on the stability and on the variance between its cumulative stochastic gradient and the true gradient; (3) increasing the batch size for the initial computation of the estimators offers a favorable trade-off that improves generalization performance.
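The recursive momentum estimator at the heart of one-level STORM evaluates the same fresh sample at the current and previous iterates, so that much of the sampling noise cancels. The sketch below implements that estimator with constant step size and momentum parameter; the constants are simplifying assumptions, since schedules vary. The two- and $K$-level variants (COVER, SVMR) are not shown, and the function name is hypothetical.

```python
import random


def storm_minimize(grad, sample, x0, lr=0.1, momentum=0.1, steps=500, seed=0):
    """One-level STORM with constant step size and momentum parameter (assumed).

    grad(x, xi): stochastic gradient at x for sample xi.
    sample(rng): draws one sample xi from a random.Random instance.
    Estimator: d_t = grad(x_t, xi_t) + (1 - momentum) * (d_{t-1} - grad(x_{t-1}, xi_t)).
    """
    rng = random.Random(seed)
    x_prev = x0
    d = grad(x0, sample(rng))  # plain stochastic gradient initializes the estimator
    x = x0 - lr * d
    for _ in range(steps):
        xi = sample(rng)
        # The same sample xi is evaluated at the current and the previous iterate.
        d = grad(x, xi) + (1.0 - momentum) * (d - grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x
```

For example, `storm_minimize(lambda x, xi: x - xi, lambda rng: rng.gauss(3.0, 1.0), x0=0.0)` minimizes E[(x - ξ)²/2] with ξ ~ N(3, 1) and ends near 3. The paper's point (3) concerns the batch size of the initializing gradient, which here is a single sample.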

CVMar 5Code
Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning

Rui Zhao, Bin Shi, Kai Sun et al.

Partial label learning (PLL) is a prominent weakly supervised classification task in which each training instance is ambiguously labeled with a set of candidate labels. In real-world scenarios, candidate labels are often influenced by instance features, leading to the emergence of instance-dependent PLL (ID-PLL), a setting that more accurately reflects this relationship. A significant challenge in ID-PLL is instance entanglement, where instances from similar classes share overlapping features and candidate labels, resulting in increased class confusion. To address this issue, we propose a novel Class-specific Augmentation based Disentanglement (CAD) framework, which tackles instance entanglement through both intra- and inter-class regulation. For intra-class regulation, CAD amplifies class-specific features to generate class-wise augmentations and aligns same-class augmentations across instances. For inter-class regulation, CAD introduces a weighted penalty loss function that applies stronger penalties to more ambiguous labels, encouraging larger inter-class distances. By jointly applying intra- and inter-class regulation, CAD improves the clarity of class boundaries and reduces class confusion caused by entanglement. Extensive experimental results demonstrate the effectiveness of CAD in mitigating the entanglement problem and enhancing ID-PLL performance. The code is available at https://github.com/RyanZhaoIc/CAD.git.

NEMay 15, 2025Code
ILIF: Temporal Inhibitory Leaky Integrate-and-Fire Neuron for Overactivation in Spiking Neural Networks

Kai Sun, Peibo Duan, Levin Kuhlmann et al.

The Spiking Neural Network (SNN) has drawn increasing attention for its energy-efficient, event-driven processing and biological plausibility. To train SNNs via backpropagation, surrogate gradients are used to approximate the non-differentiable spike function. However, they only maintain nonzero derivatives within a narrow range of membrane potentials near the firing threshold, referred to as the surrogate gradient support width gamma. We identify a major challenge, termed the dilemma of gamma: a relatively large gamma leads to overactivation, characterized by excessive neuron firing, which in turn increases energy consumption, whereas a small gamma causes vanishing gradients and weakens temporal dependencies. To address this, we propose a temporal Inhibitory Leaky Integrate-and-Fire (ILIF) neuron model, inspired by biological inhibitory mechanisms. This model incorporates interconnected inhibitory units for membrane potential and current, effectively mitigating overactivation while preserving gradient propagation. Theoretical analysis demonstrates ILIF's effectiveness in overcoming the gamma dilemma, and extensive experiments on multiple datasets show that ILIF improves energy efficiency by reducing firing rates, stabilizes training, and enhances accuracy. The code is available at github.com/kaisun1/ILIF.
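The gamma dilemma concerns which membrane potentials receive a nonzero surrogate gradient. The sketch below pairs a plain leaky integrate-and-fire (LIF) neuron with a rectangular surrogate gradient of width gamma, so one can see how widening gamma lets potentials farther from the threshold pass gradient. It does not implement ILIF's inhibitory units, whose equations the abstract does not give. The decay `beta`, the threshold, the hard reset, and the rectangular surrogate shape are all assumptions.

```python
def lif_forward(currents, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron with hard reset.

    Returns (spike train, membrane potential before each reset decision).
    """
    u = 0.0
    spikes, potentials = [], []
    for i_t in currents:
        u = beta * u + i_t  # leaky integration of the input current
        potentials.append(u)
        s = 1 if u >= threshold else 0
        spikes.append(s)
        if s:
            u = 0.0  # hard reset after a spike
    return spikes, potentials


def rect_surrogate_grad(u, threshold=1.0, gamma=1.0):
    """Rectangular surrogate for dS/du: 1/gamma within gamma/2 of the threshold, else 0."""
    return 1.0 / gamma if abs(u - threshold) < gamma / 2 else 0.0
```

For the input `[0.5, 0.5, 0.5]`, the first potential (0.5) gets zero gradient when gamma = 0.5 but a nonzero gradient when gamma = 2.0. That is the trade-off the abstract describes: wider support keeps gradients flowing, but it also covers potentials far from firing.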

CLJun 7, 2024Code
CRAG -- Comprehensive RAG Benchmark

Xiao Yang, Kai Sun, Hao Xin et al.

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the lack of knowledge in Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamism ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. While most advanced LLMs achieve ≤34% accuracy on CRAG, adding RAG in a straightforward manner improves accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions about facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge and attracted thousands of participants and submissions. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. CRAG is available at https://github.com/facebookresearch/CRAG/.

CVMar 10, 2021Code
Deep Convolutional Sparse Coding Network for Pansharpening with Guidance of Side Information

Shuang Xu, Jiangshe Zhang, Kai Sun et al.

Pansharpening is a fundamental problem in the remote sensing field. This paper proposes a side information partially guided convolutional sparse coding (SCSC) model for pansharpening. The key idea is to split the low-resolution multispectral image into a panchromatic-image-related feature map and a panchromatic-image-unrelated feature map, where the former is regularized by side information from panchromatic images. Following the principle of algorithm unrolling, the proposed model is generalized into a deep neural network, called the SCSC pansharpening neural network (SCSC-PNN). Compared with 13 classic and state-of-the-art methods on three satellites, numerical experiments show that SCSC-PNN outperforms the others. The codes are available at https://github.com/xsxjtu/SCSC-PNN.
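Algorithm unrolling starts from an iterative solver and turns a fixed number of its iterations into network layers. As a generic illustration of that starting point, not the paper's side-information model, the sketch below runs ISTA for plain sparse coding, min_z ½‖y − Dz‖² + λ‖z‖₁. A dense dictionary `D` stands in for convolutions. `D`, `step`, and `lam` are assumptions; in an unrolled network they would become learnable per-layer parameters.

```python
def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||x||_1."""
    return [max(abs(v) - t, 0.0) * (1 if v > 0 else -1) for v in x]


def ista(D, y, lam, step, iters=100):
    """ISTA for min_z 0.5 * ||y - D z||^2 + lam * ||z||_1, with D given as a list of rows.

    Each iteration is a gradient step on the data term followed by soft-thresholding.
    """
    n_atoms = len(D[0])
    z = [0.0] * n_atoms
    for _ in range(iters):
        residual = [yi - sum(dij * zj for dij, zj in zip(row, z)) for yi, row in zip(y, D)]
        grad = [-sum(D[i][j] * residual[i] for i in range(len(D))) for j in range(n_atoms)]
        z = soft_threshold([zj - step * gj for zj, gj in zip(z, grad)], step * lam)
    return z
```

With an identity dictionary, the solution is simply the soft-thresholded signal: `ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=0.5, step=1.0)` returns `[2.5, 0.0]`.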

CVMar 8, 2021Code
Deep Gradient Projection Networks for Pan-sharpening

Shuang Xu, Jiangshe Zhang, Zixiang Zhao et al.

Pan-sharpening is an important technique for remote sensing imaging systems to obtain high-resolution multispectral images. Recently, deep learning has become the most popular tool for pan-sharpening. This paper develops a model-based deep pan-sharpening approach. Specifically, two optimization problems regularized by the deep prior are formulated, and they are separately responsible for the generative models of panchromatic images and low-resolution multispectral images. The two problems are then solved by a gradient projection algorithm, and the iterative steps are generalized into two network blocks. By alternately stacking the two blocks, a novel network, called the gradient projection based pan-sharpening neural network, is constructed. The experimental results on different kinds of satellite datasets demonstrate that the new network outperforms state-of-the-art methods both visually and quantitatively. The codes are available at https://github.com/xsxjtu/GPPNN.

CVDec 29, 2020Code
Towards Reducing Severe Defocus Spread Effects for Multi-Focus Image Fusion via an Optimization Based Strategy

Shuang Xu, Lizhen Ji, Zhe Wang et al.

Multi-focus image fusion (MFF) is a popular technique for generating an all-in-focus image, in which all objects in the scene are sharp. However, existing methods pay little attention to the defocus spread effects of real-world multi-focus images. Consequently, most methods perform poorly in areas near focus map boundaries. Based on the idea that each local region in the fused image should be similar to the sharpest one among the source images, this paper presents an optimization-based approach to reduce defocus spread effects. First, a new MFF assessment metric is presented by combining the principle of structural similarity with detected focus maps. Then, the MFF problem is cast as maximizing this metric, and the optimization is solved by gradient ascent. Experiments conducted on the real-world dataset verify the superiority of the proposed model. The codes are available at https://github.com/xsxjtu/MFF-SSIM.

CLOct 31, 2018Code
Improving Machine Reading Comprehension with General Reading Strategies

Kai Sun, Dian Yu, Dong Yu et al.

Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge. Just as the process of knowledge accumulation is time-consuming for human readers, it is resource-demanding to impart rich general domain knowledge into a deep language model via pre-training. Inspired by reading strategies identified in cognitive science, and given limited computational resources -- just a pre-trained model and a fixed number of training instances -- we propose three general strategies aimed to improve non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING that considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT that generates practice questions and candidate answers directly from the text in an unsupervised manner. By fine-tuning a pre-trained language model (Radford et al., 2018) with our proposed strategies on the largest general domain multiple-choice MRC dataset RACE, we obtain a 5.8% absolute increase in accuracy over the previous best result achieved by the same pre-trained model fine-tuned on RACE without the use of strategies. We further fine-tune the resulting model on a target MRC task, leading to an absolute improvement of 6.2% in average accuracy over previous state-of-the-art approaches on six representative non-extractive MRC datasets from different domains (i.e., ARC, OpenBookQA, MCTest, SemEval-2018 Task 11, ROCStories, and MultiRC). These results demonstrate the effectiveness of our proposed strategies and the versatility and general applicability of our fine-tuned models that incorporate these strategies. Core code is available at https://github.com/nlpdata/strategy/.
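Two of the three strategies are simple input transformations. The sketch below shows the reversed view used by BACK AND FORTH READING and the token mask that decides where a HIGHLIGHTING embedding is added. It also combines the two reading directions by averaging option probabilities. Several parts are assumptions: treating "relevant" as case-insensitive token overlap with the question and candidate answer, using a fixed vector in place of the trainable highlight embedding, and averaging as the combination rule. All function names are hypothetical.

```python
def back_and_forth_views(passage, question, option):
    """Forward token sequence and its reversed counterpart (BACK AND FORTH READING)."""
    forward = passage + question + option
    return forward, forward[::-1]


def highlight_mask(passage, question, option):
    """1 for passage tokens that also occur in the question or option (case-insensitive)."""
    relevant = {tok.lower() for tok in question + option}
    return [1 if tok.lower() in relevant else 0 for tok in passage]


def add_highlight(embeddings, mask, highlight_vec):
    """Add the highlight vector to the embeddings of marked tokens (trainable in the paper)."""
    return [[e + m * h for e, h in zip(emb, highlight_vec)] for emb, m in zip(embeddings, mask)]


def average_probs(p_forward, p_backward):
    """Combine the two reading directions by averaging option probabilities (assumed rule)."""
    return [(a + b) / 2 for a, b in zip(p_forward, p_backward)]
```

For the passage `["The", "cat", "sat"]` and the question "Where did the cat sit?", the mask marks "The" and "cat". Only those two token embeddings are shifted by the highlight vector.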

NEDec 12, 2025
CogniSNN: Enabling Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability with Random Graph Architectures in Spiking Neural Networks

Yongsheng Huang, Peibo Duan, Yujie Wu et al.

Spiking neural networks (SNNs), regarded as the third generation of artificial neural networks, are expected to bridge the gap between artificial intelligence and computational neuroscience. However, most mainstream SNN research directly adopts the rigid, chain-like hierarchical architecture of traditional artificial neural networks (ANNs), ignoring key structural characteristics of the brain. Biological neurons are stochastically interconnected, forming complex neural pathways that exhibit Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability. In this paper, we introduce a new SNN paradigm, named Cognition-aware SNN (CogniSNN), by incorporating Random Graph Architecture (RGA). Furthermore, we address the issues of network degradation and dimensional mismatch in deep pathways by introducing an improved pure spiking residual mechanism alongside an adaptive pooling strategy. Then, we design a Key Pathway-based Learning without Forgetting (KP-LwF) approach, which selectively reuses critical neural pathways while retaining historical knowledge, enabling efficient multi-task transfer. Finally, we propose a Dynamic Growth Learning (DGL) algorithm that allows neurons and synapses to grow dynamically along the internal temporal dimension. Extensive experiments demonstrate that CogniSNN achieves performance comparable to, or even surpassing, current state-of-the-art SNNs on neuromorphic datasets and Tiny-ImageNet. Pathway-Reusability enhances the network's continuous learning capability across different scenarios, while the dynamic growth algorithm improves robustness against interference and mitigates fixed-timestep constraints during neuromorphic chip deployment. This work demonstrates the potential of SNNs with random graph structures for advancing brain-inspired intelligence and lays the foundation for their practical application on neuromorphic hardware.
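To make "random graph architecture" and "neural pathways" concrete, here is a minimal sketch under stated assumptions; it is not CogniSNN's construction. The abstract does not name a graph generator, so an Erdős-Rényi graph with edges oriented from lower to higher node index (which guarantees acyclicity) is assumed, with node 0 as input and the last node as output. A pathway is taken to be any directed input-to-output path, and a generic leaky integrate-and-fire (LIF) update with hard reset stands in for the spiking unit at each node. The residual mechanism, KP-LwF pathway selection, and DGL growth are not modeled, and all names are hypothetical.

```python
import random


def random_dag(n_nodes, p, seed=0):
    """Erdős-Rényi graph with edges oriented low -> high index, hence acyclic.

    Nodes without a predecessor are wired from the input node 0, and nodes
    without a successor are wired to the output node n_nodes - 1.
    """
    rng = random.Random(seed)
    edges = {i: [] for i in range(n_nodes)}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < p:
                edges[i].append(j)
    has_pred = {j for succ in edges.values() for j in succ}
    for k in range(1, n_nodes):
        if k not in has_pred:
            edges[0].append(k)
    for k in range(n_nodes - 1):
        if not edges[k]:
            edges[k].append(n_nodes - 1)
    return edges


def pathways(edges, src, dst):
    """All directed paths from src to dst, found by depth-first search."""
    stack, found = [[src]], []
    while stack:
        path = stack.pop()
        if path[-1] == dst:
            found.append(path)
            continue
        for nxt in edges[path[-1]]:
            stack.append(path + [nxt])
    return found


def lif_step(v, x, tau=2.0, v_th=1.0):
    """One LIF update: leak toward input x, spike at threshold, hard reset to 0."""
    v = v + (x - v) / tau
    spike = 1 if v >= v_th else 0
    return spike, (0.0 if spike else v)
```

The wiring fix-ups ensure every node lies on at least one input-to-output pathway, so enumerating pathways covers the whole graph; a pathway-reuse scheme would then rank and reuse a subset of these paths.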

AIJan 9
Cumulative Path-Level Semantic Reasoning for Inductive Knowledge Graph Completion

Jiapu Wang, Xinghe Cheng, Zezheng Wu et al.

Conventional Knowledge Graph Completion (KGC) methods infer missing information in incomplete Knowledge Graphs (KGs) from existing information, but they struggle to perform effectively in scenarios involving emerging entities. Inductive KGC methods can handle emerging entities and relations in KGs, offering greater dynamic adaptability. While existing inductive KGC methods have achieved some success, they still face challenges, such as susceptibility to noisy structural information during reasoning and difficulty in capturing long-range dependencies along reasoning paths. To address these challenges, this paper proposes the Cumulative Path-Level Semantic Reasoning (CPSR) framework for inductive knowledge graph completion, which simultaneously captures both the structural and semantic information of KGs to enhance the inductive KGC task. Specifically, CPSR employs a query-dependent masking module to adaptively mask noisy structural information while retaining important information closely related to the targets. Additionally, CPSR introduces a global semantic scoring module that evaluates both the individual contributions and the collective impact of nodes along the reasoning path within KGs. Experimental results demonstrate that CPSR achieves state-of-the-art performance.
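The idea of query-dependent masking can be shown with a deliberately simple sketch. This is not CPSR's module, which is learned and adaptive. Here, as an assumption, each edge is scored by the cosine similarity between its relation embedding and the query relation's embedding, and only the top `keep_ratio` fraction of edges is kept. The function name, the fixed embeddings, and the top-k rule are hypothetical stand-ins.

```python
import numpy as np


def query_mask(edge_rel_ids, rel_emb, query_rel, keep_ratio=0.5):
    """Boolean mask keeping the edges whose relation best matches the query relation.

    edge_rel_ids: relation id of each edge; rel_emb: (num_relations, d) embeddings.
    """
    rel = rel_emb[np.asarray(edge_rel_ids)]          # (E, d) relation vector per edge
    q = rel_emb[query_rel]                           # (d,) query relation vector
    sim = rel @ q / (np.linalg.norm(rel, axis=1) * np.linalg.norm(q) + 1e-12)
    k = max(1, int(round(keep_ratio * len(edge_rel_ids))))
    keep = np.zeros(len(edge_rel_ids), dtype=bool)
    keep[np.argsort(-sim)[:k]] = True                # top-k most query-relevant edges
    return keep
```

In a learned version, the similarity scores would come from trainable parameters and the hard top-k cut would typically be replaced by a soft or differentiable mask; the fixed cosine rule here only illustrates conditioning the retained structure on the query.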