Kaiwen Li

CV
h-index: 14
Papers: 14
Citations: 493
Novelty: 51%
AI Score: 55

14 Papers

LG · Apr 25, 2022
Deep Reinforcement Learning for Online Routing of Unmanned Aerial Vehicles with Wireless Power Transfer

Kaiwen Li, Tao Zhang, Rui Wang et al.

The unmanned aerial vehicle (UAV) plays a vital role in various applications such as delivery, military missions, disaster rescue, and communication, due to its flexibility and versatility. This paper proposes a deep reinforcement learning method to solve the UAV online routing problem with wireless power transfer, which can charge the UAV remotely without wires and thus extends the capability of the battery-limited UAV. Our study considers the power consumption of the UAV and the wireless charging process. Unlike previous works, we solve the problem with a purpose-designed deep neural network. The model is trained offline using a deep reinforcement learning method and is then used to optimize the UAV routing problem online. On small- and large-scale instances, the proposed model runs from 4 to 500 times faster than Google OR-Tools, the state-of-the-art combinatorial optimization solver, with identical solution quality. It also outperforms different types of heuristic and local search methods in terms of both run-time and optimality. In addition, once trained, the model can scale to newly generated problem instances with arbitrary topologies not seen during training. The proposed method is practically applicable when the problem scale is large and response time is crucial.
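Routing models of this kind typically decode one node at a time and mask out choices that would violate the energy constraint. The sketch below illustrates such a feasibility mask under a hypothetical energy model (consumption proportional to Euclidean distance, and the UAV must still reach some charging station after serving a customer); the function name and parameters are illustrative assumptions, not the paper's formulation.

```python
import math


def feasible_next_nodes(position, battery, customers, chargers, visited, rate):
    """Return indices of unvisited customers that keep the UAV energy-feasible.

    Hypothetical energy model (an assumption, not from the paper): flying a
    distance d costs rate * d, and after serving a customer the UAV must
    still be able to reach at least one charging station.
    """
    feasible = []
    for i, customer in enumerate(customers):
        if i in visited:
            continue
        to_customer = rate * math.dist(position, customer)
        # Energy needed to get from the customer to the nearest charger.
        to_charger = min(rate * math.dist(customer, c) for c in chargers)
        if to_customer + to_charger <= battery:
            feasible.append(i)
    return feasible
```

For example, `feasible_next_nodes((0, 0), 4.0, [(1, 0), (5, 0)], [(0, 0)], set(), 1.0)` keeps only customer 0, because the round trip to customer 1 would need 10 energy units. A decoder would set the logits of all other nodes to negative infinity before sampling.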

LG · Jul 4, 2023
Learning to Branch in Combinatorial Optimization with Graph Pointer Networks

Rui Wang, Zhiming Zhou, Tao Zhang et al.

Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in branch-and-bound. We extract graph features, global features and historical features to represent the solver state. The proposed model, which combines a graph neural network with a pointer mechanism, can effectively map from the solver state to branching variable decisions. The model is trained to imitate the classic strong branching expert rule via a purpose-designed top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. Our approach also outperforms state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all test instances. In addition, the model can generalize to unseen instances and scale to larger instances.
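One plausible reading of a "top-k Kullback-Leibler divergence" imitation loss is sketched below: keep only the expert's k highest-scoring branching candidates, turn both the strong-branching scores and the model's logits over those k candidates into softmax distributions, and compute KL(expert || model). The renormalization over the top-k set and the use of raw scores as logits are assumptions; the paper's exact definition may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_kl_loss(expert_scores, model_logits, k):
    """KL(p_expert || p_model) restricted to the expert's top-k candidates.

    Assumption: both distributions are softmaxes renormalized over the same
    k candidate indices; candidates outside the top-k are ignored.
    """
    top = sorted(range(len(expert_scores)),
                 key=lambda i: expert_scores[i], reverse=True)[:k]
    p = softmax([expert_scores[i] for i in top])
    q = softmax([model_logits[i] for i in top])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Restricting the target to k candidates means the model is not penalized for how it ranks variables the expert considers poor, which is one motivation for a top-k variant over a full-distribution loss.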

CV · Feb 25
Geometry-as-context: Modulating Explicit 3D in Scene-consistent Video Generation to Geometry Context

JiaKui Hu, Jialun Liu, Liying Yang et al.

Scene-consistent video generation aims to create videos that explore 3D scenes based on a camera trajectory. Previous methods rely either on video generation models with external memory for consistency or on iterative 3D reconstruction and inpainting; both accumulate errors during inference due to incorrect intermediate outputs, non-differentiable processes, and separate models. To overcome these limitations, we introduce "geometry-as-context". Using an autoregressive camera-controlled video generation model, it iteratively (1) estimates the geometry of the current view needed for 3D reconstruction, and (2) simulates and restores novel-view images rendered from the 3D scene. Under this multi-task framework, we develop a camera gated attention module to enhance the model's ability to leverage camera poses effectively. During training, text contexts specify whether geometric or RGB images should be generated. To ensure that the model can generate RGB-only outputs during inference, the geometry context is randomly dropped from the interleaved text-image-geometry training sequence. The method has been tested on scene video generation with one-direction and forth-and-back trajectories. The results show its superiority over previous approaches in maintaining scene consistency and camera control.
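The random dropping of geometry context from the interleaved training sequence can be pictured with the sketch below. The tag strings `<rgb>` and `<geometry>`, the per-frame ordering, and the choice to drop geometry for a whole sequence at once (rather than per frame) are all hypothetical assumptions made for illustration.

```python
import random


def build_training_sequence(frames, drop_geometry_prob, rng):
    """Interleave text tags with RGB and geometry items for one clip.

    frames: list of (rgb, geometry) pairs. With probability drop_geometry_prob
    the geometry items (and their tags) are omitted for the whole sequence,
    so the model also sees RGB-only sequences during training.
    """
    drop_geometry = rng.random() < drop_geometry_prob
    sequence = []
    for rgb, geometry in frames:
        sequence.append(("text", "<rgb>"))  # hypothetical task tag
        sequence.append(("rgb", rgb))
        if not drop_geometry:
            sequence.append(("text", "<geometry>"))  # hypothetical task tag
            sequence.append(("geometry", geometry))
    return sequence
```

Because the text tag tells the model which modality comes next, an RGB-only sequence at inference time is simply one the model has already encountered during training.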

CV · Jan 27
Bridging Information Asymmetry: A Hierarchical Framework for Deterministic Blind Face Restoration

Zhengjian Yao, Jiakui Hu, Kaiwen Li et al.

Blind face restoration remains a persistent challenge due to the inherent ill-posedness of reconstructing holistic structures from severely constrained observations. Current generative approaches, while capable of synthesizing realistic textures, often suffer from information asymmetry, i.e., the intrinsic disparity between information-sparse low-quality inputs and information-dense high-quality outputs. This imbalance leads to a one-to-many mapping, where insufficient constraints result in stochastic uncertainty and hallucinatory artifacts. To bridge this gap, we present Pref-Restore, a hierarchical framework that integrates discrete semantic logic with continuous texture generation to achieve deterministic, preference-aligned restoration. Our methodology addresses this information disparity through two complementary strategies: (1) Augmenting Input Density: we employ an autoregressive integrator to reformulate textual instructions into dense latent queries, injecting high-level semantic stability to constrain the degraded signals; (2) Pruning Output Distribution: we pioneer the integration of on-policy reinforcement learning directly into the diffusion restoration loop. By transforming human preferences into differentiable constraints, we explicitly penalize stochastic deviations, thereby sharpening the posterior distribution toward the desired high-fidelity outcomes. Extensive experiments demonstrate that Pref-Restore achieves state-of-the-art performance across synthetic and real-world benchmarks. Furthermore, empirical analysis confirms that our preference-aligned strategy significantly reduces solution entropy, establishing a robust pathway toward reliable, deterministic blind restoration.

LG · Jan 5
RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data

Peiyan Hu, Haodong Feng, Hongyuan Liu et al.

Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific machine learning (ML) models, a critical bottleneck is the scarcity of real-world data, which is expensive to acquire; as a result, most current models are trained and validated on simulated data. Beyond limiting the development and evaluation of scientific ML, this gap also hinders research into essential tasks such as sim-to-real transfer. We introduce RealPDEBench, the first benchmark for scientific ML that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, eight metrics, and ten baselines. We first present five real-world measured datasets with paired simulated datasets across different complex physical systems. We further define three tasks that allow comparisons between real-world and simulated data and facilitate the development of methods to bridge the two. Moreover, we design eight evaluation metrics, spanning data-oriented and physics-oriented metrics, and finally benchmark ten representative baselines, including state-of-the-art models, pretrained PDE foundation models, and a traditional method. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining with simulated data consistently improves both accuracy and convergence. With this work, we hope to provide insights from real-world data, advancing scientific ML toward bridging the sim-to-real gap and toward real-world deployment. Our benchmark, datasets, and instructions are available at https://realpdebench.github.io/.

LG · Mar 26, 2025 · Code
PlatMetaX: An Integrated MATLAB platform for Meta-Black-Box Optimization

Xu Yang, Rui Wang, Kaiwen Li et al.

The landscape of optimization problems has become increasingly complex, necessitating the development of advanced optimization techniques. Meta-Black-Box Optimization (MetaBBO), which refines the optimization algorithms themselves via meta-learning, has emerged as a promising approach. Recognizing the limitations of existing platforms, we present PlatMetaX, a novel MATLAB platform for MetaBBO with reinforcement learning. PlatMetaX integrates the strengths of MetaBox and PlatEMO, offering a comprehensive framework for developing, evaluating, and comparing optimization algorithms. The platform is designed to handle a wide range of optimization problems, from single-objective to multi-objective, and is equipped with a rich set of baseline algorithms and evaluation metrics. We demonstrate the utility of PlatMetaX through extensive experiments and provide insights into its design and implementation. PlatMetaX is available at https://github.com/Yxxx616/PlatMetaX.

CV · Jun 22, 2025 · Code
Training-free Test-time Improvement for Explainable Medical Image Classification

Hangzhou He, Jiachen Tang, Lei Zhu et al. · pku

Deep learning-based medical image classification techniques are advancing rapidly, making it crucial to develop accurate and trustworthy models that can be efficiently deployed across diverse clinical scenarios. Concept Bottleneck Models (CBMs), which first predict a set of explainable concepts from images and then perform classification based on these concepts, are increasingly being adopted for explainable medical image classification. However, the inherent explainability of CBMs introduces new challenges when deploying trained models to new environments. Variations in imaging protocols and staining methods may induce concept-level shifts, such as alterations in color distribution and scale. Furthermore, since CBM training requires explicit concept annotations, fine-tuning models solely with image-level labels could compromise concept prediction accuracy and faithfulness, a critical limitation given the high cost of acquiring expert-annotated concept labels in medical domains. To address these challenges, we propose a training-free confusion concept identification strategy. By leveraging minimal new data (e.g., 4 images per class) with only image-level labels, our approach enhances out-of-domain performance without sacrificing source-domain accuracy through two key operations: masking misactivated confounding concepts and amplifying under-activated discriminative concepts. The efficacy of our method is validated on both skin and white blood cell images. Our code is available at: https://github.com/riverback/TF-TTI-XMed.
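The two test-time operations, masking confounding concepts and amplifying under-activated discriminative ones, act on concept activations before a frozen linear CBM head. The sketch below shows only that adjustment step; how the indices in `mask_ids` and `amplify_ids` are identified from the few labeled target images, and the single scalar `gain`, are assumptions not shown here.

```python
def adjust_concepts(activations, mask_ids, amplify_ids, gain):
    """Zero out confounding concepts and scale up discriminative ones.

    mask_ids / amplify_ids are assumed to come from a separate
    identification step; gain is a hypothetical amplification factor.
    """
    adjusted = list(activations)  # leave the original activations untouched
    for i in mask_ids:
        adjusted[i] = 0.0  # mask a misactivated confounding concept
    for i in amplify_ids:
        adjusted[i] *= gain  # amplify an under-activated concept
    return adjusted


def linear_head(concepts, weights, bias):
    """Frozen CBM classifier: index of the largest entry of W @ c + b."""
    scores = [sum(w * c for w, c in zip(row, concepts)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because only the concept vector changes, the classifier weights stay frozen, which is what makes such an adjustment training-free.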

CV · Aug 4, 2025 · Code
I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking

Ziyan Liu, Junwen Li, Kaiwen Li et al.

Multimodal entity linking plays a crucial role in a wide range of applications. Recent large language model (LLM)-based methods have become the dominant paradigm for this task, effectively leveraging both textual and visual modalities to enhance performance. Despite their success, these methods still face two challenges: the unnecessary incorporation of image data in certain scenarios, and reliance on only a one-time extraction of visual features, both of which can undermine their effectiveness and accuracy. To address these challenges, we propose a novel LLM-based framework for multimodal entity linking, called Intra- and Inter-modal Collaborative Reflections (I2CR). This framework prioritizes text information to address the task. When intra- and inter-modal evaluations indicate that text alone is insufficient to link the correct entity, it employs a multi-round iterative strategy that integrates key visual clues from various aspects of the image to support reasoning and enhance matching accuracy. Extensive experiments on three widely used public datasets demonstrate that our framework consistently outperforms current state-of-the-art methods on this task, achieving improvements of 3.2%, 5.1%, and 1.6%, respectively. Our code is available at https://github.com/ziyan-xiaoyu/I2CR/.

IV · Jul 31, 2025 · Code
Improve Retinal Artery/Vein Classification via Channel Coupling

Shuang Zeng, Chee Hong Lee, Kaiwen Li et al. · pku

Retinal vessel segmentation plays a vital role in analyzing fundus images for the diagnosis of systemic and ocular diseases. Building on this, classifying segmented vessels into arteries and veins (A/V) further enables the extraction of clinically relevant features such as vessel width, diameter and tortuosity, which are essential for detecting conditions like diabetic and hypertensive retinopathy. However, manual segmentation and classification are time-consuming, costly and inconsistent. With the advancement of convolutional neural networks, several automated methods have been proposed to address this challenge, but some issues remain. For example, existing methods all treat artery, vein and overall vessel segmentation as three separate binary tasks, neglecting the intrinsic coupling relationships between these anatomical structures. Since artery and vein structures are subsets of the overall retinal vessel map and should naturally exhibit prediction consistency with it, we design a novel loss, named Channel-Coupled Vessel Consistency Loss, to enforce coherence and consistency among vessel, artery and vein predictions, preventing the network from being biased toward three simple binary segmentation tasks. Moreover, we introduce a regularization term, named intra-image pixel-level contrastive loss, to extract more discriminative, fine-grained feature-level representations for accurate retinal A/V classification. State-of-the-art (SOTA) results have been achieved on three public A/V classification datasets: RITE, LES-AV and HRF. Our code will be available upon acceptance.
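The subset relationship (arteries and veins together make up the vessel map) suggests a per-pixel consistency penalty between the three output channels. The sketch below uses the probabilistic union a + v - a*v as the "expected vessel" probability and a mean squared gap; both choices are assumptions for illustration, and the paper's Channel-Coupled Vessel Consistency Loss may be defined differently.

```python
def coupled_consistency_loss(p_vessel, p_artery, p_vein):
    """Mean squared gap between the vessel channel and the A/V union.

    Inputs are flattened per-pixel probabilities in [0, 1]. The union
    a + v - a*v treats artery and vein as independent events (assumption).
    """
    total = 0.0
    for vessel, artery, vein in zip(p_vessel, p_artery, p_vein):
        union = artery + vein - artery * vein
        total += (vessel - union) ** 2
    return total / len(p_vessel)
```

A pixel predicted as artery but not as vessel (or vice versa) now contributes to the loss, so the three channels can no longer be optimized as independent binary tasks.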

CV · Sep 22, 2025
Chat-CBM: Towards Interactive Concept Bottleneck Models with Frozen Large Language Models

Hangzhou He, Lei Zhu, Kaiwen Li et al. · pku

Concept Bottleneck Models (CBMs) provide inherent interpretability by first predicting a set of human-understandable concepts and then mapping them to labels through a simple classifier. While users can intervene in the concept space to improve predictions, traditional CBMs typically employ a fixed linear classifier over concept scores, which restricts interventions to manual value adjustments and prevents the incorporation of new concepts or domain knowledge at test time. These limitations are particularly severe in unsupervised CBMs, where concept activations are often noisy and densely activated, making user interventions ineffective. We introduce Chat-CBM, which replaces score-based classifiers with a language-based classifier that reasons directly over concept semantics. By grounding prediction in the semantic space of concepts, Chat-CBM preserves the interpretability of CBMs while enabling richer and more intuitive interventions, such as concept correction, addition or removal of concepts, incorporation of external knowledge, and high-level reasoning guidance. Leveraging the language understanding and few-shot capabilities of frozen large language models, Chat-CBM extends the intervention interface of CBMs beyond numerical editing and remains effective even in unsupervised settings. Experiments on nine datasets demonstrate that Chat-CBM achieves higher predictive performance and substantially improves user interactivity while maintaining the concept-based interpretability of CBMs.

LG · Mar 2, 2025
Graph Attention Networks Unleashed: A Fast and Explainable Vulnerability Assessment Framework for Microgrids

Wei Liu, Tao Zhang, Chenhui Lin et al.

Independent microgrids are crucial for supplying electricity by combining distributed energy resources and loads in scenarios like isolated islands and field combat. Fast and accurate assessments of microgrid vulnerability against intentional attacks or natural disasters are essential for effective risk prevention and design optimization. However, conventional Monte Carlo simulation (MCS) methods are computationally expensive and time-consuming, while existing machine learning-based approaches often lack accuracy and explainability. To address these challenges, this study proposes a fast and explainable vulnerability assessment framework that integrates MCS with a graph attention network enhanced by self-attention pooling (GAT-S). MCS generates training data, while the GAT-S model learns the structural and electrical characteristics of the microgrid and further assesses its vulnerability intelligently. The GAT-S improves explainability and computational efficiency by dynamically assigning attention weights to critical nodes. Comprehensive experimental evaluations across various microgrid configurations demonstrate that the proposed framework provides accurate vulnerability assessments, achieving a mean squared error as low as 0.001, real-time responsiveness within 1 second, and delivering explainable results.
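The explainability claim rests on attention weights that rank nodes by importance. As a generic illustration (not the GAT-S architecture itself), the sketch below performs a softmax attention readout: each node gets a score from a hypothetical learned vector `score_vector`, the scores are normalized into weights, and the graph embedding is the weighted sum of node features. The weights themselves can be read as per-node importance.

```python
import math


def attention_pool(node_features, score_vector):
    """Softmax attention readout over nodes.

    Returns (graph_embedding, attention_weights). score_vector stands in for
    learned parameters (hypothetical); attention_weights sum to 1 and can be
    inspected to see which nodes dominate the prediction.
    """
    scores = [sum(f * s for f, s in zip(feat, score_vector))
              for feat in node_features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(node_features[0])
    graph = [sum(a * feat[d] for a, feat in zip(alphas, node_features))
             for d in range(dim)]
    return graph, alphas
```

A trained model would feed the pooled embedding to a regression head that outputs the vulnerability score, while the attention weights point to critical nodes.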

NE · Jan 22, 2025
Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization

Xu Yang, Rui Wang, Kaiwen Li et al.

The differential evolution (DE) algorithm is recognized as one of the most effective evolutionary algorithms, demonstrating remarkable efficacy in black-box optimization due to its derivative-free nature. Numerous enhancements to the basic DE have been proposed, incorporating innovative mutation strategies and sophisticated parameter tuning techniques to improve performance. However, no single variant has proven universally superior across all problems. To address this challenge, we introduce a novel framework that employs reinforcement learning (RL) to automatically design DE for black-box optimization through meta-learning. RL acts as an advanced meta-optimizer, generating a customized DE configuration that includes an optimal initialization strategy, update rule, and hyperparameters tailored to a specific black-box optimization problem. This process is informed by a detailed analysis of the problem characteristics. In this proof-of-concept study, we use a double deep Q-network for implementation, considering a subset of 40 possible strategy combinations and parameter optimizations simultaneously. The framework's performance is evaluated on black-box optimization benchmarks and compared with state-of-the-art algorithms. The experimental results highlight the promising potential of the proposed framework.
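For reference, the classic DE/rand/1/bin generation below shows the kind of building blocks (mutation rule, crossover, and the hyperparameters F and CR) that such a meta-optimizer would choose among. This is the textbook variant, not the configuration the paper's RL agent produces; the RL selection step is not shown.

```python
import random


def de_generation(pop, fitness, F, CR, bounds, rng):
    """One generation of DE/rand/1/bin for minimization.

    pop: list of real vectors (at least 4); bounds: (low, high) applied to
    every coordinate by clipping; rng: a random.Random instance.
    """
    n, dim = len(pop), len(pop[0])
    low, high = bounds
    new_pop = []
    for i, target in enumerate(pop):
        # Mutation: three distinct individuals other than the target.
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
        # Binomial crossover; j_rand guarantees at least one mutant gene.
        j_rand = rng.randrange(dim)
        trial = []
        for d in range(dim):
            x = mutant[d] if (rng.random() < CR or d == j_rand) else target[d]
            trial.append(min(max(x, low), high))
        # Greedy selection: keep the trial only if it is no worse.
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop
```

Swapping the mutation line (e.g., for DE/best/1) or changing F and CR per problem is exactly the design space an automated configurator explores.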

NE · Feb 11, 2021
Deep Reinforcement Learning for Combinatorial Optimization: Covering Salesman Problems

Kaiwen Li, Tao Zhang, Rui Wang, Yuheng Wang et al.

This paper introduces a new deep learning approach to approximately solve the Covering Salesman Problem (CSP). In this approach, given the city locations of a CSP instance as input, a deep neural network model is designed to output the solution directly. It is trained using deep reinforcement learning without supervision. Specifically, the model applies multi-head attention to capture the structural patterns and uses a dynamic embedding to handle the dynamic patterns of the problem. Once trained, the model can generalize to various types of CSP tasks (different sizes and topologies) without re-training. In controlled experiments, the proposed approach shows desirable time complexity: it runs more than 20 times faster than traditional heuristic solvers with a tiny optimality gap. Moreover, it significantly outperforms current state-of-the-art deep learning approaches for combinatorial optimization in both training and inference. Compared with traditional solvers, this approach is highly desirable for most challenging practical tasks, which are usually large-scale and require quick decisions.
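In the CSP, a tour need not visit every city; each city only has to be visited or covered by a visited city. The sketch below checks feasibility and computes the tour length that a learned policy would try to minimize. It assumes a single Euclidean covering radius shared by all cities, which is a simplification; the paper's coverage definition may differ.

```python
import math


def tour_length(tour, cities):
    """Length of the closed tour visiting cities[tour[0]], cities[tour[1]], ..."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def is_covering_tour(tour, cities, radius):
    """True if every city lies within `radius` of some visited city.

    Assumption: one shared covering radius for all cities.
    """
    visited = [cities[i] for i in tour]
    return all(any(math.dist(c, v) <= radius for v in visited) for c in cities)
```

On the unit square with its centre, visiting only two opposite corners covers every point at radius 1.0, giving a tour of length 2√2, but it is infeasible at radius 0.5.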

NE · Jun 6, 2019
Deep Reinforcement Learning for Multi-objective Optimization

Kaiwen Li, Tao Zhang, Rui Wang

This study proposes an end-to-end framework for solving multi-objective optimization problems (MOPs) using deep reinforcement learning (DRL), which we call DRL-MOA. The idea of decomposition is adopted to decompose the MOP into a set of scalar optimization subproblems. Each subproblem is then modelled as a neural network. Model parameters of all subproblems are optimized collaboratively according to a neighborhood-based parameter-transfer strategy and the DRL training algorithm. Pareto-optimal solutions can be obtained directly from the trained neural network models. Specifically, the multi-objective travelling salesman problem (MOTSP) is solved in this work with DRL-MOA by modelling each subproblem as a Pointer Network. Extensive experiments have been conducted to study DRL-MOA and compare it against various benchmark methods. Once trained, the model can scale to newly encountered problems without re-training. Solutions are obtained directly by a simple forward pass of the neural network; thus no iteration is required and the MOP can always be solved in reasonable time. The proposed method provides a new way of solving MOPs by means of DRL. Compared with existing methods for multi-objective optimization, it exhibits a set of new characteristics, e.g., strong generalization ability and fast solving speed. Experimental results show the effectiveness and competitiveness of the proposed method in terms of model performance and running time.
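The decomposition and parameter-transfer ideas can be sketched for a bi-objective TSP as follows, assuming a weighted-sum scalarization with evenly spread weight vectors and reading "neighborhood-based parameter transfer" as each subproblem warm-starting from its neighbor's trained parameters. `train_fn` is a hypothetical placeholder for DRL training of one Pointer Network; it is not an API from the paper.

```python
def uniform_weights(n):
    """n evenly spread weight vectors (w1, w2) with w1 + w2 = 1, n >= 2."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


def tour_cost(tour, dist):
    """Closed-tour cost under one distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def scalarized_cost(tour, dist1, dist2, w):
    """Weighted-sum scalarization of the two tour objectives."""
    return w[0] * tour_cost(tour, dist1) + w[1] * tour_cost(tour, dist2)


def train_in_sequence(weights, train_fn, init_params):
    """Train one model per subproblem, warm-starting from the previous one.

    train_fn(params, w) -> new_params is a hypothetical stand-in for DRL
    training on the subproblem defined by weight vector w.
    """
    params, models = init_params, []
    for w in weights:
        params = train_fn(params, w)  # transfer from the neighboring subproblem
        models.append(params)
    return models
```

At inference, running each of the trained models once on a new instance yields one tour per weight vector, and together these approximate the Pareto front.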