Lin Cheng

CV
h-index10
28papers
145citations
Novelty49%
AI Score53

28 Papers

CVAug 8, 2023
All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

Weixuan Sun, Yanhao Zhang, Zhen Qin et al.

In this work, we propose a new transformer-based regularization to better localize objects for Weakly supervised semantic segmentation (WSSS). In image-level WSSS, Class Activation Map (CAM) is adopted to generate object localization as pseudo segmentation labels. To address the partial activation issue of the CAMs, consistency regularization is employed to maintain activation intensity invariance across various image augmentations. However, such methods ignore pair-wise relations among regions within each CAM, which capture context and should also be invariant across image views. To this end, we propose a new all-pairs consistency regularization (ACR). Given a pair of augmented views, our approach regularizes the activation intensities between a pair of augmented views, while also ensuring that the affinity across regions within each view remains consistent. We adopt vision transformers as the self-attention mechanism naturally embeds pair-wise affinity. This enables us to simply regularize the distance between the attention matrices of augmented image pairs. Additionally, we introduce a novel class-wise localization method that leverages the gradients of the class token. Our method can be seamlessly integrated into existing WSSS methods using transformers without modifying the architectures. We evaluate our method on PASCAL VOC and MS COCO datasets. Our method produces noticeably better class localization maps (67.3% mIoU on PASCAL VOC train), resulting in superior WSSS performances.

SYMar 16, 2023
Methodology for Capacity Credit Evaluation of Physical and Virtual Energy Storage in Decarbonized Power System

Ning Qi, Peng Li, Lin Cheng et al.

Energy storage (ES) and virtual energy storage (VES) are key components to realizing power system decarbonization. Although ES and VES have been proven to deliver various types of grid services, little work has so far provided a systematical framework for quantifying their adequacy contribution and credible capacity value while incorporating human and market behavior. Therefore, this manuscript proposed a novel evaluation framework to evaluate the capacity credit (CC) of ES and VES. To address the system capacity inadequacy and market behavior of storage, a two-stage coordinated dispatch is proposed to achieve the trade-off between day-ahead self-energy management of resources and efficient adjustment to real-time failures. And we further modeled the human behavior with storage operations and incorporate two types of decision-independent uncertainties (DIUs) (operate state and self-consumption) and one type of decision-dependent uncertainty (DDUs) (available capacity) into the proposed dispatch. Furthermore, novel reliability and CC indices (e.g., equivalent physical storage capacity (EPSC)) are introduced to evaluate the practical and theoretical adequacy contribution of ES and VES, as well as the ability to displace generation and physical storage while maintaining equivalent system adequacy. Exhaustive case studies based on the IEEE RTS-79 system and real-world data verify the significant consequence (10%-70% overestimated CC) of overlooking DIUs and DDUs in the previous works, while the proposed method outperforms other and can generate a credible and realistic result. Finally, we investigate key factors affecting the adequacy contribution of ES and VES, and reasonable suggestions are provided for better flexibility utilization of ES and VES in decarbonized power system.

89.8SYMay 4
LCL Resonance Analysis and Damping in Single-Loop Grid-Forming Wind Turbines

Meng Chen, Yufei Xi, Frede Blaabjerg et al.

A common assumption in both grid-following (GFL) and grid-forming (GFM) control systems is that they are open-loop (OL) stable in the vicinity of high-frequency resonances. Hence classical loop-shaping approaches are often used for establishing stability margins and designing active damping (AD) strategies. This paper shows that single-loop GFM (SL-GFM) control schemes incorporating a widely used class of reactive power (RAP) control, referred to as droop-I control, can lead to OL unstable poles. This finding reveals a novel instability mechanism resulting in a reduced stability margin and robustness at high frequencies. The sensitivity of this phenomenon to both RAP and electrical parameters is analyzed in detail. An AD design that explicitly accounts for the newly identified instability mechanism is proposed. We also provide a comparison between such SL-GFM and well-studied GFL control schemes, highlighting quite different resonance features between them. Validation is performed through experiments.

SYMar 4, 2019
Fully Distributed DC Optimal Power Flow Based on Distributed Economic Dispatch and Distributed State Estimation

Qiao Li, David Wenzhong Gao, Lin Cheng et al.

Optimal power flow (OPF) is an important technique for power systems to achieve optimal operation while satisfying multiple constraints. The traditional OPF are mostly centralized methods which are executed in the centralized control center. This paper introduces a totally Distributed DC Optimal Power Flow (DDCOPF) method for future power systems which have more and more distributed generators. The proposed method is based on the Distributed Economic Dispatch (DED) method and the Distributed State Estimation (DSE) method. In this proposed scheme, the DED method is used to achieve the optimal power dispatch with the lowest cost, and the DSE method provides power flow information of the power system to the proposed DDCOPF algorithm. In the proposed method, the Auto-Regressive (AR) model is used to predict the load variation so that the proposed algorithm can prevent overflow. In addition, a method called constraint algorithm is developed to correct the results of DED with the proposed correction algorithm and penalty term so that the constraints for the power system will not be violated. Different from existing research, the proposed method is completely distributed without need for any centralized facility.

QMMar 11, 2023
Intelligent diagnostic scheme for lung cancer screening with Raman spectra data by tensor network machine learning

Yu-Jia An, Sheng-Chen Bai, Lin Cheng et al.

Artificial intelligence (AI) has brought tremendous impacts on biomedical sciences from academic researches to clinical applications, such as in biomarkers' detection and diagnosis, optimization of treatment, and identification of new therapeutic targets in drug discovery. However, the contemporary AI technologies, particularly deep machine learning (ML), severely suffer from non-interpretability, which might uncontrollably lead to incorrect predictions. Interpretability is particularly crucial to ML for clinical diagnosis as the consumers must gain necessary sense of security and trust from firm grounds or convincing interpretations. In this work, we propose a tensor-network (TN)-ML method to reliably predict lung cancer patients and their stages via screening Raman spectra data of Volatile organic compounds (VOCs) in exhaled breath, which are generally suitable as biomarkers and are considered to be an ideal way for non-invasive lung cancer screening. The prediction of TN-ML is based on the mutual distances of the breath samples mapped to the quantum Hilbert space. Thanks to the quantum probabilistic interpretation, the certainty of the predictions can be quantitatively characterized. The accuracy of the samples with high certainty is almost 100$\%$. The incorrectly-classified samples exhibit obviously lower certainty, and thus can be decipherably identified as anomalies, which will be handled by human experts to guarantee high reliability. Our work sheds light on shifting the ``AI for biomedical sciences'' from the conventional non-interpretable ML schemes to the interpretable human-ML interactive approaches, for the purpose of high accuracy and reliability.

CVNov 15, 2023
Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images

Zhaocong liu, Fa Zhang, Lin Cheng et al.

High-quality labeled datasets are essential for deep learning. Traditional manual annotation methods are not only costly and inefficient but also pose challenges in specialized domains where expert knowledge is needed. Self-supervised methods, despite leveraging unlabeled data for feature extraction, still require hundreds or thousands of labeled instances to guide the model for effective specialized image classification. Current unsupervised learning methods offer automatic classification without prior annotation but often compromise on accuracy. As a result, efficiently procuring high-quality labeled datasets remains a pressing challenge for specialized domain images devoid of annotated data. Addressing this, an unsupervised classification method with three key ideas is introduced: 1) dual-step feature dimensionality reduction using a pre-trained model and manifold learning, 2) a voting mechanism from multiple clustering algorithms, and 3) post-hoc instead of prior manual annotation. This approach outperforms supervised methods in classification accuracy, as demonstrated with fungal image data, achieving 94.1% and 96.7% on public and private datasets respectively. The proposed unsupervised classification method reduces dependency on pre-annotated datasets, enabling a closed-loop for data classification. The simplicity and ease of use of this method will also bring convenience to researchers in various fields in building datasets, promoting AI applications for images in specialized domains.

64.7SYMar 23
Full Timescale Hierarchical MPC-MTIP Framework for Hybrid Energy Storage Management in Low-Carbon Industrial Microgrid

Daniyaer Paizulamu, Lin Cheng, Ning Qi et al.

Uncertainties in balancing generation and load in low-carbon industrial microgrids (IMGs) make hybrid energy storage systems (HESS) crucial for their stable and economic operation. Existing model predictive control (MPC) techniques typically enforce periodic state of charge (SOC) constraints to maintain long term stability. However, these hard constraints compromise dispatch flexibility near the end of the prediction horizon, preventing sufficient energy release during critical peaks and leading to optimization infeasibility. This paper eliminates the periodic SOC constraints of individual storage units and proposes a novel full-timescale hierarchical MPC scheduling framework. Specifically, comprehensive physical and cost models are established for the HESS composed of flywheel, battery, compressed-air, and hydrogen-methanol energy storage. The control problem is decoupled into a hierarchical MPC architecture. Furthermore, a novel adaptive feedback mechanism based on micro trajectory inverse projection (MTIP) is embedded into the scheduling process, accurately mapping the high frequency dynamic buffering capabilities of lower tier storages into the upper decision space to generate dynamic boundaries. Experiments using 14 consecutive months of second-level data from a real-world IMG validate the effectiveness of the proposed method, demonstrating its significant superiority over existing approaches. By effectively preventing limit violations and deadlocks in lower-tier storages under extreme fluctuations, it achieves a 97.4\% net load smoothing rate and a 62.2\% comprehensive cycle efficiency.

CVOct 11, 2021Code
TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency

Lin Cheng, Pengfei Fang, Yanjie Liang et al.

The explanation for deep neural networks has drawn extensive attention in the deep learning community over the past few years. In this work, we study the visual saliency, a.k.a. visual explanation, to interpret convolutional neural networks. Compared to iteration based saliency methods, single backward pass based saliency methods benefit from faster speed, and they are widely used in downstream visual tasks. Thus, we focus on single backward pass based methods. However, existing methods in this category struggle to uccessfully produce fine-grained saliency maps concentrating on specific target classes. That said, producing faithful saliency maps satisfying both target-selectiveness and fine-grainedness using a single backward pass is a challenging problem in the field. To mitigate this problem, we revisit the gradient flow inside the network, and find that the entangled semantics and original weights may disturb the propagation of target-relevant saliency. Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps. The proposed TSGB consists of two components, namely, TSGB-Conv and TSGB-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively. Extensive qualitative and quantitative experiments on the ImageNet and Pascal VOC datasets show that the proposed method achieves more accurate and reliable results than the other competitive methods. Code is available at https://github.com/123fxdx/CNNvisualizationTSGB.

91.5SYApr 1
Demand response potential evaluation of a zero carbon hydrogen metallurgy system considering shaft furnace's flexibility

Qiang Ji, Lin Cheng, Kaidi Huang et al.

The increasing penetration of intermittent renewable energy sources and the retirement of thermal units have widened the power system flexibility gap. Industrial demand response (DR) driven by real-time pricing is widely regarded as a viable solution. In this paper, we propose a framework to quantify the DR potential of a zero-carbon hydrogen metallurgy system (ZCHMS) considering shaft furnace's flexibility. First, we model the shaft furnace as a constrained flexible load and validate the model via simulation, achieving a root mean square error of 4.48\% of the rated load. Second, we formulate a DR potential evaluation method that determines baseline and DR-based production scheduling schemes by minimizing operating cost subject to production orders. Finally, the numerical results show that compared with the baseline, DR-based ZCHMS reduces operating cost by 6.6\%, incentivizing demand-side management in ironmaking and strengthening power-ironmaking synergies.

CVFeb 14, 2025
RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control

Teng Li, Guangcong Zheng, Rui Jiang et al.

Recent advancements in camera-trajectory-guided image-to-video generation offer higher precision and better support for complex camera control compared to text-based approaches. However, they also introduce significant usability challenges, as users often struggle to provide precise camera parameters when working with arbitrary real-world images without knowledge of their depth nor scene scale. To address these real-world application issues, we propose RealCam-I2V, a novel diffusion-based video generation framework that integrates monocular metric depth estimation to establish 3D scene reconstruction in a preprocessing step. During training, the reconstructed 3D scene enables scaling camera parameters from relative to metric scales, ensuring compatibility and scale consistency across diverse real-world images. In inference, RealCam-I2V offers an intuitive interface where users can precisely draw camera trajectories by dragging within the 3D scene. To further enhance precise camera control and scene consistency, we propose scene-constrained noise shaping, which shapes high-level noise and also allows the framework to maintain dynamic and coherent video generation in lower noise stages. RealCam-I2V achieves significant improvements in controllability and video quality on the RealEstate10K and out-of-domain images. We further enables applications like camera-controlled looping video generation and generative frame interpolation. Project page: https://zgctroy.github.io/RealCam-I2V.

LGDec 22, 2023
Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits

Qi Xu, Lijie Wang, Jing Wang et al.

In recent years, analog circuits have received extensive attention and are widely used in many emerging applications. The high demand for analog circuits necessitates shorter circuit design cycles. To achieve the desired performance and specifications, various geometrical symmetry constraints must be carefully considered during the analog layout process. However, the manual labeling of these constraints by experienced analog engineers is a laborious and time-consuming process. To handle the costly runtime issue, we propose a graph-based learning framework to automatically extract symmetric constraints in analog circuit layout. The proposed framework leverages the connection characteristics of circuits and the devices' information to learn the general rules of symmetric constraints, which effectively facilitates the extraction of device-level constraints on circuit netlists. The experimental results demonstrate that compared to state-of-the-art symmetric constraint detection approaches, our framework achieves higher accuracy and F1-score.

97.9SYApr 6
A Process-Aware Demand Response Framework for Hydrogen-Integrated Zero-Carbon Steel Plants Coupled with Methanol Production

Qiang Ji, Lin Cheng, Yue Zhou et al.

The integration of the high penetration of intermittent renewable energy sources (RES) and the retirement of thermal units have significantly aggravated the flexibility scarcity and real-time balancing challenges in power systems. Low-carbon steel production systems, based on green-hydrogen ironmaking and electrified melting, possess substantial demand response (DR) potential. This paper proposes a process-aware DR evaluation framework for hydrogen-integrated zero-carbon steel plants coupled with methanol production (H2-DRI-EAF-MeOH). First, a novel zero-carbon steel production system architecture is established to explicitly represent the energy-material flow coupling relationships among electricity, hydrogen, heat, iron, steel, CO2, and methanol. Second, to explicitly capture electric arc furnace (EAF) operational constraints while preserving optimization tractability, an operating feasible region model is developed and validated using field data from a pure hydrogen direct reduced iron and EAF plant, yielding an average relative error of 4.1%. Finally, a process-aware DR scheduling model is formulated by incorporating the proposed process deviation penalties to balance economic performance against process disturbance costs and operational acceptability. Additionally, dual-side evaluation metrics are developed to quantify grid-side regulation performance and load-side flexibility characteristics. Case studies demonstrate that under real-time pricing, the proposed system achieves an average DR capacity of 275.4 MW, improves the RES-load matching degree from 0.262 to 0.508, and reduces total operational costs by 17.78% compared with the baseline scheduling scheme. The proposed framework provides a theoretical foundation for RES-steel-chemical synergies.

LGFeb 4, 2025
Error Distribution Smoothing:Advancing Low-Dimensional Imbalanced Regression

Donghe Chen, Jiaxuan Yue, Tengjie Zheng et al.

In real-world regression tasks, datasets frequently exhibit imbalanced distributions, characterized by a scarcity of data in high-complexity regions and an abundance in low-complexity areas. This imbalance presents significant challenges for existing classification methods with clear class boundaries, while highlighting a scarcity of approaches specifically designed for imbalanced regression problems. To better address these issues, we introduce a novel concept of Imbalanced Regression, which takes into account both the complexity of the problem and the density of data points, extending beyond traditional definitions that focus only on data density. Furthermore, we propose Error Distribution Smoothing (EDS) as a solution to tackle imbalanced regression, effectively selecting a representative subset from the dataset to reduce redundancy while maintaining balance and representativeness. Through several experiments, EDS has shown its effectiveness, and the related code and dataset can be accessed at https://anonymous.4open.science/r/Error-Distribution-Smoothing-762F.

LGNov 26, 2025
Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning

Shanwei Fan, Bin Zhang, Zhiwei Xu et al.

Large language models (LLMs) offer strong high-level planning capabilities for reinforcement learning (RL) by decomposing tasks into subgoals. However, their practical utility is limited by poor planning-execution alignment, which reflects a critical gap between abstract plans and actionable, environment-compatible behaviors. This misalignment arises from two interrelated limitations: (1) LLMs often produce subgoals that are semantically plausible but infeasible or irrelevant in the target environment due to insufficient grounding in environment-specific knowledge, and (2) single-LLM planning conflates generation with self-verification, resulting in overconfident yet unreliable subgoals that frequently fail during execution. To address these challenges, we propose Subgoal Graph-Augmented Actor-Critic-Refiner (SGA-ACR), a framework that integrates an environment-specific subgoal graph and structured entity knowledge with a multi-LLM planning pipeline that explicitly separates generation, critique, and refinement to produce executable and verifiable subgoals. A subgoal tracker further monitors execution progress, provides auxiliary rewards, and adaptively updates the subgoal graph to maintain alignment between plans and actions. Experimental results on 22 diverse tasks in the open-world game "Crafter" demonstrate the effectiveness of our proposed method.

SYNov 22, 2025
Sparse Kalman Identification for Partially Observable Systems via Adaptive Bayesian Learning

Jilan Mei, Tengjie Zheng, Lin Cheng et al.

Sparse dynamics identification is an essential tool for discovering interpretable physical models and enabling efficient control in engineering systems. However, existing methods rely on batch learning with full historical data, limiting their applicability to real-time scenarios involving sequential and partially observable data. To overcome this limitation, this paper proposes an online Sparse Kalman Identification (SKI) method by integrating the Augmented Kalman Filter (AKF) and Automatic Relevance Determination (ARD). The main contributions are: (1) a theoretically grounded Bayesian sparsification scheme that is seamlessly integrated into the AKF framework and adapted to sequentially collected data in online scenarios; (2) an update mechanism that adapts the Kalman posterior to reflect the updated selection of the basis functions that define the model structure; (3) an explicit gradient-descent formulation that enhances computational efficiency. Consequently, the SKI method achieves accurate model structure selection with millisecond-level efficiency and higher identification accuracy, as demonstrated by extensive simulations and real-world experiments (showing an 84.21\% improvement in accuracy over the baseline AKF).

MLOct 17, 2025
Recursive Inference for Heterogeneous Multi-Output GP State-Space Models with Arbitrary Moment Matching

Tengjie Zheng, Jilan Mei, Di Wu et al.

Accurate learning of system dynamics is becoming increasingly crucial for advanced control and decision-making in engineering. However, real-world systems often exhibit multiple channels and highly nonlinear transition dynamics, challenging traditional modeling methods. To enable online learning for these systems, this paper formulates the system as Gaussian process state-space models (GPSSMs) and develops a recursive learning method. The main contributions are threefold. First, a heterogeneous multi-output kernel is designed, allowing each output dimension to adopt distinct kernel types, hyperparameters, and input variables, improving expressiveness in multi-dimensional dynamics learning. Second, an inducing-point management algorithm enhances computational efficiency through independent selection and pruning for each output dimension. Third, a unified recursive inference framework for GPSSMs is derived, supporting general moment matching approaches, including the extended Kalman filter (EKF), unscented Kalman filter (UKF), and assumed density filtering (ADF), enabling accurate learning under strong nonlinearity and significant noise. Experiments on synthetic and real-world datasets show that the proposed method matches the accuracy of SOTA offline GPSSMs with only 1/100 of the runtime, and surpasses SOTA online GPSSMs by around 70% in accuracy under heavy noise while using only 1/20 of the runtime.

LGAug 5, 2025
BubbleOKAN: A Physics-Informed Interpretable Neural Operator for High-Frequency Bubble Dynamics

Yunhao Zhang, Sidharth S. Menon, Lin Cheng et al.

In this work, we employ physics-informed neural operators to map pressure profiles from an input function space to the corresponding bubble radius responses. Our approach employs a two-step DeepONet architecture. To address the intrinsic spectral bias of deep learning models, our model incorporates the Rowdy adaptive activation function, enhancing the representation of high-frequency features. Moreover, we introduce the Kolmogorov-Arnold network (KAN) based two-step DeepOKAN model, which enhances interpretability (often lacking in conventional multilayer perceptron architectures) while efficiently capturing high-frequency bubble dynamics without explicit utilization of activation functions in any form. We particularly investigate the use of spline basis functions in combination with radial basis functions (RBF) within our architecture, as they demonstrate superior performance in constructing a universal basis for approximating high-frequency bubble dynamics compared to alternative formulations. Furthermore, we emphasize on the performance bottleneck of RBF while learning the high frequency bubble dynamics and showcase the advantage of using spline basis function for the trunk network in overcoming this inherent spectral bias. The model is systematically evaluated across three representative scenarios: (1) bubble dynamics governed by the Rayleigh-Plesset equation with a single initial radius, (2) bubble dynamics governed by the Keller-Miksis equation with a single initial radius, and (3) Keller-Miksis dynamics with multiple initial radii. We also compare our results with state-of-the-art neural operators, including Fourier Neural Operators, Wavelet Neural Operators, OFormer, and Convolutional Neural Operators. Our findings demonstrate that the two-step DeepOKAN accurately captures both low- and high-frequency behaviors, and offers a promising alternative to conventional numerical solvers.

OCJun 13, 2025
Quantum Learning and Estimation for Distribution Networks and Energy Communities Coordination

Yingrui Zhuang, Lin Cheng, Yuji Cao et al.

Price signals from distribution networks (DNs) guide energy communities (ECs) to adjust energy usage, enabling effective coordination for reliable power system operation. However, this coordination faces significant challenges due to the limited availability of information (i.e., only the aggregated energy usage of ECs is available to DNs), and the high computational burden of accounting for uncertainties and the associated risks through numerous scenarios. To address these challenges, we propose a quantum learning and estimation approach to enhance coordination between DNs and ECs. Specifically, leveraging advanced quantum properties such as quantum superposition and entanglement, we develop a hybrid quantum temporal convolutional network-long short-term memory (Q-TCN-LSTM) model to establish an end-to-end mapping between ECs' responses and the price incentives from DNs. Moreover, we develop a quantum estimation method based on quantum amplitude estimation (QAE) and two phase-rotation circuits to significantly accelerate the optimization process under numerous uncertainty scenarios. Numerical experiments demonstrate that, compared to classical neural networks, the proposed Q-TCN-LSTM model improves the mapping accuracy by 69.2% while reducing the model size by 99.75% and the computation time by 93.9%. Compared to classical Monte Carlo simulation, QAE achieves comparable accuracy with a dramatic reduction in computational time (up to 99.99%) and requires significantly fewer computational resources.

LGFeb 4, 2025
Adviser-Actor-Critic: Eliminating Steady-State Error in Reinforcement Learning Control

Donghe Chen, Yubin Peng, Tengjie Zheng et al.

High-precision control tasks present substantial challenges for reinforcement learning (RL) algorithms, frequently resulting in suboptimal performance attributed to network approximation inaccuracies and inadequate sample quality.These issues are exacerbated when the task requires the agent to achieve a precise goal state, as is common in robotics and other real-world applications.We introduce Adviser-Actor-Critic (AAC), designed to address the precision control dilemma by combining the precision of feedback control theory with the adaptive learning capability of RL and featuring an Adviser that mentors the actor to refine control actions, thereby enhancing the precision of goal attainment.Finally, through benchmark tests, AAC outperformed standard RL algorithms in precision-critical, goal-conditioned tasks, demonstrating AAC's high precision, reliability, and robustness.Code are available at: https://anonymous.4open.science/r/Adviser-Actor-Critic-8AC5.

LGJan 18, 2025
Stability Enhancement in Reinforcement Learning via Adaptive Control Lyapunov Function

Donghe Chen, Han Wang, Lin Cheng et al.

Reinforcement Learning (RL) has shown promise in control tasks but faces significant challenges in real-world applications, primarily due to the absence of safety guarantees during the learning process. Existing methods often struggle with ensuring safe exploration, leading to potential system failures and restricting applications primarily to simulated environments. Traditional approaches such as reward shaping and constrained policy optimization can fail to guarantee safety during initial learning stages, while model-based methods using Control Lyapunov Functions (CLFs) or Control Barrier Functions (CBFs) may hinder efficient exploration and performance. To address these limitations, this paper introduces Soft Actor-Critic with Control Lyapunov Function (SAC-CLF), a framework that enhances stability and safety through three key innovations: (1) a task-specific CLF design method for safe and optimal performance; (2) dynamic adjustment of constraints to maintain robustness under unmodeled dynamics; and (3) improved control input smoothness while ensuring safety. Experimental results on a classical nonlinear system and satellite attitude control demonstrate the effectiveness of SAC-CLF in overcoming the shortcomings of existing methods.

LGNov 22, 2024
Recursive Gaussian Process State Space Model

Tengjie Zheng, Haipeng Chen, Lin Cheng et al.

Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.

MLOct 13, 2021
Reinforcement Learning for Standards Design

Shahrukh Khan Kasi, Sayandev Mukherjee, Lin Cheng et al.

Communications standards are designed via committees of humans holding repeated meetings over months or even years until consensus is achieved. This includes decisions regarding the modulation and coding schemes to be supported over an air interface. We propose a way to "automate" the selection of the set of modulation and coding schemes to be supported over a given air interface and thereby streamline both the standards design process and the ease of extending the standard to support new modulation schemes applicable to new higher-level applications and services. Our scheme involves machine learning, whereby a constructor entity submits proposals to an evaluator entity, which returns a score for the proposal. The constructor employs reinforcement learning to iterate on its submitted proposals until a score is achieved that was previously agreed upon by both constructor and evaluator to be indicative of satisfying the required design criteria (including performance metrics for transmissions over the interface).

CVNov 11, 2020
GRCNN: Graph Recognition Convolutional Neural Network for Synthesizing Programs from Flow Charts

Lin Cheng, Zijiang Yang

Program synthesis is the task to automatically generate programs based on user specification. In this paper, we present a framework that synthesizes programs from flow charts that serve as accurate and intuitive specifications. In order doing so, we propose a deep neural network called GRCNN that recognizes graph structure from its image. GRCNN is trained end-to-end, which can predict edge and node information of the flow chart simultaneously. Experiments show that the accuracy rate to synthesize a program is 66.4%, and the accuracy rates to recognize edge and nodes are 94.1% and 67.9%, respectively. On average, it takes about 60 milliseconds to synthesize a program.

IVAug 8, 2020
Exploring the parameter reusability of CNN

Wei Wang, Lin Cheng, Yanjie Zhu et al.

In recent times, using small data to train networks has become a hot topic in the field of deep learning. Reusing pre-trained parameters is one of the most important strategies to address the issue of semi-supervised and transfer learning. However, the fundamental reason for the success of these methods is still unclear. In this paper, we propose a solution that can not only judge whether a given network is reusable or not based on the performance of reusing convolution kernels but also judge which layers' parameters of the given network can be reused, based on the performance of reusing corresponding parameters and, ultimately, judge whether those parameters are reusable or not in a target task based on the root mean square error (RMSE) of the corresponding convolution kernels. Specifically, we define that the success of a CNN's parameter reuse depends upon two conditions: first, the network is a reusable network; and second, the RMSE between the convolution kernels from the source domain and target domain is small enough. The experimental results demonstrate that the performance of reused parameters applied to target tasks, when these conditions are met, is significantly improved.

CVFeb 20, 2020
Learning Object Scale With Click Supervision for Object Detection

Liao Zhang, Yan Yan, Lin Cheng et al.

Weakly-supervised object detection has recently attracted increasing attention since it only requires image-levelannotations. However, the performance obtained by existingmethods is still far from being satisfactory compared with fully-supervised object detection methods. To achieve a good trade-off between annotation cost and object detection performance,we propose a simple yet effective method which incorporatesCNN visualization with click supervision to generate the pseudoground-truths (i.e., bounding boxes). These pseudo ground-truthscan be used to train a fully-supervised detector. To estimatethe object scale, we firstly adopt a proposal selection algorithmto preserve high-quality proposals, and then generate ClassActivation Maps (CAMs) for these preserved proposals by theproposed CNN visualization algorithm called Spatial AttentionCAM. Finally, we fuse these CAMs together to generate pseudoground-truths and train a fully-supervised object detector withthese ground-truths. Experimental results on the PASCAL VOC2007 and VOC 2012 datasets show that the proposed methodcan obtain much higher accuracy for estimating the object scale,compared with the state-of-the-art image-level based methodsand the center-click based method

CVJun 17, 2019
Hallucinated Adversarial Learning for Robust Visual Tracking

Qiangqiang Wu, Zhihui Chen, Lin Cheng et al.

Humans can easily learn new concepts from just a single exemplar, mainly due to their remarkable ability to imagine or hallucinate what the unseen exemplar may look like in different settings. Incorporating such an ability to hallucinate diverse new samples of the tracked instance can help the trackers alleviate the over-fitting problem in the low-data tracking regime. To achieve this, we propose an effective adversarial approach, denoted as adversarial "hallucinator" (AH), for robust visual tracking. The proposed AH is designed to firstly learn transferable non-linear deformations between a pair of same-identity instances, and then apply these deformations to an unseen tracked instance in order to generate diverse positive training samples. By incorporating AH into an online tracking-by-detection framework, we propose the hallucinated adversarial tracker (HAT), which jointly optimizes AH with an online classifier (e.g., MDNet) in an end-to-end manner. In addition, a novel selective deformation transfer (SDT) method is presented to better select the deformations which are more suitable for transfer. Extensive experiments on 3 popular benchmarks demonstrate that our HAT achieves the state-of-the-art performance.

CVJan 24, 2019
A Novel Self-Intersection Penalty Term for Statistical Body Shape Models and Its Applications in 3D Pose Estimation

Zaiqiang Wu, Wei Jiang, Hao Luo et al.

Statistical body shape models are widely used in 3D pose estimation due to their low-dimensional parameters representation. However, it is difficult to avoid self-intersection between body parts accurately. Motivated by this fact, we proposed a novel self-intersection penalty term for statistical body shape models applied in 3D pose estimation. To avoid the trouble of computing self-intersection for complex surfaces like the body meshes, the gradient of our proposed self-intersection penalty term is manually derived from the perspective of geometry. First, the self-intersection penalty term is defined as the volume of the self-intersection region. To calculate the partial derivatives with respect to the coordinates of the vertices, we employed detection rays to divide vertices of statistical body shape models into different groups depending on whether the vertex is in the region of self-intersection. Second, the partial derivatives could be easily derived by the normal vectors of neighboring triangles of the vertices. Finally, this penalty term could be applied in gradient-based optimization algorithms to remove the self-intersection of triangular meshes without using any approximation. Qualitative and quantitative evaluations were conducted to demonstrate the effectiveness and generality of our proposed method compared with previous approaches. The experimental results show that our proposed penalty term can avoid self-intersection to exclude unreasonable predictions and improves the accuracy of 3D pose estimation indirectly. Further more, the proposed method could be employed universally in triangular mesh based 3D reconstruction.

CVJul 19, 2018
Deep Adaptive Proposal Network for Object Detection in Optical Remote Sensing Images

Lin Cheng, Xu Liu, Lingling Li et al.

Object detection is a fundamental and challenging problem in aerial and satellite image analysis. More recently, a two-stage detector Faster R-CNN is proposed and demonstrated to be a promising tool for object detection in optical remote sensing images, while the sparse and dense characteristic of objects in remote sensing images is complexity. It is unreasonable to treat all images with the same region proposal strategy, and this treatment limits the performance of two-stage detectors. In this paper, we propose a novel and effective approach, named deep adaptive proposal network (DAPNet), address this complexity characteristic of object by learning a new category prior network (CPN) on the basis of the existing Faster R-CNN architecture. Moreover, the candidate regions produced by DAPNet model are different from the traditional region proposal network (RPN), DAPNet predicts the detail category of each candidate region. And these candidate regions combine the object number, which generated by the category prior network to achieve a suitable number of candidate boxes for each image. These candidate boxes can satisfy detection tasks in sparse and dense scenes. The performance of the proposed framework has been evaluated on the challenging NWPU VHR-10 data set. Experimental results demonstrate the superiority of the proposed framework to the state-of-the-art.