Zhili Zhang

CV
h-index32
10papers
240citations
Novelty56%
AI Score47

10 Papers

CVMay 21, 2022Code
Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer

Zhengyi Liu, Zhili Zhang, Wei Wu

Camouflaged object detection intends to discover the concealed objects hidden in the surroundings. Existing methods follow the bio-inspired framework, which first locates the object and second refines the boundary. We argue that the discovery of camouflaged objects depends on the recurrent search for the object and the boundary. The recurrent processing makes the human tired and helpless, but it is just the advantage of the transformer with global search ability. Therefore, a dual-task interactive transformer is proposed to detect both accurate position of the camouflaged object and its detailed boundary. The boundary feature is considered as Query to improve the camouflaged object detection, and meanwhile the object feature is considered as Query to improve the boundary detection. The camouflaged object detection and the boundary detection are fully interacted by multi-head self-attention. Besides, to obtain the initial object feature and boundary feature, transformer-based backbones are adopted to extract the foreground and background. The foreground is just object, while foreground minus background is considered as boundary. Here, the boundary feature can be obtained from blurry boundary region of the foreground and background. Supervised by the object, the background and the boundary ground truth, the proposed model achieves state-of-the-art performance in public datasets. https://github.com/liuzywen/COD

CVMar 25, 2023
Collaborative Multi-Object Tracking with Conformal Uncertainty Propagation

Sanbao Su, Songyang Han, Yiming Li et al.

Object detection and multiple object tracking (MOT) are essential components of self-driving systems. Accurate detection and uncertainty quantification are both critical for onboard modules, such as perception, prediction, and planning, to improve the safety and robustness of autonomous vehicles. Collaborative object detection (COD) has been proposed to improve detection accuracy and reduce uncertainty by leveraging the viewpoints of multiple agents. However, little attention has been paid to how to leverage the uncertainty quantification from COD to enhance MOT performance. In this paper, as the first attempt to address this challenge, we design an uncertainty propagation framework called MOT-CUP. Our framework first quantifies the uncertainty of COD through direct modeling and conformal prediction, and propagates this uncertainty information into the motion prediction and association steps. MOT-CUP is designed to work with different collaborative object detectors and baseline MOT algorithms. We evaluate MOT-CUP on V2X-Sim, a comprehensive collaborative perception dataset, and demonstrate a 2% improvement in accuracy and a 2.67X reduction in uncertainty compared to the baselines, e.g. SORT and ByteTrack. In scenarios characterized by high occlusion levels, our MOT-CUP demonstrates a noteworthy $4.01\%$ improvement in accuracy. MOT-CUP demonstrates the importance of uncertainty quantification in both COD and MOT, and provides the first attempt to improve the accuracy and reduce the uncertainty in MOT based on COD through uncertainty propagation. Our code is public on https://coperception.github.io/MOT-CUP/.

AIJun 11, 2023
Multi-Agent Reinforcement Learning Guided by Signal Temporal Logic Specifications

Jiangwei Wang, Shuo Yang, Ziyan An et al.

Reward design is a key component of deep reinforcement learning, yet some tasks and designer's objectives may be unnatural to define as a scalar cost function. Among the various techniques, formal methods integrated with DRL have garnered considerable attention due to their expressiveness and flexibility to define the reward and requirements for different states and actions of the agent. However, how to leverage Signal Temporal Logic (STL) to guide multi-agent reinforcement learning reward design remains unexplored. Complex interactions, heterogeneous goals and critical safety requirements in multi-agent systems make this problem even more challenging. In this paper, we propose a novel STL-guided multi-agent reinforcement learning framework. The STL requirements are designed to include both task specifications according to the objective of each agent and safety specifications, and the robustness values of the STL specifications are leveraged to generate rewards. We validate the advantages of our method through empirical studies. The experimental results demonstrate significant reward performance improvements compared to MARL without STL guidance, along with a remarkable increase in the overall safety rate of the multi-agent systems.

56.1ROMay 12
Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware

Keshawn Smith, Zhili Zhang, H M Sabbir Ahmad et al.

Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provide opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is challenged by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (includes shared state information among agents) and action representations considering real system complexities for MARL formulation. The MARL policy is trained with robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.

ROOct 5, 2022
Spatial-Temporal-Aware Safe Multi-Agent Reinforcement Learning of Connected Autonomous Vehicles in Challenging Scenarios

Zhili Zhang, Songyang Han, Jiangwei Wang et al.

Communication technologies enable coordination among connected and autonomous vehicles (CAVs). However, it remains unclear how to utilize shared information to improve the safety and efficiency of the CAV system in dynamic and complicated driving scenarios. In this work, we propose a framework of constrained multi-agent reinforcement learning (MARL) with a parallel Safety Shield for CAVs in challenging driving scenarios that includes unconnected hazard vehicles. The coordination mechanisms of the proposed MARL include information sharing and cooperative policy learning, with Graph Convolutional Network (GCN)-Transformer as a spatial-temporal encoder that enhances the agent's environment awareness. The Safety Shield module with Control Barrier Functions (CBF)-based safety checking protects the agents from taking unsafe actions. We design a constrained multi-agent advantage actor-critic (CMAA2C) algorithm to train safe and cooperative policies for CAVs. With the experiment deployed in the CARLA simulator, we verify the performance of the safety checking, spatial-temporal encoder, and coordination mechanisms designed in our method by comparative experiments in several challenging scenarios with unconnected hazard vehicles. Results show that our proposed methodology significantly increases system safety and efficiency in challenging scenarios.

CVSep 1, 2025Code
SegAssess: Panoramic quality mapping for robust and transferable unsupervised segmentation assessment

Bingnan Yang, Mi Zhang, Zhili Zhang et al.

High-quality image segmentation is fundamental to pixel-level geospatial analysis in remote sensing, necessitating robust segmentation quality assessment (SQA), particularly in unsupervised settings lacking ground truth. Although recent deep learning (DL) based unsupervised SQA methods show potential, they often suffer from coarse evaluation granularity, incomplete assessments, and poor transferability. To overcome these limitations, this paper introduces Panoramic Quality Mapping (PQM) as a new paradigm for comprehensive, pixel-wise SQA, and presents SegAssess, a novel deep learning framework realizing this approach. SegAssess distinctively formulates SQA as a fine-grained, four-class panoramic segmentation task, classifying pixels within a segmentation mask under evaluation into true positive (TP), false positive (FP), true negative (TN), and false negative (FN) categories, thereby generating a complete quality map. Leveraging an enhanced Segment Anything Model (SAM) architecture, SegAssess uniquely employs the input mask as a prompt for effective feature integration via cross-attention. Key innovations include an Edge Guided Compaction (EGC) branch with an Aggregated Semantic Filter (ASF) module to refine predictions near challenging object edges, and an Augmented Mixup Sampling (AMS) training strategy integrating multi-source masks to significantly boost cross-domain robustness and zero-shot transferability. Comprehensive experiments across 32 datasets derived from 6 sources demonstrate that SegAssess achieves state-of-the-art (SOTA) performance and exhibits remarkable zero-shot transferability to unseen masks, establishing PQM via SegAssess as a robust and transferable solution for unsupervised SQA. The code is available at https://github.com/Yangbn97/SegAssess.

CVJan 3, 2025
Adaptive Homophily Clustering: Structure Homophily Graph Learning with Adaptive Filter for Hyperspectral Image

Yao Ding, Weijie Kang, Aitao Yang et al.

Hyperspectral image (HSI) clustering has been a fundamental but challenging task with zero training labels. Currently, some deep graph clustering methods have been successfully explored for HSI due to their outstanding performance in effective spatial structural information encoding. Nevertheless, insufficient structural information utilization, poor feature presentation ability, and weak graph update capability limit their performance. Thus, in this paper, a homophily structure graph learning with an adaptive filter clustering method (AHSGC) for HSI is proposed. Specifically, homogeneous region generation is first developed for HSI processing and constructing the original graph. Afterward, an adaptive filter graph encoder is designed to adaptively capture the high and low frequency features on the graph for subsequence processing. Then, a graph embedding clustering self-training decoder is developed with KL Divergence, with which the pseudo-label is generated for network training. Meanwhile, homophily-enhanced structure learning is introduced to update the graph according to the clustering task, in which the orient correlation estimation is adopted to estimate the node connection, and graph edge sparsification is designed to adjust the edges in the graph dynamically. Finally, a joint network optimization is introduced to achieve network self-training and update the graph. The K-means is adopted to express the latent features. Extensive experiments and repeated comparative analysis have verified that our AHSGC contains high clustering accuracy, low computational complexity, and strong robustness. The code source will be available at https://github.com/DY-HYX.

ROFeb 28, 2025
AuthSim: Towards Authentic and Effective Safety-critical Scenario Generation for Autonomous Driving Tests

Yukuan Yang, Xucheng Lu, Zhili Zhang et al.

Generating adversarial safety-critical scenarios is a pivotal method for testing autonomous driving systems, as it identifies potential weaknesses and enhances system robustness and reliability. However, existing approaches predominantly emphasize unrestricted collision scenarios, prompting non-player character (NPC) vehicles to attack the ego vehicle indiscriminately. These works overlook these scenarios' authenticity, rationality, and relevance, resulting in numerous extreme, contrived, and largely unrealistic collision events involving aggressive NPC vehicles. To rectify this issue, we propose a three-layer relative safety region model, which partitions the area based on danger levels and increases the likelihood of NPC vehicles entering relative boundary regions. This model directs NPC vehicles to engage in adversarial actions within relatively safe boundary regions, thereby augmenting the scenarios' authenticity. We introduce AuthSim, a comprehensive platform for generating authentic and effective safety-critical scenarios by integrating the three-layer relative safety region model with reinforcement learning. To our knowledge, this is the first attempt to address the authenticity and effectiveness of autonomous driving system test scenarios comprehensively. Extensive experiments demonstrate that AuthSim outperforms existing methods in generating effective safety-critical scenarios. Notably, AuthSim achieves a 5.25% improvement in average cut-in distance and a 27.12% enhancement in average collision interval time, while maintaining higher efficiency in generating effective safety-critical scenarios compared to existing methods. This underscores its significant advantage in producing authentic scenarios over current methodologies.

CVJan 18, 2024
Enhanced Automated Quality Assessment Network for Interactive Building Segmentation in High-Resolution Remote Sensing Imagery

Zhili Zhang, Xiangyun Hu, Jiabo Xu

In this research, we introduce the enhanced automated quality assessment network (IBS-AQSNet), an innovative solution for assessing the quality of interactive building segmentation within high-resolution remote sensing imagery. This is a new challenge in segmentation quality assessment, and our proposed IBS-AQSNet allievate this by identifying missed and mistaken segment areas. First of all, to acquire robust image features, our method combines a robust, pre-trained backbone with a lightweight counterpart for comprehensive feature extraction from imagery and segmentation results. These features are then fused through a simple combination of concatenation, convolution layers, and residual connections. Additionally, ISR-AQSNet incorporates a multi-scale differential quality assessment decoder, proficient in pinpointing areas where segmentation result is either missed or mistaken. Experiments on a newly-built EVLab-BGZ dataset, which includes over 39,198 buildings, demonstrate the superiority of the proposed method in automating segmentation quality assessment, thereby setting a new benchmark in the field.

AIAug 12, 2020
Deceptive Kernel Function on Observations of Discrete POMDP

Zhili Zhang, Quanyan Zhu

This paper studies the deception applied on agent in a partially observable Markov decision process. We introduce deceptive kernel function (the kernel) applied to agent's observations in a discrete POMDP. Based on value iteration, value function approximation and POMCP three characteristic algorithms used by agent, we analyze its belief being misled by falsified observations as the kernel's outputs and anticipate its probable threat on agent's reward and potentially other performance. We validate our expectation and explore more detrimental effects of the deception by experimenting on two POMDP problems. The result shows that the kernel applied on agent's observation can affect its belief and substantially lower its resulting rewards; meantime certain implementation of the kernel could induce other abnormal behaviors by the agent.