Zhong Liu

CV
h-index: 41
Papers: 27
Citations: 551
Novelty: 48%
AI Score: 54

27 Papers

CV | Sep 25, 2023 | Code
IEBins: Iterative Elastic Bins for Monocular Depth Estimation

Shuwei Shao, Zhongcai Pei, Xingming Wu et al.

Monocular depth estimation (MDE) is a fundamental topic of geometric computer vision and a core technique for many downstream applications. Recently, several methods reframe the MDE as a classification-regression problem where a linear combination of probabilistic distribution and bin centers is used to predict depth. In this paper, we propose a novel concept of iterative elastic bins (IEBins) for the classification-regression-based MDE. The proposed IEBins aims to search for high-quality depth by progressively optimizing the search range, which involves multiple stages and each stage performs a finer-grained depth search in the target bin on top of its previous stage. To alleviate the possible error accumulation during the iterative process, we utilize a novel elastic target bin to replace the original target bin, the width of which is adjusted elastically based on the depth uncertainty. Furthermore, we develop a dedicated framework composed of a feature extractor and an iterative optimizer that has powerful temporal context modeling capabilities benefiting from the GRU-based architecture. Extensive experiments on the KITTI, NYU-Depth-v2 and SUN RGB-D datasets demonstrate that the proposed method surpasses prior state-of-the-art competitors. The source code is publicly available at https://github.com/ShuweiShao/IEBins.
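The classification-regression formulation described above can be illustrated with a small sketch. It assumes uniform bins over a hypothetical depth range and computes the depth as a probability-weighted sum of bin centers, followed by one naive narrowing step into the most probable bin. The function names are illustrative only, and the elastic width adjustment and GRU-based optimizer from the paper are not modeled here.

```python
def bin_centers(d_min, d_max, n_bins):
    """Centers of n_bins uniform bins covering [d_min, d_max]."""
    width = (d_max - d_min) / n_bins
    return [d_min + (i + 0.5) * width for i in range(n_bins)]


def expected_depth(probs, centers):
    """Depth as a linear combination of bin probabilities and bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


def narrow_to_target_bin(probs, d_min, d_max):
    """Return the range of the most probable bin (a naive next search range)."""
    n_bins = len(probs)
    width = (d_max - d_min) / n_bins
    k = max(range(n_bins), key=lambda i: probs[i])
    return d_min + k * width, d_min + (k + 1) * width


# Hypothetical values for a single pixel.
centers = bin_centers(0.0, 8.0, 4)
probs = [0.1, 0.2, 0.6, 0.1]
depth = expected_depth(probs, centers)
next_range = narrow_to_target_bin(probs, 0.0, 8.0)
```

In this toy setting the next stage would search only inside the selected bin; the paper's elastic target bin instead widens or narrows that range according to the depth uncertainty.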

CV | Feb 16, 2023 | Code
URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation

Shuwei Shao, Zhongcai Pei, Weihai Chen et al.

This work aims to estimate a high-quality depth map from a single RGB image. Due to the lack of depth clues, making full use of the long-range correlation and the local information is critical for accurate depth estimation. Towards this end, we introduce an uncertainty rectified cross-distillation between Transformer and convolutional neural network (CNN) to learn a unified depth estimator. Specifically, we use the depth estimates from the Transformer branch and the CNN branch as pseudo labels to teach each other. Meanwhile, we model the pixel-wise depth uncertainty to rectify the loss weights of noisy pseudo labels. To avoid the large capacity gap induced by the strong Transformer branch deteriorating the cross-distillation, we transfer the feature maps from Transformer to CNN and design coupling units to assist the weak CNN branch to leverage the transferred features. Furthermore, we propose a surprisingly simple yet highly effective data augmentation technique CutFlip, which enforces the model to exploit more valuable clues apart from the vertical image position for depth inference. Extensive experiments demonstrate that our model, termed URCDC-Depth, exceeds previous state-of-the-art methods on the KITTI, NYU-Depth-v2 and SUN RGB-D datasets, even with no additional computational burden at inference time. The source code is publicly available at https://github.com/ShuweiShao/URCDC-Depth.
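The abstract does not spell out how CutFlip operates. One plausible reading, assumed here and not confirmed by the text above, is that the image is cut along a horizontal line and the upper and lower parts are swapped, so that vertical position stops being a reliable depth cue. The sketch below applies that assumed transform to an image and its depth map, both represented as plain lists of rows; the function name and arguments are hypothetical.

```python
import random


def cut_flip(image, depth, cut_row=None, rng=random):
    """Swap the parts above and below a horizontal cut, for image and depth alike.

    image and depth are lists of rows with the same height (at least 2 rows).
    """
    height = len(image)
    if height < 2 or len(depth) != height:
        raise ValueError("image and depth need the same height of at least 2 rows")
    if cut_row is None:
        cut_row = rng.randint(1, height - 1)  # keeps both parts non-empty
    flipped_image = image[cut_row:] + image[:cut_row]
    flipped_depth = depth[cut_row:] + depth[:cut_row]
    return flipped_image, flipped_depth


# Toy 4-row "image" with a matching depth map.
image = [["sky"], ["wall"], ["floor"], ["rug"]]
depth = [[9.0], [5.0], [2.0], [1.0]]
aug_image, aug_depth = cut_flip(image, depth, cut_row=1)
```

Applying the same cut to the depth map keeps the supervision aligned with the augmented image.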

LG | Aug 16, 2023 | Code
The Expressive Power of Graph Neural Networks: A Survey

Bingxu Zhang, Changjun Fan, Shixuan Liu et al.

Graph neural networks (GNNs) are effective machine learning models for many graph-related applications. Despite their empirical success, many research efforts focus on the theoretical limitations of GNNs, i.e., the expressive power of GNNs. Early works in this domain mainly focus on studying the graph isomorphism recognition ability of GNNs, and recent works try to leverage properties such as subgraph counting and connectivity learning to characterize the expressive power of GNNs, which are more practical and closer to real-world settings. However, no survey papers and open-source repositories comprehensively summarize and discuss models in this important direction. To fill the gap, we conduct a first survey of models for enhancing expressive power under different forms of definition. Concretely, the models are reviewed based on three categories, i.e., graph feature enhancement, graph topology enhancement, and GNN architecture enhancement.

CV | Jun 21, 2023 | Code
Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering

Lin Xi, Weihai Chen, Xingming Wu et al.

Online unsupervised video object segmentation (UVOS) uses the previous frames as its input to automatically separate the primary object(s) from a streaming video without using any further manual annotation. A major challenge is that the model has no access to the future and must rely solely on the history, i.e., the segmentation mask is predicted from the current frame as soon as it is captured. In this work, a novel contrastive motion clustering algorithm with an optical flow as its input is proposed for the online UVOS by exploiting the common fate principle that visual elements tend to be perceived as a group if they possess the same motion pattern. We build a simple and effective auto-encoder to iteratively summarize non-learnable prototypical bases for the motion pattern, while the bases in turn help learn the representation of the embedding network. Further, a contrastive learning strategy based on a boundary prior is developed to improve foreground and background feature discrimination in the representation learning stage. The proposed algorithm can be optimized on data of arbitrary scale (i.e., frame, clip, dataset) and performed in an online fashion. Experiments on the DAVIS16, FBMS, and SegTrackV2 datasets show that the accuracy of our method surpasses the previous state-of-the-art (SoTA) online UVOS method by a margin of 0.8%, 2.9%, and 1.1%, respectively. Furthermore, by using an online deep subspace clustering to tackle the motion grouping, our method achieves higher accuracy with 3× faster inference than the SoTA online UVOS method, making a good trade-off between effectiveness and efficiency. Our code is available at https://github.com/xilin1991/ClusterNet.

SY | May 30
Lipschitz-Enforced Machine Learning Framework for Accelerating Transient Stability Analysis of Networked Grid-Interactive Inverters

Zhong Liu, Jialin Zheng, Xiaonan Lu

The growing penetration of grid-connected inverters renders Transient Stability Analysis (TSA) increasingly challenging in modern power systems. Existing TSA methodologies encounter an intrinsic trade-off between accuracy and scalability when dealing with these networked inverter-based resources (IBRs). To bridge this gap, this paper proposes a Lipschitz-enforced machine learning framework that leverages Lipschitz continuity to restructure the transient stability certification mechanism. By replacing computationally intensive verification procedures with a deterministic and efficient algebraic check, the proposed method enables rigorous stability guarantees for complex multi-inverter systems, effectively bypassing the complexity limits of traditional analytical approximations. Validated on networked Grid-Forming (GFM) inverter systems, the proposed framework accelerates the training process by over 5 times compared to existing methods. Notably, the proposed framework substantially outperforms traditional transient stability analysis approaches (e.g., Linear Matrix Inequality and Sum-of-Squares methods) by capturing up to 30% larger Regions of Attraction (ROA), effectively shattering the conservativeness bottleneck that has long constrained traditional analytical tools. This advancement provides a scalable and theoretically rigorous solution for the TSA of networked IBRs in modern power grids.

CV | Feb 20, 2023
Self-Supervised Monocular Depth Estimation with Self-Reference Distillation and Disparity Offset Refinement

Zhong Liu, Ran Li, Shuwei Shao et al.

Monocular depth estimation plays a fundamental role in computer vision. Due to the costly acquisition of depth ground truth, self-supervised methods that leverage adjacent frames to establish a supervisory signal have emerged as the most promising paradigms. In this work, we propose two novel ideas to improve self-supervised monocular depth estimation: 1) self-reference distillation and 2) disparity offset refinement. Specifically, we use a parameter-optimized model as the teacher, updated as the training epochs progress, to provide additional supervision during the training process. The teacher model has the same structure as the student model, with weights inherited from the historical student model. In addition, a multiview check is introduced to filter out the outliers produced by the teacher model. Furthermore, we leverage the contextual consistency between high-scale and low-scale features to obtain multiscale disparity offsets, which are used to refine the disparity output incrementally by aligning disparity information at different scales. The experimental results on the KITTI and Make3D datasets show that our method outperforms previous state-of-the-art competitors.

CV | Apr 6, 2022
Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation

Lin Xi, Weihai Chen, Xingming Wu et al.

Unsupervised video object segmentation (UVOS) aims at automatically separating the primary foreground object(s) from the background in a video sequence. Existing UVOS methods either lack robustness when there are visually similar surroundings (appearance-based) or suffer from deterioration in the quality of their predictions because of dynamic background and inaccurate flow (flow-based). To overcome the limitations, we propose an implicit motion-compensated network (IMCNet) combining complementary cues (i.e., appearance and motion) with aligned motion information from the adjacent frames to the current frame at the feature level without estimating optical flows. The proposed IMCNet consists of an affinity computing module (ACM), an attention propagation module (APM), and a motion compensation module (MCM). The light-weight ACM extracts commonality between neighboring input frames based on appearance features. The APM then transmits global correlation in a top-down manner. Through coarse-to-fine iterative inspiring, the APM will refine object regions from multiple resolutions so as to efficiently avoid losing details. Finally, the MCM aligns motion information from temporally adjacent frames to the current frame which achieves implicit motion compensation at the feature level. We perform extensive experiments on DAVIS16 and YouTube-Objects. Our network achieves favorable performance while running at a faster speed compared to the state-of-the-art methods.

AI | Jul 8, 2023
Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks

Shixuan Liu, Changjun Fan, Kewei Cheng et al.

Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, is proposed to provide the meta-level explainable semantics for various HIN tasks. Traditionally, meta-paths are primarily used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support the learning of the scores of meta-paths for varying relations, mitigating the need of exhaustive path instance enumeration for each relation. Further, we design a reinforcement-learning based path-finding agent, which directly navigates the network schema (i.e., schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.

CV | May 30, 2022
Learnable Patchmatch and Self-Teaching for Multi-Frame Depth Estimation in Monocular Endoscopy

Shuwei Shao, Zhongcai Pei, Weihai Chen et al.

This work delves into unsupervised monocular depth estimation in endoscopy, which leverages adjacent frames to establish a supervisory signal during the training phase. For many clinical applications, e.g., surgical navigation, temporally correlated frames are also available at test time. Due to the lack of depth clues, making full use of the temporal correlation among multiple video frames at both phases is crucial for accurate depth estimation. However, several challenges in endoscopic scenes, such as low and homogeneous textures and inter-frame brightness fluctuations, limit the performance gain from the temporal correlation. To fully exploit it, we propose a novel unsupervised multi-frame monocular depth estimation model. The proposed model integrates a learnable patchmatch module to adaptively increase the discriminative ability in regions with low and homogeneous textures, and enforces cross-teaching and self-teaching consistencies to provide efficacious regularizations towards brightness fluctuations. Furthermore, as a byproduct of the self-teaching paradigm, the proposed model is able to improve the depth predictions when more frames are input at test time. We conduct detailed experiments on multiple datasets, including SCARED, EndoSLAM, Hamlyn and SERV-CT. The experimental results indicate that our model exceeds the state-of-the-art competitors. The source code and trained models will be publicly available upon the acceptance.

CV | Nov 20, 2022
Real-time Local Feature with Global Visual Information Enhancement

Jinyu Miao, Haosong Yue, Zhong Liu et al.

Local features provide compact and invariant image representations for various visual tasks. Current deep learning-based local feature algorithms always utilize a convolutional neural network (CNN) architecture with a limited receptive field. Besides, even with high-performance GPU devices, the computational efficiency of local features is not satisfactory. In this paper, we tackle such problems by proposing a CNN-based local feature algorithm. The proposed method introduces a global enhancement module to fuse global visual clues in a light-weight network, and then optimizes the network with a novel deep reinforcement learning scheme from the perspective of the local feature matching task. Experiments on the public benchmarks demonstrate that the proposal can achieve considerable robustness against visual interference and meanwhile run in real time.

CV | Aug 24, 2025 | Code
E-BayesSAM: Efficient Bayesian Adaptation of SAM with Self-Optimizing KAN-Based Interpretation for Uncertainty-Aware Ultrasonic Segmentation

Bin Huang, Zhong Liu, Huiying Wen et al.

Although the Segment Anything Model (SAM) has advanced medical image segmentation, its Bayesian adaptation for uncertainty-aware segmentation remains hindered by three key issues: (1) instability in Bayesian fine-tuning of large pre-trained SAMs; (2) high computation cost due to SAM's massive parameters; (3) SAM's black-box design limits interpretability. To overcome these, we propose E-BayesSAM, an efficient framework combining Token-wise Variational Bayesian Inference (T-VBI) for efficient Bayesian adaptation and Self-Optimizing Kolmogorov-Arnold Network (SO-KAN) for improving interpretability. T-VBI innovatively reinterprets SAM's output tokens as dynamic probabilistic weights and reparameterizes them as latent variables without auxiliary training, enabling training-free VBI for uncertainty estimation. SO-KAN improves token prediction with learnable spline activations via self-supervised learning, providing insight to prune redundant tokens to boost efficiency and accuracy. Experiments on five ultrasound datasets demonstrated that E-BayesSAM achieves: (i) real-time inference (0.03s/image), (ii) superior segmentation accuracy (average DSC: Pruned E-BayesSAM's 89.0% vs. E-BayesSAM's 88.0% vs. MedSAM's 88.3%), and (iii) identification of four critical tokens governing SAM's decisions. By unifying efficiency, reliability, and interpretability, E-BayesSAM bridges SAM's versatility with clinical needs, advancing deployment in safety-critical medical applications. The source code is available at https://github.com/mp31192/E-BayesSAM.

LG | Dec 27, 2024
Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization

Shixuan Liu, Yanghe Feng, Keyu Wu et al.

In many domains of empirical sciences, discovering the causal structure within variables remains an indispensable task. Recently, to tackle the unoriented edges or latent assumption violations suffered by conventional methods, researchers formulated a reinforcement learning (RL) procedure for causal discovery and employed the REINFORCE algorithm to search for the best-rewarded directed acyclic graph. The two keys to the overall performance of the procedure are the robustness of RL methods and the efficient encoding of variables. However, on the one hand, REINFORCE is prone to local convergence and unstable performance during training. Neither trust region policy optimization, being computationally expensive, nor proximal policy optimization (PPO), suffering from aggregate constraint deviation, is a decent alternative for combinatorial optimization problems with considerable individual subactions. We propose a trust region-navigated clipping policy optimization method for causal discovery that guarantees both better search efficiency and steadiness in policy optimization, in comparison with REINFORCE, PPO and our prioritized sampling-guided REINFORCE implementation. On the other hand, to boost the efficient encoding of variables, we propose a refined graph attention encoder called SDGAT that can grasp more feature information without prior neighbourhood information. With these improvements, the proposed method outperforms the former RL method on both synthetic and benchmark datasets in terms of output results and optimization robustness.

LG | Oct 22, 2025
Environment Inference for Learning Generalizable Dynamical System

Shixuan Liu, Yue He, Haotian Wang et al.

Data-driven methods offer efficient and robust solutions for analyzing complex dynamical systems but rely on the assumption of I.I.D. data, driving the development of generalization techniques for handling environmental differences. These techniques, however, are limited by their dependence on environment labels, which are often unavailable during training due to data acquisition challenges, privacy concerns, and environmental variability, particularly in large public datasets and privacy-sensitive domains. In response, we propose DynaInfer, a novel method that infers environment specifications by analyzing prediction errors from fixed neural networks within each training round, enabling environment assignments directly from data. We prove our algorithm effectively solves the alternating optimization problem in unlabeled scenarios and validate it through extensive experiments across diverse dynamical systems. Results show that DynaInfer outperforms existing environment assignment techniques, converges rapidly to true labels, and even achieves superior performance when environment labels are available.
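The assignment step described above, inferring environments from the prediction errors of fixed networks, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: each "network" is a plain Python function, each sample is a hypothetical (state, derivative) pair, and a sample is assigned to whichever fixed model predicts it best. The names are illustrative, and the alternating retraining of the models is omitted.

```python
def assign_environments(samples, models, loss):
    """Assign each sample to the fixed model (environment) with the lowest loss."""
    labels = []
    for sample in samples:
        errors = [loss(model, sample) for model in models]
        labels.append(min(range(len(models)), key=lambda k: errors[k]))
    return labels


def squared_error(model, sample):
    """Squared error between the predicted and the observed derivative."""
    x, dx = sample
    return (model(x) - dx) ** 2


# Two hypothetical fixed models of dx/dt = a * x with different rates a.
models = [lambda x: -1.0 * x, lambda x: -3.0 * x]
# Samples (x, dx/dt) generated with a = -1, a = -3, and a = -1 respectively.
samples = [(2.0, -2.0), (1.0, -3.0), (0.5, -0.5)]
labels = assign_environments(samples, models, squared_error)
```

In a full alternating scheme, the models would then be refit on the samples assigned to them and the assignment repeated.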

LG | Dec 13, 2023
Machine Learning for the Multi-Dimensional Bin Packing Problem: Literature Review and Empirical Evaluation

Wenjie Wu, Changjun Fan, Jincai Huang et al.

The Bin Packing Problem (BPP) is a well-established combinatorial optimization (CO) problem. Since it has many applications in our daily life, e.g. logistics and resource allocation, people are seeking efficient bin packing algorithms. On the other hand, researchers have been making constant advances in machine learning (ML), which is famous for its efficiency. In this article, we first formulate BPP, introducing its variants and practical constraints. Then, a comprehensive survey on ML for multi-dimensional BPP is provided. We further collect some public benchmarks of 3D BPP, and evaluate some online methods on the Cutting Stock Dataset. Finally, we share our perspective on challenges and future directions in BPP. To the best of our knowledge, this is the first systematic review of ML-related methods for BPP.

AI | Jul 7, 2025
Rule Learning for Knowledge Graph Reasoning under Agnostic Distribution Shift

Shixuan Liu, Yue He, Yunfei Wang et al.

Logical rule learning, a prominent category of knowledge graph (KG) reasoning methods, constitutes a critical research area aimed at learning explicit rules from observed facts to infer missing knowledge. However, like all KG reasoning methods, rule learning suffers from a critical weakness: its dependence on the I.I.D. assumption. This assumption can easily be violated due to selection bias during training or agnostic distribution shifts during testing (e.g., as in query shift scenarios), ultimately undermining model performance and reliability. To enable robust KG reasoning in wild environments, this study investigates logical rule learning in the presence of agnostic test-time distribution shifts. We formally define this challenge as out-of-distribution (OOD) KG reasoning, a previously underexplored problem, and propose the Stable Rule Learning (StableRule) framework as a solution. StableRule is an end-to-end framework that combines feature decorrelation with a rule learning network to enhance OOD generalization in KG reasoning. By leveraging feature decorrelation, StableRule mitigates the adverse effects of covariate shifts arising in OOD scenarios, improving the robustness of the rule learning network. Extensive experiments on seven benchmark KGs demonstrate the framework's superior effectiveness and stability across diverse heterogeneous environments, highlighting its practical significance for real-world applications.

AI | May 11, 2025
A Multi-Agent Reinforcement Learning Approach for Cooperative Air-Ground-Human Crowdsensing in Emergency Rescue

Wenhao Lu, Zhengqiu Zhu, Yong Zhao et al.

Mobile crowdsensing is evolving beyond traditional human-centric models by integrating heterogeneous entities like unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). Optimizing task allocation among these diverse agents is critical, particularly in challenging emergency rescue scenarios characterized by complex environments, limited communication, and partial observability. This paper tackles the Heterogeneous-Entity Collaborative-Sensing Task Allocation (HECTA) problem specifically for emergency rescue, considering humans, UAVs, and UGVs. We introduce a novel "Hard-Cooperative" policy where UGVs prioritize recharging low-battery UAVs, alongside performing their sensing tasks. The primary objective is maximizing the task completion rate (TCR) under strict time constraints. We rigorously formulate this NP-hard problem as a decentralized partially observable Markov decision process (Dec-POMDP) to effectively handle sequential decision-making under uncertainty. To solve this, we propose HECTA4ER, a novel multi-agent reinforcement learning algorithm built upon a Centralized Training with Decentralized Execution architecture. HECTA4ER incorporates tailored designs, including specialized modules for complex feature extraction, utilization of action-observation history via hidden states, and a mixing network integrating global and local information, specifically addressing the challenges of partial observability. Furthermore, theoretical analysis confirms the algorithm's convergence properties. Extensive simulations demonstrate that HECTA4ER significantly outperforms baseline algorithms, achieving an average 18.42% increase in TCR. Crucially, a real-world case study validates the algorithm's effectiveness and robustness in dynamic sensing scenarios, highlighting its strong potential for practical application in emergency response.

CL | Nov 15, 2024
A Survey of Event Causality Identification: Taxonomy, Challenges, Assessment, and Prospects

Qing Cheng, Zefan Zeng, Xingchen Hu et al.

Event Causality Identification (ECI) has become an essential task in Natural Language Processing (NLP), focused on automatically detecting causal relationships between events within texts. This comprehensive survey systematically investigates fundamental concepts and models, developing a systematic taxonomy and critically evaluating diverse models. We begin by defining core concepts, formalizing the ECI problem, and outlining standard evaluation protocols. Our classification framework divides ECI models into two primary tasks: Sentence-level Event Causality Identification (SECI) and Document-level Event Causality Identification (DECI). For SECI, we review models employing feature pattern-based matching, machine learning classifiers, deep semantic encoding, prompt-based fine-tuning, and causal knowledge pre-training, alongside data augmentation strategies. For DECI, we focus on approaches utilizing deep semantic encoding, event graph reasoning, and prompt-based fine-tuning. Special attention is given to recent advancements in multi-lingual and cross-lingual ECI, as well as zero-shot ECI leveraging Large Language Models (LLMs). We analyze the strengths, limitations, and unresolved challenges associated with each approach. Extensive quantitative evaluations are conducted on four benchmark datasets to rigorously assess the performance of various ECI models. We conclude by discussing future research directions and highlighting opportunities to advance the field further.

LG | Apr 6, 2024
Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning

Tianle Pu, Changjun Fan, Mutian Shen et al.

Many complex problems encountered in both production and daily life can be conceptualized as combinatorial optimization problems (COPs) over graphs. In recent years, reinforcement learning (RL) based models have emerged as a promising direction, which treat COP solving as a heuristic learning problem. However, current finite-horizon-MDP based RL models have inherent limitations. They are not allowed to explore adequately to improve solutions at test time, which may be necessary given the complexity of NP-hard optimization tasks. Some recent attempts solve this issue by focusing on reward design and state feature engineering, which are tedious and ad hoc. In this work, we instead propose a much simpler but more effective technique, named gauge transformation (GT). The technique originates from physics, but is very effective in enabling RL agents to explore and continuously improve the solutions at test time. Moreover, GT is very simple: it can be implemented in fewer than 10 lines of Python code and can be applied to a vast majority of RL models. Experimentally, we show that traditional RL models with the GT technique achieve state-of-the-art performance on the MaxCut problem. Furthermore, since GT is independent of any RL model, it can be seamlessly integrated into various RL frameworks, paving the way for these models to explore more effectively when solving general COPs.
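The abstract does not give the transformation itself. The sketch below shows the standard gauge transformation of an Ising-type energy from physics, which is assumed (not confirmed by the text above) to be the idea the paper builds on: flipping spins with signs t_i while rescaling couplings by t_i * t_j leaves the energy unchanged. Choosing t equal to the current solution maps that solution to the all-ones state, which is presumably what lets an agent restart exploration from a standard initial state. All names and the toy couplings are hypothetical.

```python
def ising_energy(J, s):
    """H(s) = -sum_{i<j} J[i][j] * s[i] * s[j] for spins s[i] in {-1, +1}."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))


def gauge_transform(J, s, t):
    """Apply s_i -> t_i * s_i and J_ij -> t_i * t_j * J_ij, with t_i in {-1, +1}."""
    n = len(s)
    J_new = [[t[i] * t[j] * J[i][j] for j in range(n)] for i in range(n)]
    s_new = [t[i] * s[i] for i in range(n)]
    return J_new, s_new


# Toy symmetric couplings and a current solution.
J = [[0, 1, -2], [1, 0, 3], [-2, 3, 0]]
s = [1, -1, 1]
# Using the current solution as the gauge maps it to the all-ones state.
J_new, s_new = gauge_transform(J, s, t=s)
energy_before = ising_energy(J, s)
energy_after = ising_energy(J_new, s_new)
```

The invariance holds because each t_i appears squared in every term, and t_i * t_i = 1.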

AI | Feb 4, 2024
Conversational Crowdsensing: A Parallel Intelligence Powered Novel Sensing Approach

Zhengqiu Zhu, Yong Zhao, Bin Chen et al.

The transition from CPS-based Industry 4.0 to CPSS-based Industry 5.0 brings new requirements and opportunities to current sensing approaches, especially in light of recent progress in Chatbots and Large Language Models (LLMs). Therefore, the advancement of parallel intelligence-powered Crowdsensing Intelligence (CSI) is witnessed, which is currently advancing towards linguistic intelligence. In this paper, we propose a novel sensing paradigm, namely conversational crowdsensing, for Industry 5.0. It can alleviate workload and professional requirements of individuals and promote the organization and operation of diverse workforce, thereby facilitating faster response and wider popularization of crowdsensing systems. Specifically, we design the architecture of conversational crowdsensing to effectively organize three types of participants (biological, robotic, and digital) from diverse communities. Through three levels of effective conversation (i.e., inter-human, human-AI, and inter-AI), complex interactions and service functionalities of different workers can be achieved to accomplish various tasks across three sensing phases (i.e., requesting, scheduling, and executing). Moreover, we explore the foundational technologies for realizing conversational crowdsensing, encompassing LLM-based multi-agent systems, scenarios engineering and conversational human-AI cooperation. Finally, we present potential industrial applications of conversational crowdsensing and discuss its implications. We envision that conversations in natural language will become the primary communication channel during crowdsensing process, enabling richer information exchange and cooperative problem-solving among humans, robots, and AI.

CV | Jan 18, 2022
Attentional Feature Refinement and Alignment Network for Aircraft Detection in SAR Imagery

Yan Zhao, Lingjun Zhao, Zhong Liu et al.

Aircraft detection in Synthetic Aperture Radar (SAR) imagery is a challenging task in SAR Automatic Target Recognition (SAR ATR) due to the extremely discrete appearance of aircraft, obvious intraclass variation, small size and severe background interference. In this paper, a single-shot detector, namely the Attentional Feature Refinement and Alignment Network (AFRAN), is proposed for detecting aircraft in SAR images with competitive accuracy and speed. Specifically, three significant components, the Attention Feature Fusion Module (AFFM), the Deformable Lateral Connection Module (DLCM) and the Anchor-guided Detection Module (ADM), are carefully designed in our method for refining and aligning informative characteristics of aircraft. To represent characteristics of aircraft with less interference, low-level textural and high-level semantic features of aircraft are fused and refined thoroughly in AFFM. The alignment between the discrete back-scattering points of aircraft and convolutional sampling spots is promoted in DLCM. Eventually, the locations of aircraft are predicted precisely in ADM based on aligned features revised by refined anchors. To evaluate the performance of our method, a self-built SAR aircraft sliced dataset and a large scene SAR image are collected. Extensive quantitative and qualitative experiments with detailed analysis illustrate the effectiveness of the three proposed components. Furthermore, the topmost detection accuracy and competitive speed are achieved by our method compared with other domain-specific methods, e.g., DAPN and PADN, and general CNN-based methods, e.g., FPN, Cascade R-CNN, SSD, RefineDet and RPDet.

IV | Dec 25, 2021
DSRGAN: Detail Prior-Assisted Perceptual Single Image Super-Resolution via Generative Adversarial Networks

Ziyang Liu, Zhengguo Li, Xingming Wu et al.

The generative adversarial network (GAN) has been successfully applied to perceptual single image super-resolution (SISR). However, the GAN often tends to generate images with high-frequency details that are inconsistent with the real ones. Inspired by conventional detail enhancement algorithms, we propose a novel prior knowledge, the detail prior, to assist the GAN in alleviating this problem and restoring more realistic details. The proposed method, named DSRGAN, includes a well-designed detail extraction algorithm to capture the most important high-frequency information from images. Then, two discriminators are utilized for supervision on image-domain and detail-domain restorations, respectively. The DSRGAN merges the restored detail into the final output in a detail enhancement manner. The special design of DSRGAN takes advantage of both the model-based conventional algorithm and the data-driven deep learning network. Experimental results demonstrate that the DSRGAN outperforms the state-of-the-art SISR methods on perceptual metrics and achieves comparable results in terms of fidelity metrics simultaneously. Following the DSRGAN, it is feasible to incorporate other conventional image processing algorithms into a deep learning network to form a model-based deep SISR.

LGDec 7, 2021
PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay

Xingxing Liang, Yang Ma, Yanghe Feng et al.

On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO) that combines on-policy and off-policy methods to improve sampling efficiency by prioritizing the replay of trajectories generated by old policies. We first design three trajectory priorities based on the characteristics of trajectories: the first two being max and mean trajectory priorities based on one-step empirical generalized advantage estimation (GAE) values and the last being reward trajectory priorities based on normalized undiscounted cumulative reward. Then, we incorporate the prioritized trajectory replay into the PPO algorithm, propose a truncated importance weight method to overcome the high variance caused by large importance weights under multistep experience, and design a policy improvement loss function for PPO under off-policy conditions. We evaluate the performance of PTR-PPO in a set of Atari discrete control tasks, achieving state-of-the-art performance. In addition, by analyzing the heatmap of priority changes at various locations in the priority memory during training, we find that memory size and rollout length can have a significant impact on the distribution of trajectory priorities and, hence, on the performance of the algorithm.
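The trajectory priorities and the truncated importance weight described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of absolute advantage values, the min-max normalization of returns, and the default cap of 1.0 are all assumptions made for the example.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one trajectory.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def max_priority(advantages: List[float]) -> float:
    # Assumption: priority is the largest absolute advantage in the trajectory.
    return max(abs(a) for a in advantages)


def mean_priority(advantages: List[float]) -> float:
    # Assumption: priority is the mean absolute advantage in the trajectory.
    return sum(abs(a) for a in advantages) / len(advantages)


def reward_priorities(returns: List[float]) -> List[float]:
    """Min-max normalize undiscounted returns across stored trajectories.

    Assumption: min-max scaling; the abstract only says "normalized".
    """
    low, high = min(returns), max(returns)
    if high == low:
        return [1.0 for _ in returns]
    return [(r - low) / (high - low) for r in returns]


def truncated_importance_weight(new_prob: float, old_prob: float,
                                cap: float = 1.0) -> float:
    # Clip the ratio from above so a single large weight cannot dominate.
    return min(new_prob / old_prob, cap)
```

Capping the ratio from above trades a small bias for lower variance, which is the motivation the abstract gives for truncation under multistep experience.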

CVNov 16, 2021
Towards Comprehensive Monocular Depth Estimation: Multiple Heads Are Better Than One

Shuwei Shao, Ran Li, Zhongcai Pei et al.

Depth estimation attracts widespread attention in the computer vision community. However, it is still quite difficult to recover an accurate depth map using only one RGB image. We observe that existing methods tend to fail in different cases, owing to differences in network architecture, loss function and so on. In this work, we investigate this phenomenon and propose to integrate the strengths of multiple weak depth predictors to build a comprehensive and accurate depth predictor, which is critical for many real-world applications, e.g., 3D reconstruction. Specifically, we construct multiple base (weak) depth predictors by utilizing different Transformer-based and convolutional neural network (CNN)-based architectures. Transformer establishes long-range correlation while CNN preserves local information ignored by Transformer due to the spatial inductive bias. Therefore, the coupling of Transformer and CNN contributes to the generation of complementary depth estimates, which are essential to achieve a comprehensive depth predictor. Then, we design mixers to learn from multiple weak predictions and adaptively fuse them into a strong depth estimate. We refer to the resultant model as Transformer-assisted depth ensembles (TEDepth). On the standard NYU-Depth-v2 and KITTI datasets, we thoroughly explore how the neural ensembles affect the depth estimation and demonstrate that our TEDepth achieves better results than previous state-of-the-art approaches. To validate the generalizability across cameras, we directly apply the models trained on NYU-Depth-v2 to the SUN RGB-D dataset without any fine-tuning, and the superior results emphasize its strong generalizability.
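One simple way to picture adaptive fusion of several weak depth predictions is a per-pixel convex combination. The sketch below is an assumption for illustration only and is not the TEDepth mixer: the name `fuse_depths` is hypothetical, and the per-pixel logits, which a learned network would produce, are passed in directly here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_depths(predictions: List[List[float]],
                logits: List[List[float]]) -> List[float]:
    """Fuse K flattened depth maps with per-pixel softmax weights.

    predictions[k][p] is the depth predictor k assigns to pixel p;
    logits[k][p] is the (hypothetical) mixer score for that prediction.
    """
    n_pixels = len(predictions[0])
    fused = []
    for p in range(n_pixels):
        weights = softmax([scores[p] for scores in logits])
        fused.append(sum(w * pred[p] for w, pred in zip(weights, predictions)))
    return fused
```

With equal logits the result is the plain average of the predictors; strongly peaked logits select a single predictor per pixel, which is how complementary estimates can each contribute where they are most reliable.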

CVMar 18, 2021
Discriminative and Semantic Feature Selection for Place Recognition towards Dynamic Environments

Yuxin Tian, Jinyu Miao, Xingming Wu et al.

Features play an important role in various visual tasks, especially in visual place recognition applied in perceptually changing environments. In this paper, we address the challenges that dynamics and confusable patterns pose to place recognition by proposing a discriminative and semantic feature selection network, dubbed DSFeat. Supervised by both semantic information and an attention mechanism, we can estimate the pixel-wise stability of features, indicating the probability that the region from which features are extracted is static and stable, and then select features that are insensitive to dynamic interference and distinguishable enough to be correctly matched. The designed feature selection model is evaluated in place recognition and a SLAM system on several public datasets with varying appearances and viewpoints. Experimental results demonstrate the effectiveness of the proposed method. It should be noted that our proposal can be readily plugged into any feature-based SLAM system.

CVJul 22, 2019
Extended Local Binary Patterns for Efficient and Robust Spontaneous Facial Micro-Expression Recognition

Chengyu Guo, Jingyun Liang, Geng Zhan et al.

Facial Micro-Expressions (MEs) are spontaneous, involuntary facial movements that occur when a person experiences an emotion but deliberately or unconsciously attempts to conceal his or her genuine emotions. Recently, ME recognition has attracted increasing attention due to its potential applications such as clinical diagnosis, business negotiation, interrogations, and security. However, it is expensive to build large-scale ME datasets, mainly due to the difficulty of inducing spontaneous MEs. This limits the application of deep learning techniques, which require large amounts of training data. In this paper, we propose a simple, efficient yet robust descriptor called Extended Local Binary Patterns on Three Orthogonal Planes (ELBPTOP) for ME recognition. ELBPTOP consists of three complementary binary descriptors: LBPTOP and two novel ones, Radial Difference LBPTOP (RDLBPTOP) and Angular Difference LBPTOP (ADLBPTOP), which explore the local second-order information along the radial and angular directions contained in ME video sequences. ELBPTOP is a novel ME descriptor inspired by unique and subtle facial movements. It is computationally efficient and only marginally increases the cost of computing LBPTOP, yet is extremely effective for ME recognition. In addition, by introducing Whitened Principal Component Analysis (WPCA) to ME recognition for the first time, we can further obtain more compact and discriminative feature representations and achieve significant computational savings. Extensive experimental evaluation on three popular spontaneous ME datasets, SMIC, CASME II and SAMM, shows that our proposed ELBPTOP approach significantly outperforms the previous state-of-the-art on each of the three datasets and achieves promising results on cross-database recognition. Our code will be made available.
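For readers unfamiliar with the building block, the basic local binary pattern on a single 3x3 patch can be sketched as below. This shows only the classic 8-neighbour LBP code, not the radial or angular difference variants proposed in the paper; the function name, the clockwise bit order starting at the top-left neighbour, and the `>=` comparison are assumptions chosen for the example.

```python
from typing import List


def lbp_code(patch: List[List[int]]) -> int:
    """Classic 8-neighbour local binary pattern of a 3x3 patch.

    Each neighbour contributes one bit: 1 if it is at least as bright as
    the centre pixel, 0 otherwise.
    """
    center = patch[1][1]
    # Assumed bit order: clockwise, starting from the top-left neighbour.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2),
               (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code
```

LBP on three orthogonal planes applies this kind of coding on the XY, XT and YT planes of a video volume and concatenates the resulting histograms.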

SIMay 24, 2019
Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach

Changjun Fan, Li Zeng, Yuhui Ding et al.

Betweenness centrality (BC) is one of the most widely used centrality measures for network analysis; it describes the importance of nodes in a network in terms of the fraction of shortest paths that pass through them. It is key to many valuable applications, including community detection and network dismantling. Computing BC scores on large networks is computationally challenging due to high time complexity. Many approximation algorithms have been proposed to speed up the estimation of BC, most of which are sampling-based. However, these methods still require considerable execution time on large-scale networks, and their results often deteriorate when small changes happen to the network structure. In this paper, we focus on identifying nodes with high BC in a graph, since many application scenarios are built upon retrieving nodes with top-k BC. Different from previous heuristic methods, we turn this task into a learning problem and design an encoder-decoder based framework to solve it. More specifically, the encoder leverages the network structure to encode each node into an embedding vector, which captures the important structural information of the node. The decoder transforms the embedding vector for each node into a scalar, which captures the relative rank of this node in terms of BC. We use the pairwise ranking loss to train the model to identify the order of nodes with respect to their BC. By training on small-scale networks, the learned model is capable of assigning relative BC scores to nodes in any unseen network, and thus identifying the highly ranked nodes. Comprehensive experiments on both synthetic and real-world networks demonstrate that, compared to representative baselines, our model drastically speeds up the prediction without noticeable sacrifice in accuracy, and outperforms the state-of-the-art in accuracy on several large real-world networks.
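A pairwise ranking loss over predicted node scores can be sketched as follows. This uses a common logistic form with hard ordering targets, which is an assumption and may differ from the exact loss in the paper; the function names and the way node pairs are supplied are likewise hypothetical.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(scores: List[float], true_bc: List[float],
                          pairs: List[Tuple[int, int]]) -> float:
    """Mean logistic loss over node pairs with different ground-truth BC.

    A pair is penalized when the predicted scores order the two nodes
    differently from their true BC values.
    """
    total, counted = 0.0, 0
    for i, j in pairs:
        if true_bc[i] == true_bc[j]:
            continue  # Ties carry no ordering information.
        sign = 1.0 if true_bc[i] > true_bc[j] else -1.0
        total += softplus(-sign * (scores[i] - scores[j]))
        counted += 1
    return total / counted if counted else 0.0


def top_k_nodes(scores: List[float], k: int) -> List[int]:
    # Indices of the k highest-scoring nodes, best first.
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
```

Because the loss depends only on score differences, the decoder output needs to be correct only up to ordering, which matches the goal of retrieving top-k nodes rather than exact BC values.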

LGDec 24, 2018
VMAV-C: A Deep Attention-based Reinforcement Learning Algorithm for Model-based Control

Xingxing Liang, Qi Wang, Yanghe Feng et al.

Recent breakthroughs in Go and strategic games have demonstrated the great potential of reinforcement learning for intelligent scheduling in uncertain environments, but some bottlenecks are also encountered when we generalize this paradigm to universal complex tasks. Among them, the low efficiency of data utilization in model-free reinforcement learning algorithms is of great concern. In contrast, model-based reinforcement learning algorithms can reveal the underlying dynamics of learning environments and seldom suffer from the data utilization problem. To address the problem, a model-based reinforcement learning algorithm with an embedded attention mechanism is proposed in this paper as an extension of World Models. We learn the environment model through a Mixture Density Network Recurrent Network (MDN-RNN) for agents to interact with, and incorporate combinations of a variational auto-encoder (VAE) and attention into state value estimates during policy learning. In this way, the agent can learn optimal policies through fewer interactions with the actual environment, and final experiments demonstrate the effectiveness of our model in control problems.