Jianwei Zhao

CV
h-index: 21
Papers: 4
Citations: 225
Novelty: 55%
AI Score: 36

4 Papers

SY · Sep 23, 2017
Beam Tracking for UAV Mounted SatCom on-the-Move with Massive Antenna Array

Jianwei Zhao, Feifei Gao, Qihui Wu et al.

Unmanned aerial vehicle (UAV)-satellite communication has drawn considerable attention for its potential to build an integrated space-air-ground network and provide seamless wide-area coverage. The key challenge in UAV-satellite communication is unstable beam pointing caused by UAV navigation, a typical SatCom on-the-move scenario. In this paper, we propose a blind beam tracking approach for a Ka-band UAV-satellite communication system in which the UAV is equipped with a large-scale antenna array. The effects of UAV navigation are first mitigated through mechanical adjustment, which approximately points the beam toward the target satellite through beam stabilization and dynamic isolation. Specifically, the attitude information can be derived in real time by fusing data from low-cost sensors. Then, the precision of the beam pointing is blindly refined by electrically adjusting the weights of the massive antenna array, for which an array-structure-based simultaneous perturbation algorithm is designed. Simulation results demonstrate the superiority of the proposed method over existing ones.
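The refinement stage relies on a simultaneous perturbation algorithm. The sketch below shows generic SPSA (simultaneous perturbation stochastic approximation) gradient ascent on phase-only weights, under assumptions that are not taken from the paper: a uniform linear array with half-wavelength spacing, a noise-free received-power measurement, and illustrative gain constants. It is not the authors' array-structure-based variant, and the names `received_power` and `spsa_refine` are hypothetical. What it does illustrate is why the tracking can be blind: the optimizer never reads the satellite direction, only two power measurements per iteration.

```python
import cmath
import math
import random


def received_power(phases, theta, spacing=0.5):
    """Normalized power |w^H a(theta)|^2 / N for phase-only weights.

    Illustrative assumptions: uniform linear array, element spacing given in
    wavelengths, unit-amplitude weights w_k = exp(j * phases[k]), no noise.
    """
    total = 0j
    for k, phi in enumerate(phases):
        a_k = cmath.exp(1j * 2 * math.pi * spacing * k * math.sin(theta))  # steering vector entry
        total += cmath.exp(-1j * phi) * a_k  # conj(w_k) * a_k
    return abs(total) ** 2 / len(phases)


def spsa_refine(phases, measure, iters=300, a=0.05, c=0.1, seed=0):
    """Blind SPSA gradient ascent that only uses power readings from `measure`."""
    rng = random.Random(seed)
    phases = list(phases)
    for t in range(1, iters + 1):
        a_t = a / t ** 0.602  # commonly used SPSA gain-decay exponents
        c_t = c / t ** 0.101
        delta = [rng.choice((-1.0, 1.0)) for _ in phases]  # random +/-1 perturbation of all weights at once
        plus = [p + c_t * d for p, d in zip(phases, delta)]
        minus = [p - c_t * d for p, d in zip(phases, delta)]
        diff = measure(plus) - measure(minus)  # two measurements per iteration
        # Gradient estimate g_k = diff / (2 * c_t * delta_k); step uphill.
        phases = [p + a_t * diff / (2 * c_t * d) for p, d in zip(phases, delta)]
    return phases


n = 8
true_theta = math.radians(15.0)    # hypothetical satellite direction, unknown to the tracker
coarse_theta = math.radians(10.0)  # hypothetical residual error left after mechanical pointing
initial = [2 * math.pi * 0.5 * k * math.sin(coarse_theta) for k in range(n)]


def measure(ph):
    return received_power(ph, true_theta)


refined = spsa_refine(initial, measure)
print(f"before: {measure(initial):.2f}, after: {measure(refined):.2f}, max: {n}")
```

Perturbing every weight simultaneously keeps the cost at two measurements per iteration regardless of the array size, which is the usual reason SPSA is preferred over element-by-element perturbation for large arrays.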

CV · Jul 18, 2024
FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection

Jianwei Zhao, Xin Li, Fan Yang et al.

Detecting objects that blend seamlessly into their surroundings is a complex task for both human cognition and advanced artificial intelligence algorithms. Currently, most camouflaged object detection methods focus on discriminative models with various specialized designs. However, generative models such as Stable Diffusion have been observed to possess stronger capabilities for understanding objects in complex environments, yet their potential for perceiving and detecting camouflaged objects has not been extensively explored. In this study, we present a novel denoising diffusion model, namely FocusDiffuser, to investigate how generative models can enhance the detection and interpretation of camouflaged objects. We believe that the key to spotting camouflaged objects lies in catching subtle nuances in details. Consequently, FocusDiffuser integrates specialized enhancements, notably the Boundary-Driven LookUp (BDLU) module and the Cyclic Positioning (CP) module, into a standard diffusion model, significantly boosting its detail-oriented analytical capabilities. Our experiments demonstrate that FocusDiffuser, from a generative perspective, effectively addresses the challenge of camouflaged object detection, surpassing leading models on benchmarks such as CAMO, COD10K and NC4K.
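Framing detection as denoising diffusion means the segmentation mask plays the role of the clean sample x_0. The sketch below shows only the generic DDPM-style machinery under stated assumptions: a linear beta schedule, a toy 4x4 mask scaled to {-1, 1}, and the true noise used as an oracle in place of the image-conditioned denoiser a real model would train. The BDLU and CP modules are not modeled, and all function names are hypothetical.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule (DDPM-style)."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward noising: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_hat, t, alpha_bars):
    """Invert the forward step given a noise estimate eps_hat."""
    ab = alpha_bars[t]
    return [(x - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for x, e in zip(xt, eps_hat)]


# Toy 4x4 binary mask (flattened), mapped from {0, 1} to {-1, 1}.
mask = [0, 0, 1, 1,
        0, 1, 1, 0,
        0, 1, 1, 0,
        0, 0, 0, 0]
x0 = [2.0 * m - 1.0 for m in mask]
alpha_bars = linear_alpha_bars()
xt, eps = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
# A trained, image-conditioned denoiser would predict eps here; the true noise is an oracle stand-in.
x0_hat = predict_x0(xt, eps, 500, alpha_bars)
recovered_mask = [1 if v > 0 else 0 for v in x0_hat]
```

With an imperfect noise estimate the recovered x_0 is only approximate, which is why diffusion-based detectors iterate the reverse process over several timesteps rather than inverting once.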

CV · Mar 17, 2022
Co-visual pattern augmented generative transformer learning for automobile geo-localization

Jianwei Zhao, Qiang Zhai, Pengbo Zhao et al.

Geolocation is a fundamental component of route planning and navigation for unmanned vehicles, but GNSS-based geolocation fails under denial-of-service conditions. Cross-view geo-localization (CVGL), which aims to estimate the geographical location of a ground-level camera by matching its image against an enormous collection of geo-tagged aerial (e.g., satellite) images, has received considerable attention but remains extremely challenging due to the drastic appearance differences between aerial and ground views. Existing methods extract global representations of the different views primarily with Siamese-like architectures, but seldom take the benefits of interaction between the views into account. In this paper, we present a novel approach that combines cross-view knowledge generation with transformers, namely mutual generative transformer learning (MGTL), for CVGL. Specifically, taking the initial representations produced by the backbone network, MGTL develops two separate generative sub-modules (one generating aerial-aware knowledge from ground-view semantics, and one doing the reverse) and fully exploits their mutual benefits through the attention mechanism. Moreover, to better capture the co-visual relationships between aerial and ground views, we introduce a cascaded attention masking algorithm to further boost accuracy. Extensive experiments on challenging public benchmarks, i.e., CVACT and CVUSA, demonstrate the effectiveness of the proposed method, which sets new records compared with existing state-of-the-art models.
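Once a model has produced descriptors for both views, localization reduces to retrieval: each ground query is ranked against all aerial references, and accuracy is commonly reported as top-K recall. The sketch below assumes cosine similarity and that `ground[i]` and `aerial[i]` depict the same location; the 2-D vectors are hypothetical stand-ins for learned embeddings, not MGTL outputs.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def recall_at_k(ground, aerial, k=1):
    """Fraction of ground queries whose matching aerial image ranks in the top k.

    Assumes ground[i] and aerial[i] depict the same location; similarity is cosine.
    """
    g = [l2_normalize(v) for v in ground]
    a = [l2_normalize(v) for v in aerial]
    hits = 0
    for i, q in enumerate(g):
        sims = [sum(x * y for x, y in zip(q, ref)) for ref in a]
        # Rank = number of aerial references scoring strictly higher than the true match.
        rank = sum(1 for s in sims if s > sims[i])
        if rank < k:
            hits += 1
    return hits / len(g)


# Hypothetical 2-D descriptors standing in for learned embeddings.
ground = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aerial = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(ground, aerial, k):.3f}")
```

Here the third aerial descriptor is closer to the first ground query than its true match is, so that query only counts as a hit from K = 2 onward; this is the kind of cross-view confusion that richer view interaction aims to reduce.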

CV · Mar 16, 2025
MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification

Jianwei Zhao, Xin Li, Fan Yang et al.

Whole Slide Image (WSI) classification poses unique challenges due to the vast image size and numerous non-informative regions, which introduce noise and cause data imbalance during feature aggregation. To address these issues, we propose MExD, an Expert-Infused Diffusion Model that combines the strengths of a Mixture-of-Experts (MoE) mechanism with a diffusion model for enhanced classification. MExD balances the patch feature distribution through a novel MoE-based aggregator that selectively emphasizes relevant information, effectively filtering noise, addressing data imbalance, and extracting essential features. These features are then integrated via a diffusion-based generative process to directly yield the class distribution for the WSI. Moving beyond conventional discriminative approaches, MExD represents the first generative strategy in WSI classification, capturing fine-grained details for robust and precise results. MExD is validated on three widely used benchmarks (Camelyon16, TCGA-NSCLC, and BRACS), consistently achieving state-of-the-art performance in both binary and multi-class tasks.
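The aggregation step can be pictured with a generic top-k gated mixture of experts that routes each patch feature to its highest-scoring expert(s) and pools the results into one slide-level feature. This is a minimal sketch of standard MoE routing, not MExD's aggregator: the linear gates and linear experts, the mean pooling, and every name in the code are hypothetical. The returned per-expert load shows how routing splits patches among experts, which is where balancing would be applied.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def moe_aggregate(patches, gate_w, expert_ws, top_k=1):
    """Route each patch feature to its top_k experts, then mean-pool the outputs.

    gate_w[e] is a linear gating vector and expert_ws[e] a linear expert
    (both hypothetical stand-ins for learned layers). Returns the pooled
    slide-level feature and the number of patches each expert received.
    """
    d_out = len(expert_ws[0])
    pooled = [0.0] * d_out
    load = [0] * len(expert_ws)
    for x in patches:
        logits = [sum(g * xi for g, xi in zip(gw, x)) for gw in gate_w]
        top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
        weights = softmax([logits[e] for e in top])  # renormalize over the selected experts
        for w, e in zip(weights, top):
            load[e] += 1
            y = matvec(expert_ws[e], x)
            pooled = [p + w * yi for p, yi in zip(pooled, y)]
    return [p / len(patches) for p in pooled], load


patches = [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0]]  # toy patch features
gate_w = [[1.0, 0.0], [0.0, 1.0]]                # expert 0 favors dim 0, expert 1 favors dim 1
expert_ws = [[[1.0, 0.0], [0.0, 1.0]],           # expert 0: identity
             [[2.0, 0.0], [0.0, 2.0]]]           # expert 1: scale by 2
slide_feature, load = moe_aggregate(patches, gate_w, expert_ws)
print(slide_feature, load)
```

In the paper's pipeline, a feature like `slide_feature` would then condition the diffusion process that produces the class distribution; that generative stage is not sketched here.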