Xichao Teng

CV
h-index14
6papers
189citations
Novelty35%
AI Score45

6 Papers

CVFeb 21, 2023
Oriented object detection in optical remote sensing images using deep learning: a survey

Kun Wang, Zi Wang, Zhang Li et al.

Oriented object detection is a fundamental yet challenging task in remote sensing (RS), aiming to locate and classify objects with arbitrary orientations. Recent advancements in deep learning have significantly enhanced the capabilities of oriented object detection methods. Given the rapid development of this field, a comprehensive survey of the recent advances in oriented object detection is presented in this paper. Specifically, we begin by tracing the technical evolution from horizontal object detection to oriented object detection and highlighting the specific related challenges, including feature misalignment, spatial misalignment, oriented bounding box (OBB) regression problems, and common issues encountered in RS. Subsequently, we further categorize the existing methods into detection frameworks, OBB regression techniques, feature representation approaches, and solutions to common issues and provide an in-depth discussion of how these methods address the above challenges. In addition, we cover several publicly available datasets and evaluation protocols. Furthermore, we provide a comprehensive comparison and analysis involving the state-of-the-art methods. Toward the end of this paper, we identify several future directions for oriented object detection research.

CVMar 12, 2025Code
Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark

Yibin Ye, Xichao Teng, Shuo Chen et al.

Absolute Visual Localization (AVL) enables Unmanned Aerial Vehicle (UAV) to determine its position in GNSS-denied environments by establishing geometric relationships between UAV images and geo-tagged reference maps. While many previous works have achieved AVL with image retrieval and matching techniques, research in low-altitude multi-view scenarios still remains limited. Low-altitude Multi-view condition presents greater challenges due to extreme viewpoint changes. To explore the best UAV AVL approach in such condition, we proposed this benchmark. Firstly, a large-scale Low-altitude Multi-view dataset called AnyVisLoc was constructed. This dataset includes 18,000 images captured at multiple scenes and altitudes, along with 2.5D reference maps containing aerial photogrammetry maps and historical satellite maps. Secondly, a unified framework was proposed to integrate the state-of-the-art AVL approaches and comprehensively test their performance. The best combined method was chosen as the baseline and the key factors that influencing localization accuracy are thoroughly analyzed based on it. This baseline achieved a 74.1% localization accuracy within 5m under Low-altitude, Multi-view conditions. In addition, a novel retrieval metric called PDM@K was introduced to better align with the characteristics of the UAV AVL task. Overall, this benchmark revealed the challenges of Low-altitude, Multi-view UAV AVL and provided valuable guidance for future research. The dataset and codes are available at https://github.com/UAV-AVL/Benchmark

62.6CVApr 7Code
CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration

Xuecong Liu, Mengzhu Ding, Zixuan Sun et al.

We present Consistent-Recurrent Feature Flow Transformer (CRFT), a unified coarse-to-fine framework based on feature flow learning for robust cross-modal image registration. CRFT learns a modality-independent feature flow representation within a transformer-based architecture that jointly performs feature alignment and flow estimation. The coarse stage establishes global correspondences through multi-scale feature correlation, while the fine stage refines local details via hierarchical feature fusion and adaptive spatial reasoning. To enhance geometric adaptability, an iterative discrepancy-guided attention mechanism with a Spatial Geometric Transform (SGT) recurrently refines the flow field, progressively capturing subtle spatial inconsistencies and enforcing feature-level consistency. This design enables accurate alignment under large affine and scale variations while maintaining structural coherence across modalities. Extensive experiments on diverse cross-modal datasets demonstrate that CRFT consistently outperforms state-of-the-art registration methods in both accuracy and robustness. Beyond registration, CRFT provides a generalizable paradigm for multimodal spatial correspondence, offering broad applicability to remote sensing, autonomous navigation, and medical imaging. Code and datasets are publicly available at https://github.com/NEU-Liuxuecong/CRFT.

CVApr 1, 2024Code
3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching

Yibin Ye, Xichao Teng, Shuo Chen et al.

Optical-SAR image matching is a fundamental task for image fusion and visual navigation. However, all large-scale open SAR dataset for methods development are collected from single platform, resulting in limited satellite types and spatial resolutions. Since images captured by different sensors vary significantly in both geometric and radiometric appearance, existing methods may fail to match corresponding regions containing the same content. Besides, most of existing datasets have not been categorized based on the characteristics of different scenes. To encourage the design of more general multi-modal image matching methods, we introduce a large-scale Multi-sources,Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching(3MOS). It consists of 155K optical-SAR image pairs, including SAR data from six commercial satellites, with resolutions ranging from 1.25m to 12.5m. The data has been classified into eight scenes including urban, rural, plains, hills, mountains, water, desert, and frozen earth. Extensively experiments show that none of state-of-the-art methods achieve consistently superior performance across different sources, resolutions and scenes. In addition, the distribution of data has a substantial impact on the matching capability of deep learning models, this proposes the domain adaptation challenge in optical-SAR image matching. Our data and code will be available at:https://github.com/3M-OS/3MOS.

CVMar 8
Scale-Aware UAV-to-Satellite Cross-View Geo-Localization: A Semantic Geometric Approach

Yibin Ye, Shuo Chen, Kun Wang et al.

Cross-View Geo-Localization (CVGL) between UAV imagery and satellite images plays a crucial role in target localization and UAV self-positioning. However, most existing methods rely on the idealized assumption of scale consistency between UAV queries and satellite galleries, overlooking the severe scale ambiguity commonly encountered in real-world scenarios. This discrepancy leads to field-of-view misalignment and feature mismatch, significantly degrading CVGL robustness. To address this issue, we propose a geometric framework that recovers the absolute metric scale from monocular UAV images using semantic anchors. Specifically, small vehicles (SVs), characterized by relatively stable prior size distributions and high detectability, are exploited as metric references. A Decoupled Stereoscopic Projection Model is introduced to estimate the absolute image scale from these semantic targets. By decomposing vehicle dimensions into radial and tangential components, the model compensates for perspective distortions in 2D detections of 3D vehicles, enabling more accurate scale estimation. To further reduce intra-class size variation and detection noise, a dual-dimension fusion strategy with Interquartile Range (IQR)-based robust aggregation is employed. The estimated global scale is then used as a physical constraint for scale-adaptive satellite image cropping, improving UAV-to-satellite feature alignment. Experiments on augmented DenseUAV and UAV-VisLoc datasets demonstrate that the proposed method significantly improves CVGL robustness under unknown UAV image scales. Additionally, the framework shows strong potential for downstream applications such as passive UAV altitude estimation and 3D model scale recovery.

IVAug 21, 2020
Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

Zhang Li, Jiehua Zhang, Tao Tan et al.

Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CADs) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and DICE coefficient (DC). The DC ranged from 0.7354$\pm$0.1149 to 0.8372$\pm$0.0858. The DC of the best method was close to the inter-observer agreement (0.8398$\pm$0.0890). All methods were based on deep learning and categorized into two groups: multi-model method and single model method. In general, multi-model methods were significantly better ($\textit{p}$<$0.01$) than single model methods, with mean DC of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.