h-index19
35papers
498citations
Novelty48%
AI Score55

35 Papers

CVAug 6, 2024Code
Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera

Zibin Liu, Banglei Guan, Yang Shang et al.

Pose estimation and tracking of objects is a fundamental application in 3D vision. Event cameras possess remarkable attributes such as high dynamic range, low latency, and resilience against motion blur, which enables them to address challenging high dynamic range scenes or high-speed motion. These features make event cameras an ideal complement over standard cameras for object pose estimation. In this work, we propose a line-based robust pose estimation and tracking method for planar or non-planar objects using an event camera. Firstly, we extract object lines directly from events, then provide an initial pose using a globally-optimal Branch-and-Bound approach, where 2D-3D line correspondences are not known in advance. Subsequently, we utilize event-line matching to establish correspondences between 2D events and 3D models. Furthermore, object poses are refined and continuously tracked by minimizing event-line distances. Events are assigned different weights based on these distances, employing robust estimation algorithms. To evaluate the precision of the proposed methods in object pose estimation and tracking, we have devised and established an event-based moving object dataset. Compared against state-of-the-art methods, the robustness and accuracy of our methods have been validated both on synthetic experiments and the proposed dataset. The source code is available at https://github.com/Zibin6/LOPET.

CVDec 27, 2025Code
LECalib: Line-Based Event Camera Calibration

Zibin Liu, Banglei Guan, Yang Shang et al.

Camera calibration is an essential prerequisite for event-based vision applications. Current event camera calibration methods typically involve using flashing patterns, reconstructing intensity images, and utilizing the features extracted from events. Existing methods are generally time-consuming and require manually placed calibration objects, which cannot meet the needs of rapidly changing scenarios. In this paper, we propose a line-based event camera calibration framework exploiting the geometric lines of commonly-encountered objects in man-made environments, e.g., doors, windows, boxes, etc. Different from previous methods, our method detects lines directly from event streams and leverages an event-line calibration model to generate the initial guess of camera parameters, which is suitable for both planar and non-planar lines. Then, a non-linear optimization is adopted to refine camera parameters. Both simulation and real-world experiments have demonstrated the feasibility and accuracy of our method, with validation performed on monocular and stereo event cameras. The source code is released at https://github.com/Zibin6/line_based_event_camera_calib.

CVDec 23, 2022
Bridging the Domain Gap in Satellite Pose Estimation: a Self-Training Approach based on Geometrical Constraints

Zi Wang, Minglin Chen, Yulan Guo et al.

Recently, unsupervised domain adaptation in satellite pose estimation has gained increasing attention, aiming at alleviating the annotation cost for training deep models. To this end, we propose a self-training framework based on the domain-agnostic geometrical constraints. Specifically, we train a neural network to predict the 2D keypoints of a satellite and then use PnP to estimate the pose. The poses of target samples are regarded as latent variables to formulate the task as a minimization problem. Furthermore, we leverage fine-grained segmentation to tackle the information loss issue caused by abstracting the satellite as sparse keypoints. Finally, we iteratively solve the minimization problem in two steps: pseudo-label generation and network training. Experimental results show that our method adapts well to the target domain. Moreover, our method won the 1st place on the sunlamp task of the second international Satellite Pose Estimation Competition.

CVJul 1, 2024
Small Aerial Target Detection for Airborne Infrared Detection Systems using LightGBM and Trajectory Constraints

Xiaoliang Sun, Liangchao Guo, Wenlong Zhang et al.

Factors, such as rapid relative motion, clutter background, etc., make robust small aerial target detection for airborne infrared detection systems a challenge. Existing methods are facing difficulties when dealing with such cases. We consider that a continuous and smooth trajectory is critical in boosting small infrared aerial target detection performance. A simple and effective small aerial target detection method for airborne infrared detection system using light gradient boosting model (LightGBM) and trajectory constraints is proposed in this article. First, we simply formulate target candidate detection as a binary classification problem. Target candidates in every individual frame are detected via interesting pixel detection and a trained LightGBM model. Then, the local smoothness and global continuous characteristic of the target trajectory are modeled as short-strict and long-loose constraints. The trajectory constraints are used efficiently for detecting the true small infrared aerial targets from numerous target candidates. Experiments on public datasets demonstrate that the proposed method performs better than other existing methods. Furthermore, a public dataset for small aerial target detection in airborne infrared detection systems is constructed. To the best of our knowledge, this dataset has the largest data scale and richest scene types within this field.

CVFeb 26Code
GSTurb: Gaussian Splatting for Atmospheric Turbulence Mitigation

Hanliang Du, Zhangji Lu, Zewei Cai et al.

Atmospheric turbulence causes significant image degradation due to pixel displacement (tilt) and blur, particularly in long-range imaging applications. In this paper, we propose a novel framework for atmospheric turbulence mitigation, GSTurb, which integrates optical flow-guided tilt correction and Gaussian splatting for modeling non-isoplanatic blur. The framework employs Gaussian parameters to represent tilt and blur, and optimizes them across multiple frames to enhance restoration. Experimental results on the ATSyn-static dataset demonstrate the effectiveness of our method, achieving a peak PSNR of 27.67 dB and SSIM of 0.8735. Compared to the state-of-the-art method, GSTurb improves PSNR by 1.3 dB (a 4.5% increase) and SSIM by 0.048 (a 5.8% increase). Additionally, on real datasets, including the TSRWGAN Real-World and CLEAR datasets, GSTurb outperforms existing methods, showing significant improvements in both qualitative and quantitative performance. These results highlight that combining optical flow-guided tilt correction with Gaussian splatting effectively enhances image restoration under both synthetic and real-world turbulence conditions. The code for this method will be available at https://github.com/DuhlLiamz/3DGS_turbulence/tree/main.

CVFeb 21, 2023
Oriented object detection in optical remote sensing images using deep learning: a survey

Kun Wang, Zi Wang, Zhang Li et al.

Oriented object detection is a fundamental yet challenging task in remote sensing (RS), aiming to locate and classify objects with arbitrary orientations. Recent advancements in deep learning have significantly enhanced the capabilities of oriented object detection methods. Given the rapid development of this field, a comprehensive survey of the recent advances in oriented object detection is presented in this paper. Specifically, we begin by tracing the technical evolution from horizontal object detection to oriented object detection and highlighting the specific related challenges, including feature misalignment, spatial misalignment, oriented bounding box (OBB) regression problems, and common issues encountered in RS. Subsequently, we further categorize the existing methods into detection frameworks, OBB regression techniques, feature representation approaches, and solutions to common issues and provide an in-depth discussion of how these methods address the above challenges. In addition, we cover several publicly available datasets and evaluation protocols. Furthermore, we provide a comprehensive comparison and analysis involving the state-of-the-art methods. Toward the end of this paper, we identify several future directions for oriented object detection research.

CVDec 18, 2025Code
Flexible Camera Calibration using a Collimator System

Shunkun Liang, Banglei Guan, Zhenbao Yu et al.

Camera calibration is a crucial step in photogrammetry and 3D vision applications. This paper introduces a novel camera calibration method using a designed collimator system. Our collimator system provides a reliable and controllable calibration environment for the camera. Exploiting the unique optical geometry property of our collimator system, we introduce an angle invariance constraint and further prove that the relative motion between the calibration target and camera conforms to a spherical motion model. This constraint reduces the original 6DOF relative motion between target and camera to a 3DOF pure rotation motion. Using spherical motion constraint, a closed-form linear solver for multiple images and a minimal solver for two images are proposed for camera calibration. Furthermore, we propose a single collimator image calibration algorithm based on the angle invariance constraint. This algorithm eliminates the requirement for camera motion, providing a novel solution for flexible and fast calibration. The performance of our method is evaluated in both synthetic and real-world experiments, which verify the feasibility of calibration using the collimator system and demonstrate that our method is superior to existing baseline methods. Demo code is available at https://github.com/LiangSK98/CollimatorCalibration

CVDec 24, 2025
Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera

Zibin Liu, Banglei Guan, Yang Shang et al.

Object pose tracking is one of the pivotal technologies in multimedia, attracting ever-growing attention in recent years. Existing methods employing traditional cameras encounter numerous challenges such as motion blur, sensor noise, partial occlusion, and changing lighting conditions. The emerging bio-inspired sensors, particularly event cameras, possess advantages such as high dynamic range and low latency, which hold the potential to address the aforementioned challenges. In this work, we present an optical flow-guided 6DoF object pose tracking method with an event camera. A 2D-3D hybrid feature extraction strategy is firstly utilized to detect corners and edges from events and object models, which characterizes object motion precisely. Then, we search for the optical flow of corners by maximizing the event-associated probability within a spatio-temporal window, and establish the correlation between corners and edges guided by optical flow. Furthermore, by minimizing the distances between corners and edges, the 6DoF object pose is iteratively optimized to achieve continuous pose tracking. Experimental results of both simulated and real events demonstrate that our methods outperform event-based state-of-the-art methods in terms of both accuracy and robustness.

CVDec 18, 2025
Collimator-assisted high-precision calibration method for event cameras

Zibin Liu, Shunkun Liang, Banglei Guan et al.

Event cameras are a new type of brain-inspired visual sensor with advantages such as high dynamic range and high temporal resolution. The geometric calibration of event cameras, which involves determining their intrinsic and extrinsic parameters, particularly in long-range measurement scenarios, remains a significant challenge. To address the dual requirements of long-distance and high-precision measurement, we propose an event camera calibration method utilizing a collimator with flickering star-based patterns. The proposed method first linearly solves camera parameters using the sphere motion model of the collimator, followed by nonlinear optimization to refine these parameters with high precision. Through comprehensive real-world experiments across varying conditions, we demonstrate that the proposed method consistently outperforms existing event camera calibration methods in terms of accuracy and reliability.

CVDec 19, 2025
Globally Optimal Solution to the Generalized Relative Pose Estimation Problem using Affine Correspondences

Zhenbao Yu, Banglei Guan, Shunkun Liang et al.

Mobile devices equipped with a multi-camera system and an inertial measurement unit (IMU) are widely used nowadays, such as self-driving cars. The task of relative pose estimation using visual and inertial information has important applications in various fields. To improve the accuracy of relative pose estimation of multi-camera systems, we propose a globally optimal solver using affine correspondences to estimate the generalized relative pose with a known vertical direction. First, a cost function about the relative rotation angle is established after decoupling the rotation matrix and translation vector, which minimizes the algebraic error of geometric constraints from affine correspondences. Then, the global optimization problem is converted into two polynomials with two unknowns based on the characteristic equation and its first derivative is zero. Finally, the relative rotation angle can be solved using the polynomial eigenvalue solver, and the translation vector can be obtained from the eigenvector. Besides, a new linear solution is proposed when the relative rotation is small. The proposed solver is evaluated on synthetic data and real-world datasets. The experiment results demonstrate that our method outperforms comparable state-of-the-art methods in accuracy.

CVDec 18, 2025
Ridge Estimation-Based Vision and Laser Ranging Fusion Localization Method for UAVs

Huayu Huang, Chen Chen, Banglei Guan et al.

Tracking and measuring targets using a variety of sensors mounted on UAVs is an effective means to quickly and accurately locate the target. This paper proposes a fusion localization method based on ridge estimation, combining the advantages of rich scene information from sequential imagery with the high precision of laser ranging to enhance localization accuracy. Under limited conditions such as long distances, small intersection angles, and large inclination angles, the column vectors of the design matrix have serious multicollinearity when using the least squares estimation algorithm. The multicollinearity will lead to ill-conditioned problems, resulting in significant instability and low robustness. Ridge estimation is introduced to mitigate the serious multicollinearity under the condition of limited observation. Experimental results demonstrate that our method achieves higher localization accuracy compared to ground localization algorithms based on single information. Moreover, the introduction of ridge estimation effectively enhances the robustness, particularly under limited observation conditions.

CVMar 17, 2025Code
Stereo Event-based, 6-DOF Pose Tracking for Uncooperative Spacecraft

Zibin Liu, Banglei Guan, Yang Shang et al.

Pose tracking of uncooperative spacecraft is an essential technology for space exploration and on-orbit servicing, which remains an open problem. Event cameras possess numerous advantages, such as high dynamic range, high temporal resolution, and low power consumption. These attributes hold the promise of overcoming challenges encountered by conventional cameras, including motion blur and extreme illumination, among others. To address the standard on-orbit observation missions, we propose a line-based pose tracking method for uncooperative spacecraft utilizing a stereo event camera. To begin with, we estimate the wireframe model of uncooperative spacecraft, leveraging the spatio-temporal consistency of stereo event streams for line-based reconstruction. Then, we develop an effective strategy to establish correspondences between events and projected lines of uncooperative spacecraft. Using these correspondences, we formulate the pose tracking as a continuous optimization process over 6-DOF motion parameters, achieved by minimizing event-line distances. Moreover, we construct a stereo event-based uncooperative spacecraft motion dataset, encompassing both simulated and real events. The proposed method is quantitatively evaluated through experiments conducted on our self-collected dataset, demonstrating an improvement in terms of effectiveness and accuracy over competing methods. The code will be open-sourced at https://github.com/Zibin6/SE6PT.

CVDec 9, 2025
Simultaneous Enhancement and Noise Suppression under Complex Illumination Conditions

Jing Tao, You Li, Banglei Guan et al.

Under challenging light conditions, captured images often suffer from various degradations, leading to a decline in the performance of vision-based applications. Although numerous methods have been proposed to enhance image quality, they either significantly amplify inherent noise or are only effective under specific illumination conditions. To address these issues, we propose a novel framework for simultaneous enhancement and noise suppression under complex illumination conditions. Firstly, a gradient-domain weighted guided filter (GDWGIF) is employed to accurately estimate illumination and improve image quality. Next, the Retinex model is applied to decompose the captured image into separate illumination and reflection layers. These layers undergo parallel processing, with the illumination layer being corrected to optimize lighting conditions and the reflection layer enhanced to improve image quality. Finally, the dynamic range of the image is optimized through multi-exposure fusion and a linear stretching strategy. The proposed method is evaluated on real-world datasets obtained from practical applications. Experimental results demonstrate that our proposed method achieves better performance compared to state-of-the-art methods in both contrast enhancement and noise suppression.

CVFeb 12
A DMD-Based Adaptive Modulation Method for High Dynamic Range Imaging in High-Glare Environments

Banglei Guan, Jing Tao, Liang Xu et al.

Background The accuracy of photomechanics measurements critically relies on image quality,particularly under extreme illumination conditions such as welding arc monitoring and polished metallic surface analysis. High dynamic range (HDR) imaging above 120 dB is essential in these contexts. Conventional CCD/CMOS sensors, with dynamic ranges typically below 70 dB, are highly susceptible to saturation under glare, resulting in irreversible loss of detail and significant errors in digital image correlation (DIC). Methods This paper presents an HDR imaging system that leverages the spatial modulation capability of a digital micromirror device (DMD). The system architecture enables autonomous regional segmentation and adaptive exposure control for high-dynamic-range scenes through an integrated framework comprising two synergistic subsystems: a DMD-based optical modulation unit and an adaptive computational imaging pipeline. Results The system achieves a measurable dynamic range of 127 dB, effectively eliminating satu ration artifacts under high glare. Experimental results demonstrate a 78% reduction in strain error and improved DIC positioning accuracy, confirming reliable performance across extreme intensity variations. Conclusion The DMD-based system provides high fidelity adaptive HDR imaging, overcoming key limitations of conventional sensors. It exhibits strong potential for optical metrology and stress analysis in high-glare environments where traditional methods are inadequate.

CVMar 26
Synergistic Event-SVE Imaging for Quantitative Propellant Combustion Diagnostics

Jing Tao, Taihang Lei, Banglei Guan et al.

Real-time monitoring of high-energy propellant combustion is difficult. Extreme high dynamic range (HDR), microsecond-scale particle motion, and heavy smoke often occur together. These conditions drive saturation, motion blur, and unstable particle extraction in conventional imaging. We present a closed-loop Event--SVE measurement system that couples a spatially variant exposure (SVE) camera with a stereo pair of neuromorphic event cameras. The SVE branch produces HDR maps with an explicit smoke-aware fusion strategy. A multi-cue smoke-likelihood map is used to separate particle emission from smoke scattering, yielding calibrated intensity maps for downstream analysis. The resulting HDR maps also provide the absolute-intensity reference missing in event cameras. This reference is used to suppress smoke-driven event artifacts and to improve particle-state discrimination. Based on the cleaned event observations, a stereo event-based 3D pipeline estimates separation height and equivalent particle size through feature extraction and triangulation (maximum calibration error 0.56%). Experiments on boron-based propellants show multimodal equivalent-radius statistics. The system also captures fast separation transients that are difficult to observe with conventional sensors. Overall, the proposed framework provides a practical, calibration-consistent route to microsecond-resolved 3D combustion measurement under smoke-obscured HDR conditions.

CVDec 27, 2025
Event-based high temporal resolution measurement of shock wave motion field

Taihang Lei, Banglei Guan, Minzu Liang et al.

Accurate measurement of shock wave motion parameters with high spatiotemporal resolution is essential for applications such as power field testing and damage assessment. However, significant challenges are posed by the fast, uneven propagation of shock waves and unstable testing conditions. To address these challenges, a novel framework is proposed that utilizes multiple event cameras to estimate the asymmetry of shock waves, leveraging its high-speed and high-dynamic range capabilities. Initially, a polar coordinate system is established, which encodes events to reveal shock wave propagation patterns, with adaptive region-of-interest (ROI) extraction through event offset calculations. Subsequently, shock wave front events are extracted using iterative slope analysis, exploiting the continuity of velocity changes. Finally, the geometric model of events and shock wave motion parameters is derived according to event-based optical imaging model, along with the 3D reconstruction model. Through the above process, multi-angle shock wave measurement, motion field reconstruction, and explosive equivalence inversion are achieved. The results of the speed measurement are compared with those of the pressure sensors and the empirical formula, revealing a maximum error of 5.20% and a minimum error of 0.06%. The experimental results demonstrate that our method achieves high-precision measurement of the shock wave motion field with both high spatial and temporal resolution, representing significant progress.

CVJan 13
A Hardware-Algorithm Co-Designed Framework for HDR Imaging and Dehazing in Extreme Rocket Launch Environments

Jing Tao, Banglei Guan, Pengju Sun et al.

Quantitative optical measurement of critical mechanical parameters -- such as plume flow fields, shock wave structures, and nozzle oscillations -- during rocket launch faces severe challenges due to extreme imaging conditions. Intense combustion creates dense particulate haze and luminance variations exceeding 120 dB, degrading image data and undermining subsequent photogrammetric and velocimetric analyses. To address these issues, we propose a hardware-algorithm co-design framework that combines a custom Spatially Varying Exposure (SVE) sensor with a physics-aware dehazing algorithm. The SVE sensor acquires multi-exposure data in a single shot, enabling robust haze assessment without relying on idealized atmospheric models. Our approach dynamically estimates haze density, performs region-adaptive illumination optimization, and applies multi-scale entropy-constrained fusion to effectively separate haze from scene radiance. Validated on real launch imagery and controlled experiments, the framework demonstrates superior performance in recovering physically accurate visual information of the plume and engine region. This offers a reliable image basis for extracting key mechanical parameters, including particle velocity, flow instability frequency, and structural vibration, thereby supporting precise quantitative analysis in extreme aerospace environments.

ROJan 13
Robust Subpixel Localization of Diagonal Markers in Large-Scale Navigation via Multi-Layer Screening and Adaptive Matching

Jing Tao, Banglei Guan, Yang Shang et al.

This paper proposes a robust, high-precision positioning methodology to address localization failures arising from complex background interference in large-scale flight navigation and the computational inefficiency inherent in conventional sliding window matching techniques. The proposed methodology employs a three-tiered framework incorporating multi-layer corner screening and adaptive template matching. Firstly, dimensionality is reduced through illumination equalization and structural information extraction. A coarse-to-fine candidate selection strategy minimizes sliding window computational costs, enabling rapid estimation of the marker's position. Finally, adaptive templates are generated for candidate points, achieving subpixel precision through improved template matching with correlation coefficient extremum fitting. Experimental results demonstrate the method's effectiveness in extracting and localizing diagonal markers in complex, large-scale environments, making it ideal for field-of-view measurement in navigation tasks.

CVApr 7, 2025Code
Learning Affine Correspondences by Integrating Geometric Constraints

Pengju Sun, Banglei Guan, Zhenbao Yu et al.

Affine correspondences have received significant attention due to their benefits in tasks like image matching and pose estimation. Existing methods for extracting affine correspondences still have many limitations in terms of performance; thus, exploring a new paradigm is crucial. In this paper, we present a new pipeline designed for extracting accurate affine correspondences by integrating dense matching and geometric constraints. Specifically, a novel extraction framework is introduced, with the aid of dense matching and a novel keypoint scale and orientation estimator. For this purpose, we propose loss functions based on geometric constraints, which can effectively improve accuracy by supervising neural networks to learn feature geometry. The experimental show that the accuracy and robustness of our method outperform the existing ones in image matching tasks. To further demonstrate the effectiveness of the proposed method, we applied it to relative pose estimation. Affine correspondences extracted by our method lead to more accurate poses than the baselines on a range of real-world datasets. The code is available at https://github.com/stilcrad/DenseAffine.

CVMar 30
Event-Based Method for High-Speed 3D Deformation Measurement under Extreme Illumination Conditions

Banglei Guan, Yifei Bian, Zibin Liu et al.

Background: Large engineering structures, such as space launch towers and suspension bridges, are subjected to extreme forces that cause high-speed 3D deformation and compromise safety. These structures typically operate under extreme illumination conditions. Traditional cameras often struggle to handle strong light intensity, leading to overexposure due to their limited dynamic range. Objective: Event cameras have emerged as a compelling alternative to traditional cameras in high dynamic range and low-latency applications. This paper presents an integrated method, from calibration to measurement, using a multi-event camera array for high-speed 3D deformation monitoring of structures in extreme illumination conditions. Methods: Firstly, the proposed method combines the characteristics of the asynchronous event stream and temporal correlation analysis to extract the corresponding marker center point. Subsequently, the method achieves rapid calibration by solving the Kruppa equations in conjunction with a parameter optimization framework. Finally, by employing a unified coordinate transformation and linear intersection, the method enables the measurement of 3D deformation of the target structure. Results: Experiments confirmed that the relative measurement error is below 0.08%. Field experiments under extreme illumination conditions, including self-calibration of a multi-event camera array and 3D deformation measurement, verified the performance of the proposed method. Conclusions: This paper addressed the critical limitation of traditional cameras in measuring high-speed 3D deformations under extreme illumination conditions. The experimental results demonstrate that, compared to other methods, the proposed method can accurately measure 3D deformations of structures under harsh lighting conditions, and the relative error of the measured deformation is less than 0.1%.

CVMar 12, 2025Code
Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark

Yibin Ye, Xichao Teng, Shuo Chen et al.

Absolute Visual Localization (AVL) enables Unmanned Aerial Vehicle (UAV) to determine its position in GNSS-denied environments by establishing geometric relationships between UAV images and geo-tagged reference maps. While many previous works have achieved AVL with image retrieval and matching techniques, research in low-altitude multi-view scenarios still remains limited. Low-altitude Multi-view condition presents greater challenges due to extreme viewpoint changes. To explore the best UAV AVL approach in such condition, we proposed this benchmark. Firstly, a large-scale Low-altitude Multi-view dataset called AnyVisLoc was constructed. This dataset includes 18,000 images captured at multiple scenes and altitudes, along with 2.5D reference maps containing aerial photogrammetry maps and historical satellite maps. Secondly, a unified framework was proposed to integrate the state-of-the-art AVL approaches and comprehensively test their performance. The best combined method was chosen as the baseline and the key factors that influencing localization accuracy are thoroughly analyzed based on it. This baseline achieved a 74.1% localization accuracy within 5m under Low-altitude, Multi-view conditions. In addition, a novel retrieval metric called PDM@K was introduced to better align with the characteristics of the UAV AVL task. Overall, this benchmark revealed the challenges of Low-altitude, Multi-view UAV AVL and provided valuable guidance for future research. The dataset and codes are available at https://github.com/UAV-AVL/Benchmark

CVApr 26
A Pose-only Geometric Constraint for Multi-Camera Pose Adjustment

Shunkun Liang, Banglei Guan, Bin Li et al.

Multi-camera systems offer rich observation capabilities for visual navigation and 3D scene reconstruction; however, the resulting feature redundancy often compromises computational efficiency. This challenge is particularly pronounced during bundle adjustment, where the non-linear optimization of both system poses and scene points incurs substantial computational overhead. To address this challenge, this paper introduces a pose-only geometric constraint for multi-camera systems and proposes a corresponding pose adjustment algorithm. Specifically, we use generalized camera model to establish a unified representation of the multi-camera system. Building upon this model, we formulate the multi-camera pose-only constraint, which implicitly represents a 3D scene point using two base observations and their associated poses, thereby achieving a pose-only representation of the projection geometry. Subsequently, we introduce a multi-camera pose adjustment algorithm that eliminates 3D points from the parameter space, thereby achieving efficient and focused pose optimization. Experimental results on both synthetic and real-world datasets demonstrate that the proposed algorithm outperforms baseline bundle adjustment methods in computational efficiency, while maintaining or even improving pose estimation accuracy.

CVMay 31, 2025
Event-based multi-view photogrammetry for high-dynamic, high-velocity target measurement

Taihang Lei, Banglei Guan, Minzu Liang et al.

The characterization of mechanical properties for high-dynamic, high-velocity target motion is essential in industries. It provides crucial data for validating weapon systems and precision manufacturing processes etc. However, existing measurement methods face challenges such as limited dynamic range, discontinuous observations, and high costs. This paper presents a new approach leveraging an event-based multi-view photogrammetric system, which aims to address the aforementioned challenges. First, the monotonicity in the spatiotemporal distribution of events is leveraged to extract the target's leading-edge features, eliminating the tailing effect that complicates motion measurements. Then, reprojection error is used to associate events with the target's trajectory, providing more data than traditional intersection methods. Finally, a target velocity decay model is employed to fit the data, enabling accurate motion measurements via ours multi-view data joint computation. In a light gas gun fragment test, the proposed method showed a measurement deviation of 4.47% compared to the electromagnetic speedometer.

CVFeb 25, 2025
High-precision visual navigation device calibration method based on collimator

Shunkun Liang, Dongcai Tan, Banglei Guan et al.

Visual navigation devices require precise calibration to achieve high-precision localization and navigation, which includes camera and attitude calibration. To address the limitations of time-consuming camera calibration and complex attitude adjustment processes, this study presents a collimator-based calibration method and system. Based on the optical characteristics of the collimator, a single-image camera calibration algorithm is introduced. In addition, integrated with the precision adjustment mechanism of the calibration frame, a rotation transfer model between coordinate systems enables efficient attitude calibration. Experimental results demonstrate that the proposed method achieves accuracy and stability comparable to traditional multi-image calibration techniques. Specifically, the re-projection errors are less than 0.1463 pixels, and average attitude angle errors are less than 0.0586 degrees with a standard deviation less than 0.0257 degrees, demonstrating high precision and robustness.

CVJan 19
Fusion-Restoration Image Processing Algorithm to Improve the High-Temperature Deformation Measurement

Banglei Guan, Dongcai Tan, Jing Tao et al.

In the deformation measurement of high-temperature structures, image degradation caused by thermal radiation and random errors introduced by heat haze restrict the accuracy and effectiveness of deformation measurement. To suppress thermal radiation and heat haze using fusion-restoration image processing methods, thereby improving the accuracy and effectiveness of DIC in the measurement of high-temperature deformation. For image degradation caused by thermal radiation, based on the image layered representation, the image is decomposed into positive and negative channels for parallel processing, and then optimized for quality by multi-exposure image fusion. To counteract the high-frequency, random errors introduced by heat haze, we adopt the FSIM as the objective function to guide the iterative optimization of model parameters, and the grayscale average algorithm is applied to equalize anomalous gray values, thereby reducing measurement error. The proposed multi-exposure image fusion algorithm effectively suppresses image degradation caused by complex illumination conditions, boosting the effective computation area from 26% to 50% for under-exposed images and from 32% to 40% for over-exposed images without degrading measurement accuracy in the experiment. Meanwhile, the image restoration combined with the grayscale average algorithm reduces static thermal deformation measurement errors. The error in ε_xx is reduced by 85.3%, while the errors in ε_yy and γ_xy are reduced by 36.0% and 36.4%, respectively. We present image processing methods to suppress the interference of thermal radiation and heat haze in high-temperature deformation measurement using DIC. The experimental results verify that the proposed method can effectively improve image quality, reduce deformation measurement errors, and has potential application value in thermal deformation measurement.

IVJun 28, 2025
Prompt Mechanisms in Medical Imaging: A Comprehensive Survey

Hao Yang, Xinlong Liang, Zhang Li et al.

Deep learning offers transformative potential in medical imaging, yet its clinical adoption is frequently hampered by challenges such as data scarcity, distribution shifts, and the need for robust task generalization. Prompt-based methodologies have emerged as a pivotal strategy to guide deep learning models, providing flexible, domain-specific adaptations that significantly enhance model performance and adaptability without extensive retraining. This systematic review critically examines the burgeoning landscape of prompt engineering in medical imaging. We dissect diverse prompt modalities, including textual instructions, visual prompts, and learnable embeddings, and analyze their integration for core tasks such as image generation, segmentation, and classification. Our synthesis reveals how these mechanisms improve task-specific outcomes by enhancing accuracy, robustness, and data efficiency and reducing reliance on manual feature engineering while fostering greater model interpretability by making the model's guidance explicit. Despite substantial advancements, we identify persistent challenges, particularly in prompt design optimization, data heterogeneity, and ensuring scalability for clinical deployment. Finally, this review outlines promising future trajectories, including advanced multimodal prompting and robust clinical integration, underscoring the critical role of prompt-driven AI in accelerating the revolution of diagnostics and personalized treatment planning in medicine.

CVFeb 27, 2025
Accurate Pose Estimation for Flight Platforms based on Divergent Multi-Aperture Imaging System

Shunkun Liang, Bin Li, Banglei Guan et al.

Vision-based pose estimation plays a crucial role in the autonomous navigation of flight platforms. However, the field of view and spatial resolution of the camera limit pose estimation accuracy. This paper designs a divergent multi-aperture imaging system (DMAIS), equivalent to a single imaging system to achieve simultaneous observation of a large field of view and high spatial resolution. The DMAIS overcomes traditional observation limitations, allowing accurate pose estimation for the flight platform. {Before conducting pose estimation, the DMAIS must be calibrated. To this end we propose a calibration method for DMAIS based on the 3D calibration field.} The calibration process determines the imaging parameters of the DMAIS, which allows us to model DMAIS as a generalized camera. Subsequently, a new algorithm for accurately determining the pose of flight platform is introduced. We transform the absolute pose estimation problem into a nonlinear minimization problem. New optimality conditions are established for solving this problem based on Lagrange multipliers. Finally, real calibration experiments show the effectiveness and accuracy of the proposed method. Results from real flight experiments validate the system's ability to achieve centimeter-level positioning accuracy and arc-minute-level orientation accuracy.

CVFeb 27, 2025
3D Trajectory Reconstruction of Moving Points Based on a Monocular Camera

Huayu Huang, Banglei Guan, Yang Shang et al.

The motion measurement of point targets constitutes a fundamental problem in photogrammetry, with extensive applications across various engineering domains. Reconstructing a point's 3D motion just from the images captured by only a monocular camera is unfeasible without prior assumptions. Under limited observation conditions such as insufficient observations, long distance, and high observation error of platform, the least squares estimation faces the issue of ill-conditioning. This paper presents an algorithm for reconstructing 3D trajectories of moving points using a monocular camera. The motion of the points is represented through temporal polynomials. Ridge estimation is introduced to mitigate the issues of ill-conditioning caused by limited observation conditions. Then, an automatic algorithm for determining the order of the temporal polynomials is proposed. Furthermore, the definition of reconstructability for temporal polynomials is proposed to describe the reconstruction accuracy quantitatively. The simulated and real-world experimental results demonstrate the feasibility, accuracy, and efficiency of the proposed method.

CVDec 17, 2025
Asynchronous Event Stream Noise Filtering for High-frequency Structure Deformation Measurement

Yifei Bian, Banglei Guan, Zibin Liu et al.

Large-scale structures suffer high-frequency deformations due to complex loads. However, harsh lighting conditions and high equipment costs limit measurement methods based on traditional high-speed cameras. This paper proposes a method to measure high-frequency deformations by exploiting an event camera and LED markers. Firstly, observation noise is filtered based on the characteristics of the event stream generated by LED markers blinking and spatiotemporal correlation. Then, LED markers are extracted from the event stream after differentiating between motion-induced events and events from LED blinking, which enables the extraction of high-speed moving LED markers. Ultimately, high-frequency planar deformations are measured by a monocular event camera. Experimental results confirm the accuracy of our method in measuring high-frequency planar deformations.

CVMar 8
Scale-Aware UAV-to-Satellite Cross-View Geo-Localization: A Semantic Geometric Approach

Yibin Ye, Shuo Chen, Kun Wang et al.

Cross-View Geo-Localization (CVGL) between UAV imagery and satellite images plays a crucial role in target localization and UAV self-positioning. However, most existing methods rely on the idealized assumption of scale consistency between UAV queries and satellite galleries, overlooking the severe scale ambiguity commonly encountered in real-world scenarios. This discrepancy leads to field-of-view misalignment and feature mismatch, significantly degrading CVGL robustness. To address this issue, we propose a geometric framework that recovers the absolute metric scale from monocular UAV images using semantic anchors. Specifically, small vehicles (SVs), characterized by relatively stable prior size distributions and high detectability, are exploited as metric references. A Decoupled Stereoscopic Projection Model is introduced to estimate the absolute image scale from these semantic targets. By decomposing vehicle dimensions into radial and tangential components, the model compensates for perspective distortions in 2D detections of 3D vehicles, enabling more accurate scale estimation. To further reduce intra-class size variation and detection noise, a dual-dimension fusion strategy with Interquartile Range (IQR)-based robust aggregation is employed. The estimated global scale is then used as a physical constraint for scale-adaptive satellite image cropping, improving UAV-to-satellite feature alignment. Experiments on augmented DenseUAV and UAV-VisLoc datasets demonstrate that the proposed method significantly improves CVGL robustness under unknown UAV image scales. Additionally, the framework shows strong potential for downstream applications such as passive UAV altitude estimation and 3D model scale recovery.

CVNov 23, 2025
Optimal Pose Guidance for Stereo Calibration in 3D Deformation Measurement

Dongcai Tan, Shunkun Liang, Bin Li et al.

Stereo optical measurement techniques, such as digital image correlation (DIC), are widely used in 3D deformation measurement as non-contact, full-field measurement methods, in which stereo calibration is a crucial step. However, current stereo calibration methods lack intuitive optimal pose guidance, leading to inefficiency and suboptimal accuracy in deformation measurements. The aim of this study is to develop an interactive calibration framework that automatically generates the next optimal pose, enabling high-accuracy stereo calibration for 3D deformation measurement. We propose a pose optimization method that introduces joint optimization of relative and absolute extrinsic parameters, with the minimization of the covariance matrix trace adopted as the loss function to solve for the next optimal pose. Integrated with this method is a user-friendly graphical interface, which guides even non-expert users to capture qualified calibration images. Our proposed method demonstrates superior efficiency (requiring fewer images) and accuracy (demonstrating lower measurement errors) compared to random pose, while maintaining robustness across varying FOVs. In the thermal deformation measurement tests on an S-shaped specimen, the results exhibit high agreement with finite element analysis (FEA) simulations in both deformation magnitude and evolutionary trends. We present a pose guidance method for high-precision stereo calibration in 3D deformation measurement. The simulation experiments, real-world experiments, and thermal deformation measurement applications all demonstrate the significant application potential of our proposed method in the field of 3D deformation measurement. Keywords: Stereo calibration, Optimal pose guidance, 3D deformation measurement, Digital image correlation

CVOct 13, 2025
DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects

Jiahong Chen, Jinghao Wang, Zi Wang et al.

6D pose estimation of textureless objects is valuable for industrial robotic applications, yet remains challenging due to the frequent loss of depth information. Current multi-view methods either rely on depth data or insufficiently exploit multi-view geometric cues, limiting their performance. In this paper, we propose DKPMV, a pipeline that achieves dense keypoint-level fusion using only multi-view RGB images as input. We design a three-stage progressive pose optimization strategy that leverages dense multi-view keypoint geometry information. To enable effective dense keypoint fusion, we enhance the keypoint network with attentional aggregation and symmetry-aware training, improving prediction accuracy and resolving ambiguities on symmetric objects. Extensive experiments on the ROBI dataset demonstrate that DKPMV outperforms state-of-the-art multi-view RGB approaches and even surpasses the RGB-D methods in the majority of cases. The code will be available soon.

CVJun 28, 2025
Deterministic Object Pose Confidence Region Estimation

Jinghao Wang, Zhang Li, Zi Wang et al.

6D pose confidence region estimation has emerged as a critical direction, aiming to perform uncertainty quantification for assessing the reliability of estimated poses. However, current sampling-based approach suffers from critical limitations that severely impede their practical deployment: 1) the sampling speed significantly decreases as the number of samples increases. 2) the derived confidence regions are often excessively large. To address these challenges, we propose a deterministic and efficient method for estimating pose confidence regions. Our approach uses inductive conformal prediction to calibrate the deterministically regressed Gaussian keypoint distributions into 2D keypoint confidence regions. We then leverage the implicit function theorem to propagate these keypoint confidence regions directly into 6D pose confidence regions. This method avoids the inefficiency and inflated region sizes associated with sampling and ensembling. It provides compact confidence regions that cover the ground-truth poses with a user-defined confidence level. Experimental results on the LineMOD Occlusion and SPEED datasets show that our method achieves higher pose estimation accuracy with reduced computational time. For the same coverage rate, our method yields significantly smaller confidence region volumes, reducing them by up to 99.9\% for rotations and 99.8\% for translations. The code will be available soon.

CVMay 31, 2025
3D Trajectory Reconstruction of Moving Points Based on Asynchronous Cameras

Huayu Huang, Banglei Guan, Yang Shang et al.

Photomechanics is a crucial branch of solid mechanics. The localization of point targets constitutes a fundamental problem in optical experimental mechanics, with extensive applications in various missions of UAVs. Localizing moving targets is crucial for analyzing their motion characteristics and dynamic properties. Reconstructing the trajectories of points from asynchronous cameras is a significant challenge. It encompasses two coupled sub-problems: trajectory reconstruction and camera synchronization. Present methods typically address only one of these sub-problems individually. This paper proposes a 3D trajectory reconstruction method for point targets based on asynchronous cameras, simultaneously solving both sub-problems. Firstly, we extend the trajectory intersection method to asynchronous cameras to resolve the limitation of traditional triangulation that requires camera synchronization. Secondly, we develop models for camera temporal information and target motion, based on imaging mechanisms and target dynamics characteristics. The parameters are optimized simultaneously to achieve trajectory reconstruction without accurate time parameters. Thirdly, we optimize the camera rotations alongside the camera time information and target motion parameters, using tighter and more continuous constraints on moving points. The reconstruction accuracy is significantly improved, especially when the camera rotations are inaccurate. Finally, the simulated and real-world experimental results demonstrate the feasibility and accuracy of the proposed method. The real-world results indicate that the proposed algorithm achieved a localization error of 112.95 m at an observation range of 15 ~ 20 km.

IVAug 21, 2020
Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

Zhang Li, Jiehua Zhang, Tao Tan et al.

Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CADs) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and DICE coefficient (DC). The DC ranged from 0.7354$\pm$0.1149 to 0.8372$\pm$0.0858. The DC of the best method was close to the inter-observer agreement (0.8398$\pm$0.0890). All methods were based on deep learning and categorized into two groups: multi-model method and single model method. In general, multi-model methods were significantly better ($\textit{p}$<$0.01$) than single model methods, with mean DC of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.