Zhuo Zou

h-index: 24

7 papers

217 citations

Novelty: 42%

AI Score: 40

Ranked #96,985 of 205,806 authors (top 47%); #31,962 in CV (top 54%)

7 Papers

53.5 · RO · May 27

SAFEVPR: Patch-Based Conformal Verification for Safe Cross-Condition Sequence Visual Place Recognition

Ha Sier, Jiaqiang Zhang, Zhuo Zou et al.

Sequence-based visual place recognition (VPR) for SLAM and robot relocalization must decide whether the retrieved top-1 candidate is safe to accept. Conformal prediction is a natural framework for this accept/reject decision, but its finite-sample guarantees rely on exchangeability between calibration and deployment (test) data, which is violated under cross-condition deployment. We introduce SAFEVPR, a non-trainable verification-and-calibration pipeline for safe cross-condition sequence VPR. SAFEVPR replaces the standard backbone cosine similarity with a mutual-nearest-neighbour (MNN) patch-matching score computed from frozen DINOv2 ViT features, and replaces flat Learn-Then-Test calibration with Mondrian conformal LTT, fitting separate Bonferroni-corrected thresholds across score bins. Under exchangeability, these thresholds would provide finite-sample false-discovery-rate (FDR) control; under condition shift, we evaluate empirical validity per deployment. Across 23 cross-condition setups from the Oxford RobotCar, NCLT, and St Lucia datasets, using three frozen VPR backbones, SAFEVPR is empirically valid on 23/23 setups at target FDR alpha = 0.10, achieving mean accepted FDR 0.014 and mean true-positive rate (TPR) 0.75. The results show that raw discrimination alone is not sufficient for conformal validity: AnyLoc-VLAD and SuperPoint+LightGlue reach comparable area under the receiver operating characteristic curve (AUROC) but fail on more setups under the same calibration. On textureless repetitive scenery, SAFEVPR safely abstains rather than accepting unreliable matches. Code is available at https://github.com/Hasar12139/SafeVPR.
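The two components named in the SAFEVPR abstract can be illustrated with a minimal sketch under stated assumptions; it is not the authors' code, which lives at the linked repository. `mnn_match_score` counts mutual-nearest-neighbour pairs between two sets of patch descriptors (using the raw count as the score is an assumption). The hypothetical helpers `ltt_threshold` and `mondrian_thresholds` scan a fixed threshold grid per bin, test the null hypothesis "FDR among accepted matches exceeds alpha" with a binomial-tail p-value, and apply a Bonferroni correction over the grid. The binomial p-value, the per-bin error budget `delta`, and how bins are assigned are all assumptions, since the abstract does not specify them.

```python
import math
from collections import defaultdict

import numpy as np


def mnn_match_score(feats_a: np.ndarray, feats_b: np.ndarray) -> int:
    """Count mutual-nearest-neighbour patch pairs between two images.

    feats_a: (N, D) and feats_b: (M, D) patch descriptors (e.g. ViT patch tokens).
    Hypothetical score: number of pairs (i, j) where j is i's nearest neighbour
    in B under cosine similarity and i is j's nearest neighbour in A.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                 # (N, M) cosine similarities
    nn_ab = sim.argmax(axis=1)    # best patch in B for each patch in A
    nn_ba = sim.argmax(axis=0)    # best patch in A for each patch in B
    return int(np.sum(nn_ba[nn_ab] == np.arange(len(a))))


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Binomial(n, p) <= k), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def ltt_threshold(scores, correct, grid, alpha=0.10, delta=0.10):
    """Lowest grid threshold certified by a Bonferroni-corrected binomial test.

    Null hypothesis for threshold lam: FDR among accepted matches (score >= lam)
    exceeds alpha. The p-value treats false accepts as Binomial(n_accepted, alpha).
    Returns math.inf (always abstain) if no threshold is certified.
    """
    grid = sorted(grid)
    if not grid:
        return math.inf
    level = delta / len(grid)                 # Bonferroni over the fixed grid
    for lam in grid:                          # ascending: first pass accepts the most
        accepted = [c for s, c in zip(scores, correct) if s >= lam]
        if not accepted:
            continue
        false_accepts = len(accepted) - sum(accepted)
        if binom_cdf(false_accepts, len(accepted), alpha) <= level:
            return lam
    return math.inf


def mondrian_thresholds(scores, correct, bins, grid, alpha=0.10, delta=0.10):
    """Calibrate one LTT threshold per bin (Mondrian-style grouping)."""
    groups = defaultdict(lambda: ([], []))
    for s, c, b in zip(scores, correct, bins):
        groups[b][0].append(s)
        groups[b][1].append(c)
    return {b: ltt_threshold(s, c, grid, alpha, delta) for b, (s, c) in groups.items()}
```

When no grid threshold passes in a bin, the sketch returns `math.inf`, so every query in that bin is rejected. This resembles the abstention behaviour the abstract reports on textureless scenery, although the paper's exact mechanism may differ.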

CV · Apr 25, 2025

Event-Based Eye Tracking. 2025 Event-based Vision Workshop

Qinyu Chen, Chang Gao, Min Liu et al.

This survey reviews the 2025 Event-Based Eye Tracking Challenge, organized as part of the 2025 CVPR Event-based Vision Workshop. The challenge focuses on predicting the pupil center by processing eye movements recorded with an event camera. We review and summarize the innovative methods of the top-ranked teams in the challenge to advance future event-based eye tracking research. For each method, accuracy, model size, and number of operations are reported. We also discuss event-based eye tracking from the perspective of hardware design.

CV · May 13, 2025

EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation

Hanle Zheng, Xujie Han, Zegang Peng et al.

Video Frame Interpolation (VFI) is a fundamental yet challenging task in computer vision, particularly under conditions involving large motion, occlusion, and lighting variation. Recent advancements in event cameras have opened up new opportunities for addressing these challenges. While existing event-based VFI methods have succeeded in recovering large and complex motions by leveraging handcrafted intermediate representations such as optical flow, these designs often compromise high-fidelity image reconstruction under subtle motion scenarios due to their reliance on explicit motion modeling. Meanwhile, diffusion models provide a promising alternative for VFI by reconstructing frames through a denoising process, eliminating the need for explicit motion estimation or warping operations. In this work, we propose EventDiff, a unified and efficient event-based diffusion model framework for VFI. EventDiff features a novel Event-Frame Hybrid AutoEncoder (HAE) equipped with a lightweight Spatial-Temporal Cross Attention (STCA) module that effectively fuses dynamic event streams with static frames. Unlike previous event-based VFI methods, EventDiff performs interpolation directly in the latent space via a denoising diffusion process, making it more robust across diverse and challenging VFI scenarios. Through a two-stage training strategy that first pretrains the HAE and then jointly optimizes it with the diffusion model, our method achieves state-of-the-art performance across multiple synthetic and real-world event VFI datasets. The proposed method outperforms existing state-of-the-art event-based VFI methods by up to 1.98 dB in PSNR on Vimeo90K-Triplet and shows superior performance on SNU-FILM tasks at multiple difficulty levels. Compared to the emerging diffusion-based VFI approach, our method achieves up to 5.72 dB PSNR gain on Vimeo90K-Triplet and 4.24× faster inference.
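EventDiff reconstructs frames by denoising in the latent space of its HAE. The generic mechanism underneath can be sketched with the standard DDPM forward process, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, and its inversion from a noise estimate. This is a minimal NumPy sketch, not EventDiff's model: the linear β schedule and its endpoints are common defaults assumed here, the helper names are hypothetical, and both the event/frame conditioning and the noise-prediction network are omitted (the true noise stands in for the network's output).

```python
import numpy as np


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bars, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise


def predict_x0(x_t, t, alpha_bars, eps_hat):
    """Invert the forward process given a noise estimate eps_hat (normally a network output)."""
    ab = alpha_bars[t]
    return (x_t - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)
```

In a trained model, `eps_hat` would come from a network conditioned on the available inputs, and sampling applies many reverse steps rather than a single inversion.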

CR · May 31, 2025

Blockchain Powered Edge Intelligence for U-Healthcare in Privacy Critical and Time Sensitive Environment

Anum Nawaz, Hafiz Humza Mahmood Ramzan, Xianjia Yu et al.

Edge Intelligence (EI) serves as a critical enabler for privacy-preserving systems by providing AI-empowered computation and distributed caching services at the edge, thereby minimizing latency and enhancing data privacy. The integration of blockchain technology further augments EI frameworks by ensuring transactional transparency, auditability, and system-wide reliability through a decentralized network model. However, the operational architecture of such systems introduces inherent vulnerabilities, particularly due to the extensive data interactions between edge gateways (EGs) and the distributed nature of information storage during service provisioning. To address these challenges, we propose an autonomous computing model along with its interaction topologies tailored for privacy-critical and time-sensitive health applications. The system supports continuous monitoring, real-time alert notifications, disease detection, and robust data processing and aggregation. It also includes a data transaction handler and mechanisms for ensuring privacy at the EGs. Moreover, a resource-efficient one-dimensional convolutional neural network (1D-CNN) is proposed for the multiclass classification of arrhythmia, enabling accurate, real-time analysis on constrained EGs. Furthermore, a secure access scheme is defined to manage both off-chain and on-chain data sharing and storage. To validate the proposed model, comprehensive security, performance, and cost analyses are conducted, demonstrating the efficiency and reliability of the fine-grained access control scheme.

LG · May 31, 2025

Blockchain-Enabled Privacy-Preserving Second-Order Federated Edge Learning in Personalized Healthcare

Anum Nawaz, Muhammad Irfan, Xianjia Yu et al.

Federated learning (FL) has attracted increasing attention as a way to mitigate the security and privacy challenges of traditional cloud-centric machine learning, particularly in healthcare ecosystems. FL methodologies enable the training of global models through localized policies, allowing independent operation at the level of edge clients. Conventional first-order FL approaches face several challenges in personalized model training because each edge client's data are heterogeneous and non-independent and identically distributed (non-iid). Recent second-order FL approaches maintain stability and consistency on non-iid datasets while improving personalized model training. This study proposes and develops BFEL (blockchain-enhanced federated edge learning), a verifiable and auditable second-order FL framework based on an optimized FedCurv for personalized healthcare systems. FedCurv incorporates information about the importance of each parameter to each client's task (through the Fisher Information Matrix), which helps preserve client-specific knowledge and reduce model drift during aggregation. Moreover, it minimizes the number of communication rounds each edge client needs to converge to a target precision, while effectively managing personalized training on non-iid and heterogeneous data. Ethereum-based model aggregation ensures trust, verifiability, and auditability, while public-key encryption enhances privacy and security. Experimental results with federated CNNs and MLPs on MNIST, CIFAR-10, and PathMNIST demonstrate the high efficiency and scalability of the proposed framework.
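The way FedCurv uses Fisher information can be sketched as a curvature-weighted penalty added to each client's local loss: λ times the sum, over peer clients j and parameters i, of F_j[i] · (θ[i] − θ_j[i])². This is a minimal NumPy illustration, not the BFEL implementation. Estimating the Fisher diagonal as the mean squared per-sample gradient is an assumption, and `diagonal_fisher`, `fedcurv_penalty`, and `fedcurv_penalty_grad` are hypothetical names.

```python
import numpy as np


def diagonal_fisher(per_sample_grads: np.ndarray) -> np.ndarray:
    """Empirical diagonal Fisher information.

    per_sample_grads: (num_samples, num_params) gradients of the per-sample
    log-likelihood; the estimate is the mean squared gradient per parameter.
    """
    return np.mean(per_sample_grads ** 2, axis=0)


def fedcurv_penalty(theta: np.ndarray, peers, lam: float = 1.0) -> float:
    """lam * sum over peers j of sum_i F_j[i] * (theta[i] - theta_j[i])**2.

    peers: iterable of (fisher_diag, params) pairs shared by other clients.
    """
    return float(lam * sum(np.sum(f * (theta - t) ** 2) for f, t in peers))


def fedcurv_penalty_grad(theta: np.ndarray, peers, lam: float = 1.0) -> np.ndarray:
    """Gradient of fedcurv_penalty with respect to theta."""
    grad = np.zeros_like(theta, dtype=float)
    for f, t in peers:
        grad += 2.0 * lam * f * (theta - t)
    return grad
```

Parameters with large Fisher values for a peer are pulled strongly toward that peer's weights, while low-importance parameters stay free to adapt locally, which is how such a penalty limits drift without forcing full agreement.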

RO · Mar 25, 2021

Multi Sensor Fusion for Navigation and Mapping in Autonomous Vehicles: Accurate Localization in Urban Environments

Li Qingqing, Jorge Peña Queralta, Tuan Nguyen Gia et al.

The combination of data from multiple sensors, also known as sensor fusion or data fusion, is a key aspect in the design of autonomous robots. In particular, algorithms that accommodate sensor fusion techniques achieve higher accuracy and are more resilient to the malfunction of individual sensors. Algorithms for autonomous navigation, mapping and localization have advanced significantly over the past two decades. Nonetheless, challenges remain in developing robust solutions for accurate localization in dense urban environments, where so-called last-mile delivery takes place. In these scenarios, local motion estimation is combined with matching real-time sensor data against a detailed pre-built map. In this paper, we use data gathered with an autonomous delivery robot to compare different sensor fusion techniques and evaluate which algorithms provide the highest accuracy depending on the environment. The techniques we analyze and propose use 3D lidar data, inertial data, GNSS data and wheel encoder readings. We show how lidar scan matching combined with other sensor data can increase the accuracy of robot localization and, in consequence, of its navigation. Moreover, we propose a strategy to reduce the impact on navigation performance when a change in the environment renders map data invalid or part of the available map is corrupted.
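The basic pattern of combining local motion estimation with absolute position fixes can be sketched with a scalar Kalman filter: predict with wheel-odometry velocity, then correct with a position measurement whose variance reflects the sensor (for example a noisy GNSS fix or a more precise lidar-to-map match). This is a generic textbook sketch, not the fusion pipeline evaluated in the paper. The innovation gate is a common outlier-rejection technique shown as one possible way to ignore fixes from a changed or corrupted map region; it is an assumption, not the strategy the paper proposes, and the function names are hypothetical.

```python
def kf_predict(x: float, p: float, velocity: float, dt: float, q: float):
    """Propagate a 1-D position estimate with odometry (wheel-encoder) velocity.

    x, p: current mean and variance; q: process noise added per step.
    """
    return x + velocity * dt, p + q


def kf_update(x: float, p: float, z: float, r: float, gate: float = 6.63):
    """Fuse an absolute position fix z with variance r (e.g. GNSS or lidar-to-map).

    The update is skipped when the normalised innovation squared exceeds `gate`
    (6.63 is roughly the 99% chi-square quantile for 1 degree of freedom).
    Returns (new_mean, new_variance, accepted_flag).
    """
    innovation = z - x
    s = p + r                      # innovation variance
    if innovation ** 2 / s > gate:
        return x, p, False         # reject inconsistent fix, keep prediction
    k = p / s                      # Kalman gain
    return x + k * innovation, (1.0 - k) * p, True
```

A low-variance measurement pulls the estimate strongly toward itself, while a fix that disagrees too much with the prediction is discarded instead of corrupting the state.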

SP · Apr 17, 2020

UWB-Based Localization for Multi-UAV Systems and Collaborative Heterogeneous Multi-Robot Systems: a Survey

Wang Shule, Carmen Martínez Almansa, Jorge Peña Queralta et al.

Ultra-wideband (UWB) technology has emerged in recent years as a robust solution for localization in GNSS-denied environments. In particular, its high accuracy compared with other wireless localization solutions enables a wider range of collaborative and multi-robot application scenarios: it can replace more complex and expensive motion-capture areas in use cases where accuracy on the order of tens of centimeters is sufficient. We present the first survey of UWB-based localization focused on multi-UAV systems and heterogeneous multi-robot systems. We found that previous literature reviews do not consider in depth the challenges of aerial navigation, of navigation with multiple robots, or of heterogeneous multi-robot systems. In particular, to the best of our knowledge, this is the first survey to review recent advances in UWB-based (i) methods that enable ad-hoc and dynamic deployments; (ii) collaborative localization techniques; and (iii) cooperative sensing and cooperative maneuvers such as UAV docking on mobile platforms. Finally, we also review existing datasets and discuss the potential of this technology for both localization in GNSS-denied environments and collaboration in multi-robot systems.
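UWB ranging typically yields distances from a tag to fixed anchors, and a position can then be estimated by multilateration. The sketch below is the textbook linearised least-squares solution, assumed here for illustration rather than taken from the survey: subtracting the first range equation from the others removes the quadratic term and leaves a linear system. The function name `multilaterate` is hypothetical.

```python
import numpy as np


def multilaterate(anchors: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    """Linear least-squares position from ranges to known anchors.

    anchors: (n, d) anchor coordinates, with n >= d + 1 anchors in general position.
    ranges:  (n,) measured distances to each anchor.
    Subtracting the first equation |x - a_0|^2 = r_0^2 from the others gives
        2 (a_i - a_0) . x = |a_i|^2 - |a_0|^2 - r_i^2 + r_0^2
    """
    a0, r0 = anchors[0], ranges[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2)
         - ranges[1:] ** 2 + r0 ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)   # least squares handles n > d + 1
    return x
```

With noisy ranges, this closed-form estimate is often used to initialise an iterative nonlinear refinement or a filter, rather than as the final answer.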