IVMay 25
Diffusion Models for Hyperspectral Image Analysis: A Comprehensive ReviewXing Hu, Xiangcheng Liu, Qianqian Duan et al.
Hyperspectral image (HSI) analysis plays a critical role in remote sensing, agriculture, and environmental monitoring. However, traditional methods often struggle to handle the high dimensionality, spectral redundancy, and noise inherent in HSI data, limiting their accuracy and scalability. Recently, diffusion models including denoising diffusion probabilistic models and other generative frameworks based on stochastic differential equations have shown strong potential in capturing complex spectral spatial structures and generating high fidelity HSI data. These models offer effective solutions for tasks such as noise supression, data augmentation, classification, and anomaly detection. This review presents a systematic summary of recent advances in diffusion models for HSI processing. We categorize existing methods, highlight their strengths in handling high dimensional data, and compare their performance with conventional approaches. Special attention is given to critical applications such as change detection and post disaster anomaly identification. The review also discusses current limitations, such as computational cost and training stability, and outlines potential research directions. Our main contributions can be summarized as follows: we provide a systematic taxonomy of diffusion based HSI methods, examine their applications across major remote sensing tasks, and offer perspectives on potential directions for future research. With these efforts, this review seeks to support the community in harnessing deep learning models to achieve more effective and efficient hyperspectral image analysis.
CVJul 3, 2023
Cross-modal Place Recognition in Image Databases using Event-based SensorsXiang Ji, Jiaxin Wei, Yifu Wang et al.
Visual place recognition is an important problem towards global localization in many robotics tasks. One of the biggest challenges is that it may suffer from illumination or appearance changes in surrounding environments. Event cameras are interesting alternatives to frame-based sensors as their high dynamic range enables robust perception in difficult illumination conditions. However, current event-based place recognition methods only rely on event information, which restricts downstream applications of VPR. In this paper, we present the first cross-modal visual place recognition framework that is capable of retrieving regular images from a database given an event query. Our method demonstrates promising results with respect to the state-of-the-art frame-based and event-based methods on the Brisbane-Event-VPR dataset under different scenarios. We also verify the effectiveness of the combination of retrieval and classification, which can boost performance by a large margin.
CVNov 1, 2023
A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education ScenariosWenyang Hu, Kai Liu, Libin Liu et al.
Human pose assessment and correction play a crucial role in applications across various fields, including computer vision, robotics, sports analysis, healthcare, and entertainment. In this paper, we propose a Spatial-Temporal Transformer based Framework (STTF) for human pose assessment and correction in education scenarios such as physical exercises and science experiment. The framework comprising skeletal tracking, pose estimation, posture assessment, and posture correction modules to educate students with professional, quick-to-fix feedback. We also create a pose correction method to provide corrective feedback in the form of visual aids. We test the framework with our own dataset. It comprises (a) new recordings of five exercises, (b) existing recordings found on the internet of the same exercises, and (c) corrective feedback on the recordings by professional athletes and teachers. Results show that our model can effectively measure and comment on the quality of students' actions. The STTF leverages the power of transformer models to capture spatial and temporal dependencies in human poses, enabling accurate assessment and effective correction of students' movements.