CVApr 21, 2022
GAF-NAU: Gramian Angular Field encoded Neighborhood Attention U-Net for Pixel-Wise Hyperspectral Image ClassificationSidike Paheding, Abel A. Reyes, Anush Kasaragod et al.
Hyperspectral image (HSI) classification is the most vibrant area of research in the hyperspectral community due to the rich spectral information contained in HSI can greatly aid in identifying objects of interest. However, inherent non-linearity between materials and the corresponding spectral profiles brings two major challenges in HSI classification: interclass similarity and intraclass variability. Many advanced deep learning methods have attempted to address these issues from the perspective of a region/patch-based approach, instead of a pixel-based alternate. However, the patch-based approaches hypothesize that neighborhood pixels of a target pixel in a fixed spatial window belong to the same class. And this assumption is not always true. To address this problem, we herein propose a new deep learning architecture, namely Gramian Angular Field encoded Neighborhood Attention U-Net (GAF-NAU), for pixel-based HSI classification. The proposed method does not require regions or patches centered around a raw target pixel to perform 2D-CNN based classification, instead, our approach transforms 1D pixel vector in HSI into 2D angular feature space using Gramian Angular Field (GAF) and then embed it to a new neighborhood attention network to suppress irrelevant angular feature while emphasizing on pertinent features useful for HSI classification task. Evaluation results on three publicly available HSI datasets demonstrate the superior performance of the proposed model.
CVJul 1, 2023
Forward-Forward Algorithm for Hyperspectral Image Classification: A Preliminary StudySidike Paheding, Abel A. Reyes-Angulo
The back-propagation algorithm has long been the de-facto standard in optimizing weights and biases in neural networks, particularly in cutting-edge deep learning models. Its widespread adoption in fields like natural language processing, computer vision, and remote sensing has revolutionized automation in various tasks. The popularity of back-propagation stems from its ability to achieve outstanding performance in tasks such as classification, detection, and segmentation. Nevertheless, back-propagation is not without its limitations, encompassing sensitivity to initial conditions, vanishing gradients, overfitting, and computational complexity. The recent introduction of a forward-forward algorithm (FFA), which computes local goodness functions to optimize network parameters, alleviates the dependence on substantial computational resources and the constant need for architectural scaling. This study investigates the application of FFA for hyperspectral image classification. Experimental results and comparative analysis are provided with the use of the traditional back-propagation algorithm. Preliminary results show the potential behind FFA and its promises.
CVJul 2, 2023
The Forward-Forward Algorithm as a feature extractor for skin lesion classification: A preliminary studyAbel Reyes-Angulo, Sidike Paheding
Skin cancer, a deadly form of cancer, exhibits a 23\% survival rate in the USA with late diagnosis. Early detection can significantly increase the survival rate, and facilitate timely treatment. Accurate biomedical image classification is vital in medical analysis, aiding clinicians in disease diagnosis and treatment. Deep learning (DL) techniques, such as convolutional neural networks and transformers, have revolutionized clinical decision-making automation. However, computational cost and hardware constraints limit the implementation of state-of-the-art DL architectures. In this work, we explore a new type of neural network that does not need backpropagation (BP), namely the Forward-Forward Algorithm (FFA), for skin lesion classification. While FFA is claimed to use very low-power analog hardware, BP still tends to be superior in terms of classification accuracy. In addition, our experimental results suggest that the combination of FFA and BP can be a better alternative to achieve a more accurate prediction.
ROSep 10, 2023
What Is Near?: Room Locality Learning for Enhanced Robot Vision-Language-Navigation in Indoor Living EnvironmentsMuraleekrishna Gopinathan, Jumana Abu-Khalaf, David Suter et al.
Humans use their knowledge of common house layouts obtained from previous experiences to predict nearby rooms while navigating in new environments. This greatly helps them navigate previously unseen environments and locate their target room. To provide layout prior knowledge to navigational agents based on common human living spaces, we propose WIN (\textit{W}hat \textit{I}s \textit{N}ear), a commonsense learning model for Vision Language Navigation (VLN) tasks. VLN requires an agent to traverse indoor environments based on descriptive navigational instructions. Unlike existing layout learning works, WIN predicts the local neighborhood map based on prior knowledge of living spaces and current observation, operating on an imagined global map of the entire environment. The model infers neighborhood regions based on visual cues of current observations, navigational history, and layout common sense. We show that local-global planning based on locality knowledge and predicting the indoor layout allows the agent to efficiently select the appropriate action. Specifically, we devised a cross-modal transformer that utilizes this locality prior for decision-making in addition to visual inputs and instructions. Experimental results show that locality learning using WIN provides better generalizability compared to classical VLN agents in unseen environments. Our model performs favorably on standard VLN metrics, with Success Rate 68\% and Success weighted by Path Length 63\% in unseen environments.
46.8LGApr 2
Self-Directed Task IdentificationTimothy Gould, Sidike Paheding
In this work, we present a novel machine learning framework called Self-Directed Task Identification (SDTI), which enables models to autonomously identify the correct target variable for each dataset in a zero-shot setting without pre-training. SDTI is a minimal, interpretable framework demonstrating the feasibility of repurposing core machine learning concepts for a novel task structure. To our knowledge, no existing architectures have demonstrated this ability. Traditional approaches lack this capability, leaving data annotation as a time-consuming process that relies heavily on human effort. Using only standard neural network components, we show that SDTI can be achieved through appropriate problem formulation and architectural design. We evaluate the proposed framework on a range of benchmark tasks and demonstrate its effectiveness in reliably identifying the ground truth out of a set of potential target variables. SDTI outperformed baseline architectures by 14% in F1 score on synthetic task identification benchmarks. These proof-of-concept experiments highlight the future potential of SDTI to reduce dependence on manual annotation and to enhance the scalability of autonomous learning systems in real-world applications.
IVNov 21, 2024Code
Beneath the Surface: The Role of Underwater Image Enhancement in Object DetectionAli Awad, Ashraf Saleem, Sidike Paheding et al.
Underwater imagery often suffers from severe degradation resulting in low visual quality and reduced object detection performance. This work aims to evaluate state-of-the-art image enhancement models, investigate their effects on underwater object detection, and explore their potential to improve detection performance. To this end, we apply nine recent underwater image enhancement models, covering physical, non-physical and learning-based categories, to two recent underwater image datasets. Following this, we conduct joint qualitative and quantitative analyses on the original and enhanced images, revealing the discrepancy between the two analyses, and analyzing changes in the quality distribution of the images after enhancement. We then train three recent object detection models on the original datasets, selecting the best-performing detector for further analysis. This detector is subsequently re-trained on the enhanced datasets to evaluate changes in detection performance, highlighting the adverse effect of enhancement on detection performance at the dataset level. Next, we perform a correlation study to examine the relationship between various enhancement metrics and the mean Average Precision (mAP). Finally, we conduct an image-level analysis that reveals images of improved detection performance after enhancement. The findings of this study demonstrate the potential of image enhancement to improve detection performance and provide valuable insights for researchers to further explore the effects of enhancement on detection at the individual image level, rather than at the dataset level. This could enable the selective application of enhancement for improved detection. The data generated, code developed, and supplementary materials are publicly available at: https://github.com/RSSL-MTU/Enhancement-Detection-Analysis.
LGMar 25, 2025
Forecasting Volcanic Radiative Power (VPR) at Fuego Volcano Using Bayesian Regularized Neural NetworkSnehamoy Chatterjee, Greg Waite, Sidike Paheding et al.
Forecasting volcanic activity is critical for hazard assessment and risk mitigation. Volcanic Radiative Power (VPR), derived from thermal remote sensing data, serves as an essential indicator of volcanic activity. In this study, we employ Bayesian Regularized Neural Networks (BRNN) to predict future VPR values based on historical data from Fuego Volcano, comparing its performance against Scaled Conjugate Gradient (SCG) and Levenberg-Marquardt (LM) models. The results indicate that BRNN outperforms SCG and LM, achieving the lowest mean squared error (1.77E+16) and the highest R-squared value (0.50), demonstrating its superior ability to capture VPR variability while minimizing overfitting. Despite these promising results, challenges remain in improving the model's predictive accuracy. Future research should focus on integrating additional geophysical parameters, such as seismic and gas emission data, to enhance forecasting precision. The findings highlight the potential of machine learning models, particularly BRNN, in advancing volcanic activity forecasting, contributing to more effective early warning systems for volcanic hazards.
CVJun 14, 2024
Cross-view geo-localization: a surveyAbhilash Durgam, Sidike Paheding, Vikas Dhiman et al.
Cross-view geo-localization has garnered notable attention in the realm of computer vision, spurred by the widespread availability of copious geotagged datasets and the advancements in machine learning techniques. This paper provides a thorough survey of cutting-edge methodologies, techniques, and associated challenges that are integral to this domain, with a focus on feature-based and deep learning strategies. Feature-based methods capitalize on unique features to establish correspondences across disparate viewpoints, whereas deep learning-based methodologies deploy convolutional neural networks to embed view-invariant attributes. This work also delineates the multifaceted challenges encountered in cross-view geo-localization, such as variations in viewpoints and illumination, the occurrence of occlusions, and it elucidates innovative solutions that have been formulated to tackle these issues. Furthermore, we delineate benchmark datasets and relevant evaluation metrics, and also perform a comparative analysis of state-of-the-art techniques. Finally, we conclude the paper with a discussion on prospective avenues for future research and the burgeoning applications of cross-view geo-localization in an intricately interconnected global landscape.