Soham Pahari

LG
h-index4
3papers
Novelty67%
AI Score44

3 Papers

CVFeb 6
Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors

Soham Pahari, Sandeep C. Kumain

Human visual attention on three-dimensional objects emerges from the interplay between bottom-up geometric processing and top-down semantic recognition. Existing 3D saliency methods rely on hand-crafted geometric features or learning-based approaches that lack semantic awareness, failing to explain why humans fixate on semantically meaningful but geometrically unremarkable regions. We introduce SemGeo-AttentionNet, a dual-stream architecture that explicitly formalizes this dichotomy through asymmetric cross-modal fusion, leveraging diffusion-based semantic priors from geometry-conditioned multi-view rendering and point cloud transformers for geometric processing. Cross-attention ensures geometric features query semantic content, enabling bottom-up distinctiveness to guide top-down retrieval. We extend our framework to temporal scanpath generation through reinforcement learning, introducing the first formulation respecting 3D mesh topology with inhibition-of-return dynamics. Evaluation on SAL3D, NUS3D and 3DVA datasets demonstrates substantial improvements, validating how cognitively motivated architectures effectively model human visual attention on three-dimensional surfaces.

LGDec 30, 2025
Lifting Vision: Ground to Aerial Localization with Reasoning Guided Planning

Soham Pahari, M. Srinivas

Multimodal intelligence development recently show strong progress in visual understanding and high level reasoning. Though, most reasoning system still reply on textual information as the main medium for inference. This limit their effectiveness in spatial tasks such as visual navigation and geo-localization. This work discuss about the potential scope of this field and eventually propose an idea visual reasoning paradigm Geo-Consistent Visual Planning, our introduced framework called Visual Reasoning for Localization, or ViReLoc, which performs planning and localization using only visual representations. The proposed framework learns spatial dependencies and geometric relations that text based reasoning often suffer to understand. By encoding step by step inference in the visual domain and optimizing with reinforcement based objectives, ViReLoc plans routes between two given ground images. The system also integrates contrastive learning and adaptive feature interaction to align cross view perspectives and reduce viewpoint differences. Experiments across diverse navigation and localization scenarios show consistent improvements in spatial reasoning accuracy and cross view retrieval performance. These results establish visual reasoning as a strong complementary approach for navigation and localization, and show that such tasks can be performed without real time global positioning system data, leading to more secure navigation solutions.

LGOct 26, 2025
Air Quality Prediction Using LOESS-ARIMA and Multi-Scale CNN-BiLSTM with Residual-Gated Attention

Soham Pahari, Sandeep Chand Kumain

Air pollution remains a critical environmental and public health concern in Indian megacities such as Delhi, Kolkata, and Mumbai, where sudden spikes in pollutant levels challenge timely intervention. Accurate Air Quality Index (AQI) forecasting is difficult due to the coexistence of linear trends, seasonal variations, and volatile nonlinear patterns. This paper proposes a hybrid forecasting framework that integrates LOESS decomposition, ARIMA modeling, and a multi-scale CNN-BiLSTM network with a residual-gated attention mechanism. The LOESS step separates the AQI series into trend, seasonal, and residual components, with ARIMA modeling the smooth components and the proposed deep learning module capturing multi-scale volatility in the residuals. Model hyperparameters are tuned via the Unified Adaptive Multi-Stage Metaheuristic Optimizer (UAMMO), combining multiple optimization strategies for efficient convergence. Experiments on 2021-2023 AQI datasets from the Central Pollution Control Board show that the proposed method consistently outperforms statistical, deep learning, and hybrid baselines across PM2.5, O3, CO, and NOx in three major cities, achieving up to 5-8% lower MSE and higher R^2 scores (>0.94) for all pollutants. These results demonstrate the framework's robustness, sensitivity to sudden pollution events, and applicability to urban air quality management.