CV | Feb 4 | Code
Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation
Buddhi Wijenayake, Nichula Wasalathilake, Roshan Godaliyadda et al.
Semantic segmentation of high-resolution remote-sensing imagery is critical for urban mapping and land-cover monitoring, yet training data typically exhibits severe long-tailed pixel imbalance. In the LoveDA dataset, this challenge is compounded by an explicit Urban/Rural split with distinct appearance and inconsistent class-frequency statistics across domains. We present a prompt-controlled diffusion augmentation framework that synthesizes paired label-image samples with explicit control of both domain and semantic composition. Stage A uses a domain-aware, masked ratio-conditioned discrete diffusion model to generate layouts that satisfy user-specified class-ratio targets while respecting learned co-occurrence structure. Stage B translates layouts into photorealistic, domain-consistent images using Stable Diffusion with ControlNet guidance. Mixing the resulting ratio- and domain-controlled synthetic pairs with real data yields consistent improvements across multiple segmentation backbones, with gains concentrated on minority classes and improved Urban and Rural generalization, demonstrating controllable augmentation as a practical mechanism to mitigate long-tail bias in remote-sensing segmentation. Source code, pretrained models, and synthetic datasets are available on GitHub at https://github.com/Buddhi19/SyntheticGen.git.
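To make the notion of a class-ratio target concrete, the following minimal sketch measures the per-class pixel fractions of a label mask and their distance from a requested ratio vector. It is an illustration only, not the authors' implementation: the function names `class_ratios` and `ratio_gap`, the toy mask, and the use of an L1 distance are all assumptions.

```python
from collections import Counter


def class_ratios(mask, num_classes):
    """Return the fraction of pixels per class id in a 2D label mask."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in range(num_classes)]


def ratio_gap(mask, target):
    """L1 distance between a mask's class ratios and a target ratio vector.

    Assumption: L1 is used here only for illustration; the abstract does not
    say how agreement with the target ratios is measured.
    """
    ratios = class_ratios(mask, len(target))
    return sum(abs(r - t) for r, t in zip(ratios, target))


# Toy 4x4 layout: class 0 dominates, class 2 is a minority class.
mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 0],
]
ratios = class_ratios(mask, 3)
gap = ratio_gap(mask, [0.5, 0.25, 0.25])
print(ratios, gap)
```

A layout generator conditioned on ratios could be checked with such a function by comparing the realized ratios of each sampled mask with the requested ones.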
CV | Apr 16, 2022
GAUSS: Guided Encoder-Decoder Architecture for Hyperspectral Unmixing with Spatial Smoothness
Yasiru Ranasinghe, Kavinga Weerasooriya, Roshan Godaliyadda et al.
In recent hyperspectral unmixing (HU) literature, deep learning (DL) has become more prominent, especially through the autoencoder (AE) architecture. We propose a split architecture and use a pseudo-ground truth for abundances to guide the optimization of the 'unmixing network' (UN). Preceding the UN, an 'approximation network' (AN) improves the association between the centre pixel and its neighbourhood. Because its output is both the input to the UN and the reference for the 'mixing network' (MN), the AN accentuates spatial correlation in the abundances. In the Guided Encoder-Decoder Architecture for Hyperspectral Unmixing with Spatial Smoothness (GAUSS), we use one-hot encoded abundances as the pseudo-ground truth to guide the UN; these are computed with the k-means algorithm, which avoids reliance on prior HU methods. Furthermore, in contrast to the standard AE for HU, we release the single-layer constraint on the MN by introducing the UN-generated abundances. We also experiment with two modifications of the network pre-trained with the GAUSS method. In GAUSS$_\textit{blind}$, we concatenate the UN and the MN to back-propagate the reconstruction error gradients to the encoder. In GAUSS$_\textit{prime}$, the abundance results of a reliable signal processing (SP) method are used as the pseudo-ground truth within the GAUSS architecture. According to quantitative and graphical results on four experimental datasets, the three architectures either surpassed or matched the performance of existing HU algorithms from both the DL and SP domains.
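The idea of deriving one-hot pseudo-abundances from k-means can be sketched as follows. This is a plain, dependency-free k-means written for illustration and is not the GAUSS code: the function name `kmeans_one_hot`, the caller-supplied initial centroids, and the toy two-band pixels are assumptions.

```python
def kmeans_one_hot(pixels, centroids, iters=10):
    """Cluster pixel spectra with plain k-means and return one-hot labels.

    pixels    : list of spectra (each a list of band values)
    centroids : initial cluster centres (assumption: supplied by the caller)
    Returns (one_hot, centroids); each one-hot row has exactly one 1.0.
    """
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    assign = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(pixels):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assign[i] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(pixels, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    one_hot = [[1.0 if a == j else 0.0 for j in range(k)] for a in assign]
    return one_hot, centroids


# Toy data: four pixels, two bands, two clearly separated materials.
pixels = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
pseudo_abundances, centres = kmeans_one_hot(pixels, [[1.0, 0.0], [0.0, 1.0]])
print(pseudo_abundances)
```

Each row of `pseudo_abundances` sums to one, so it has the form of an abundance vector even though it only records the cluster membership of the pixel.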
4.6 | IV | Apr 20
A Controlled Benchmark of Visual State-Space Backbones with Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation
Nichula Wasalathilaka, Dineth Perera, Oshadha Samarakoon et al.
Visual state-space models (SSMs) are increasingly promoted as efficient alternatives to Vision Transformers, yet their practical advantages remain unclear under fair comparison because existing studies rarely isolate encoder effects from decoder and training choices. We present a strictly controlled benchmark of representative visual SSM families, including VMamba, MambaVision, and Spatial-Mamba, for remote-sensing semantic segmentation, in which only the encoder varies across experiments. Evaluated on LoveDA and ISPRS Potsdam under a unified 4-stage feature interface and a fixed lightweight decoder, the benchmark reveals three main findings: intra-family scaling yields only modest gains, cross-domain generalization is strongly asymmetric, and boundary delineation is the dominant failure mode under distribution shift. Although visual SSMs achieve favorable accuracy-efficiency trade-offs relative to the controlled CNN and Transformer baselines considered here, the results suggest that future improvements are more likely to come from robustness-oriented design and boundary-aware decoding than from encoder scaling alone. By isolating encoder behavior under a unified and reproducible protocol, this study establishes a practical reference benchmark for the design and evaluation of future Mamba-based segmentation backbones.
IV | Aug 11, 2025
Preprocessing Algorithm Leveraging Geometric Modeling for Scale Correction in Hyperspectral Images for Improved Unmixing Performance
Praveen Sumanasekara, Athulya Ratnayake, Buddhi Wijenayake et al.
Spectral variability significantly impacts the accuracy and convergence of hyperspectral unmixing algorithms. Many methods address complex spectral variability, yet large distortions to the scale of the observed pixel signatures due to topography, illumination, and shadowing remain a major challenge. These variations often degrade unmixing performance and complicate model fitting, so correcting them can offer significant advantages in real-world GIS applications. In this paper, we propose a novel preprocessing algorithm that corrects scale-induced spectral variability prior to unmixing. By estimating and correcting these distortions, the algorithm produces pixel signatures with minimal scale distortion, which in turn significantly improves the abundance estimates of the unmixing algorithms that such distortions would otherwise hinder. We present a rigorous mathematical framework to describe and correct for scale variability and provide extensive experimental validation of the proposed algorithm. Furthermore, the algorithm's impact is evaluated across a wide range of state-of-the-art unmixing methods on two synthetic and two real hyperspectral datasets. The proposed preprocessing step consistently improves the performance of these algorithms, achieving error reductions of around 50%, even for algorithms specifically designed to handle spectral variability. This demonstrates that scale correction acts as a complementary step, facilitating more accurate unmixing with existing methods. The algorithm's generality and consistent, significant impact highlight its potential as a key component in practical hyperspectral unmixing pipelines. The implementation code will be made publicly available upon publication.
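Why a per-pixel scale factor matters for abundance estimation can be seen in a small noise-free example. The sketch below is not the proposed preprocessing algorithm, whose details the abstract does not give. It assumes a linear mixing model with a single scalar scale factor per pixel and exactly two endmembers, and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mix(endmembers, abundances, scale=1.0):
    """Linear mixture of endmember spectra, multiplied by a scalar scale."""
    bands = len(endmembers[0])
    return [scale * sum(a * m[b] for a, m in zip(abundances, endmembers))
            for b in range(bands)]


def least_squares_two(endmembers, pixel):
    """Unconstrained least-squares abundances for exactly two endmembers,
    obtained by solving the 2x2 normal equations in closed form."""
    m1, m2 = endmembers
    g11, g12, g22 = dot(m1, m1), dot(m1, m2), dot(m2, m2)
    b1, b2 = dot(m1, pixel), dot(m2, pixel)
    det = g11 * g22 - g12 * g12
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]


# Assumed toy scene: two endmembers, three bands, pixel darkened by shadow.
endmembers = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
true_abundances = [0.25, 0.75]
shaded_pixel = mix(endmembers, true_abundances, scale=0.5)

raw = least_squares_two(endmembers, shaded_pixel)  # absorbs the scale factor
corrected = [a / sum(raw) for a in raw]            # valid only without noise
print(raw, corrected)
```

In this idealized case the raw estimates sum to the scale factor instead of one, and dividing by that sum recovers the true abundances. With noise or nonlinear effects such a simple renormalization is no longer sufficient, which is the setting a dedicated scale-correction step targets.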
CV | Dec 13, 2021
Holistic Interpretation of Public Scenes Using Computer Vision and Temporal Graphs to Identify Social Distancing Violations
Gihan Jayatilaka, Jameel Hassan, Suren Sritharan et al.
The COVID-19 pandemic has caused an unprecedented global public health crisis. Given the nature of the disease, social distancing measures have been proposed as the primary strategy to curb its spread. Identifying situations where these protocols are violated therefore has implications for curtailing the spread of the disease and promoting a sustainable lifestyle. This paper proposes a novel computer vision-based system that analyzes CCTV footage to provide a threat level assessment of COVID-19 spread. The system strives to holistically capture and interpret the information content of CCTV footage spanning multiple frames in order to recognize various violations of social distancing protocols across time and space and to identify group behaviors. This functionality is achieved primarily by using a temporal graph-based structure to represent the information in the CCTV footage, together with a strategy to holistically interpret the graph and quantify the threat level of the given scene. The individual components are tested and validated on a range of scenarios, and the complete system is tested against human expert opinion. The results reflect the dependence of the threat level on people, their physical proximity, interactions, protective clothing, and group dynamics. The system achieves an accuracy of 76%, enabling a deployable threat monitoring system in cities that permits normalcy and sustainability in society.
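A temporal graph over tracked people can be sketched as one proximity graph per frame, with edges accumulated over time. The example below is a simplified illustration under stated assumptions, not the system described above: it assumes that person positions are already available as ground-plane coordinates, and the function names and the 2.0 distance threshold are hypothetical.

```python
import math
from itertools import combinations


def proximity_edges(positions, threshold):
    """Return the set of person-id pairs closer than `threshold` in one frame.

    positions : dict mapping person id -> (x, y) ground-plane coordinates
    """
    edges = set()
    for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2):
        if math.dist(pa, pb) < threshold:
            edges.add((a, b))
    return edges


def contact_durations(frames, threshold):
    """Count, for every pair, the number of frames spent within threshold."""
    counts = {}
    for positions in frames:
        for edge in proximity_edges(positions, threshold):
            counts[edge] = counts.get(edge, 0) + 1
    return counts


# Three toy frames with three tracked people (ids 1, 2, 3).
frames = [
    {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (5.0, 5.0)},
    {1: (0.0, 0.0), 2: (3.0, 0.0), 3: (3.5, 0.0)},
    {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (5.0, 5.0)},
]
durations = contact_durations(frames, threshold=2.0)
print(durations)
```

Accumulating edges over frames separates brief encounters from sustained proximity. A fuller system would also attach attributes such as mask wearing or group membership to nodes and edges.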
MED-PH | Nov 19, 2021
Assessment of Fetal and Maternal Well-Being During Pregnancy Using Passive Wearable Inertial Sensor
Eranda Somathilake, Upekha Delay, Janith Bandara Senanayaka et al.
Assessing the health of both the fetus and the mother is vital for preventing and identifying possible complications in pregnancy. This paper focuses on a device that can be used effectively by the mother herself with minimal supervision and that provides a reasonable estimate of fetal and maternal health while being safe, comfortable, and easy to use. The proposed device uses a belt with a single accelerometer placed over the mother's uterus to record the required information. The device is expected to monitor both the mother and the fetus continuously over a long period and to provide medical professionals with useful information that they would otherwise miss because health monitoring is currently carried out infrequently. The paper shows that simultaneous measurement of the mother's respiratory information and of fetal movement is in fact possible even in the presence of mild interference, which must be accounted for if the device is to be worn for extended periods.
AP | Aug 21, 2021
A generalized forecasting solution to enable future insights of COVID-19 at sub-national level resolutions
Umar Marikkar, Harshana Weligampola, Rumali Perera et al.
COVID-19 continues to cause a significant impact on public health. To minimize this impact, policy makers undertake containment measures. However, when these measures are disproportionate to the actual threat as a result of erroneous threat assessment, they cause undesirable long-term socio-economic complications. In addition, macro-level or national-level decision making fails to consider localized sensitivities in small regions. Hence, the need arises for region-wise threat assessments that provide insights into the behaviour of COVID-19 through time, enabled by accurate forecasts. In this study, a forecasting solution is proposed to predict daily new cases of COVID-19 in regions small enough for containment measures to be implemented locally. It targets three main shortcomings in the literature: the unreliability of existing data caused by inconsistent testing patterns in smaller regions, the weak deployability of forecasting models for predicting cases in previously unseen regions, and model training biases caused by the imbalanced nature of data in COVID-19 epi-curves. Accordingly, the contributions of this study are three-fold: an optimized smoothing technique that smooths less deterministic epi-curves based on the epidemiological dynamics of the region, a Long Short-Term Memory (LSTM) based forecasting model trained on data from select regions to create a representative and diverse training set that maximizes deployability in regions lacking historical data, and an adaptive loss function used during training to mitigate the data imbalances seen in epi-curves. The proposed smoothing technique, the generalized training strategy, and the adaptive loss function substantially increased the overall accuracy of the forecast, which enables efficient containment measures at a more localized micro-level.
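Two of the ingredients named above, smoothing a noisy epi-curve and weighting the training loss, can be illustrated with generic stand-ins. The sketch below is not the optimized smoothing technique or the adaptive loss of the study. It uses a centred moving average and a weighted mean squared error, and the function names, window size, and weights are assumptions.

```python
def moving_average(series, window):
    """Centred moving average; the window shrinks at the series boundaries."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def weighted_mse(predictions, targets, weights):
    """Mean squared error with per-sample weights.

    Assumption: larger weights would be given to under-represented parts of
    the epi-curve (for example peaks), so that they are not ignored.
    """
    total = sum(w * (p - t) ** 2
                for p, t, w in zip(predictions, targets, weights))
    return total / sum(weights)


# Toy daily counts with alternating reporting gaps.
daily_cases = [0, 9, 0, 9, 0]
smooth_cases = moving_average(daily_cases, window=3)
loss = weighted_mse([1.0, 2.0], [0.0, 0.0], weights=[1.0, 3.0])
print(smooth_cases, loss)
```

The moving average removes the day-to-day reporting oscillation, and the weighted loss shows how errors on selected samples can be made to count more during training.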
CV | May 21, 2021
An Optical physics inspired CNN approach for intrinsic image decomposition
Harshana Weligampola, Gihan Jayatilaka, Suren Sritharan et al.
Intrinsic Image Decomposition (IID) is the open problem of recovering the constituents of an image. Generating reflectance and shading from a single image is a challenging task, especially when there is no ground truth, and there is a lack of unsupervised learning approaches for this single-image decomposition. We propose a neural network architecture capable of this decomposition using physics-based parameters derived from the image. Through experimental results, we show that (a) the proposed methodology outperforms existing deep learning-based IID techniques and (b) the derived parameters improve the efficacy significantly. We conclude with a closer analysis of the results (numerical and example images) showing several avenues for improvement.
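The decomposition is commonly modelled as a per-pixel product of reflectance and shading. The following sketch assumes that standard multiplicative model with single-channel toy images and shows why the problem is ill-posed without additional constraints. The function names are hypothetical and the code is unrelated to the proposed network.

```python
def compose(reflectance, shading):
    """Per-pixel product: image = reflectance * shading."""
    return [[r * s for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(reflectance, shading)]


def reconstruction_error(image, reflectance, shading):
    """Mean squared error between an image and its recomposition."""
    recon = compose(reflectance, shading)
    diffs = [(i - c) ** 2
             for i_row, c_row in zip(image, recon)
             for i, c in zip(i_row, c_row)]
    return sum(diffs) / len(diffs)


# Toy 2x2 grayscale example.
reflectance = [[0.5, 1.0], [0.25, 0.5]]
shading = [[0.5, 0.5], [1.0, 1.0]]
image = compose(reflectance, shading)

# A different pair (reflectance doubled, shading halved) explains the same
# image equally well, which is why extra priors are needed.
alt_reflectance = [[2 * r for r in row] for row in reflectance]
alt_shading = [[0.5 * s for s in row] for row in shading]
print(reconstruction_error(image, reflectance, shading),
      reconstruction_error(image, alt_reflectance, alt_shading))
```

Both pairs reconstruct the image with zero error, so a reconstruction loss alone cannot select the physically meaningful decomposition.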
CV | Nov 18, 2020
Convolutional Autoencoder for Blind Hyperspectral Image Unmixing
Yasiru Ranasinghe, Sanjaya Herath, Kavinga Weerasooriya et al.
In the remote sensing context, spectral unmixing is a technique that decomposes a mixed pixel into two fundamental representatives: endmembers and abundances. In this paper, a novel architecture is proposed to perform blind unmixing on hyperspectral images. The proposed architecture consists of convolutional layers followed by an autoencoder. The encoder transforms the feature space produced by the convolutional layers into a latent space representation. From these latent characteristics, the decoder then reconstructs the roll-out image of the monochrome image at the input of the architecture, with each single-band image fed sequentially. Experimental results on real hyperspectral data show that the proposed algorithm outperforms existing unmixing methods at abundance estimation and generates competitive results for endmember extraction, with RMSE and SAD as the respective metrics.
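The two metrics named above have standard definitions: root mean squared error (RMSE) for abundances and spectral angle distance (SAD) for endmember spectra. A minimal sketch follows. The function names and toy vectors are assumptions, and the averaging conventions of the paper (per pixel, per endmember) may differ.

```python
import math


def rmse(estimate, reference):
    """Root mean squared error between two equal-length vectors."""
    squared = [(e - r) ** 2 for e, r in zip(estimate, reference)]
    return math.sqrt(sum(squared) / len(squared))


def sad(estimate, reference):
    """Spectral angle distance in radians between two spectra."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_e * norm_r)))
    return math.acos(cosine)


abundance_error = rmse([0.2, 0.8], [0.0, 1.0])
angle_scaled = sad([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # same shape, new scale
angle_orthogonal = sad([1.0, 0.0], [0.0, 1.0])
print(abundance_error, angle_scaled, angle_orthogonal)
```

SAD depends only on the direction of a spectrum, so a rescaled copy of an endmember has an angle of approximately zero, whereas RMSE is sensitive to magnitude.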