IV Jun 1
LALE: Lightweight-Transformer Architecture for Land-Cover Estimation
Ümit Mert Çağlar, Alptekin Temizel
Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones that add computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, and 17x fewer GMACs, and delivering 1.8x higher throughput.
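To illustrate why confining self-attention to downsampled feature maps matters, the sketch below counts the pairwise token interactions that full self-attention would compute at two feature-map strides. The 256x256 input size and the strides 4 and 16 are hypothetical values chosen for the arithmetic; they are not taken from the LALE paper.

```python
def attention_pairs(height: int, width: int, stride: int) -> int:
    """Number of token-to-token pairs in full self-attention.

    Each spatial position of the downsampled feature map is one token,
    so the pair count grows with the square of the token count.
    """
    tokens = (height // stride) * (width // stride)
    return tokens * tokens


# Hypothetical input size and strides, for illustration only.
shallow = attention_pairs(256, 256, stride=4)   # 4096 tokens
deep = attention_pairs(256, 256, stride=16)     # 256 tokens

print(shallow)          # 16777216
print(deep)             # 65536
print(shallow // deep)  # 256
```

Under these assumed numbers, moving attention from the stride-4 map to the stride-16 map cuts the number of attention pairs by a factor of 256, which is the kind of saving the resolution split is meant to exploit.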
CV Mar 10
Grounding Synthetic Data Generation With Vision and Language Models
Ümit Mert Çağlar, Alptekin Temizel
Deep learning models benefit from increasing data diversity and volume, motivating synthetic data augmentation to improve existing datasets. However, existing evaluation metrics for synthetic data typically calculate latent feature similarity, which is difficult to interpret and does not always correlate with the contribution to downstream tasks. We propose a vision-language grounded framework for interpretable synthetic data augmentation and evaluation in remote sensing. Our approach combines generative models, semantic segmentation and image captioning with vision and language models. Based on this framework, we introduce ARAS400k: A large-scale Remote sensing dataset Augmented with Synthetic data for segmentation and captioning, containing 100k real images and 300k synthetic images, each paired with segmentation maps and descriptions. ARAS400k enables the automated evaluation of synthetic data by analyzing semantic composition, minimizing caption redundancy, and verifying cross-modal consistency between visual structures and language descriptions. Experimental results indicate that while models trained exclusively on synthetic data reach competitive performance levels, those trained with augmented data (a combination of real and synthetic images) consistently outperform real-data baselines. Consequently, this work establishes a scalable benchmark for remote sensing tasks, specifically in semantic segmentation and image captioning. The dataset is available at zenodo.org/records/18890661 and the code base at github.com/caglarmert/ARAS400k.
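One possible reading of "analyzing semantic composition" is to measure what fraction of each segmentation map is covered by each class. The sketch below does that for a small mask. The function name `class_fractions` and the toy mask are hypothetical, and this is not the ARAS400k code base.

```python
from collections import Counter


def class_fractions(mask: list[list[int]]) -> dict[int, float]:
    """Fraction of pixels assigned to each class label in a 2D mask."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Toy 2x2 segmentation map with three hypothetical class ids.
toy_mask = [
    [0, 0],
    [1, 2],
]
print(class_fractions(toy_mask))  # {0: 0.5, 1: 0.25, 2: 0.25}
```

Comparing such per-class fractions between real and synthetic subsets would be one interpretable way to check whether the synthetic images shift the class balance.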
IV Nov 10, 2023
Ulcerative Colitis Mayo Endoscopic Scoring Classification with Active Learning and Generative Data Augmentation
Ümit Mert Çağlar, Alperen İnci, Oğuz Hanoğlu et al.
Endoscopic imaging is commonly used to diagnose Ulcerative Colitis (UC) and classify its severity. Deep learning-based methods have been shown to be effective in the automated analysis of these images and can potentially be used to aid medical doctors. Unleashing the full potential of these methods depends on the availability of a large number of labeled images; however, obtaining and labeling these images is quite challenging. In this paper, we propose an active learning-based generative augmentation method. The method generates a large number of synthetic samples using a generative model trained on a small dataset of real endoscopic images. The resulting data pool is narrowed down by using active learning methods to select the most informative samples, which are then used to train a classifier. We demonstrate the effectiveness of our method through experiments on a publicly available endoscopic image dataset. The results show that using synthesized samples in conjunction with active learning improves classification performance compared to using only the original labeled examples: the baseline Quadratic Weighted Kappa (QWK) score of 68.1% increases to 74.5%. We also observe that attaining equivalent performance using only real data required three times as many images.
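For readers unfamiliar with the metric, the following is a minimal sketch of the standard Quadratic Weighted Kappa computation for ordinal labels. It is not the authors' evaluation code, and the four-class example labels are made up for illustration.

```python
def quadratic_weighted_kappa(y_true: list[int], y_pred: list[int], num_classes: int) -> float:
    """Standard QWK: 1 - (weighted observed disagreement / weighted expected disagreement)."""
    n = len(y_true)
    # Confusion matrix: rows are true classes, columns are predicted classes.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    numerator = 0.0
    denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: grows with the squared class distance.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0.0:
        raise ValueError("QWK is undefined when all labels fall in a single class")
    return 1.0 - numerator / denominator


# Made-up labels on a four-level ordinal scale.
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4))  # 1.0
```

Because the weights are quadratic in the class distance, confusing the two extreme classes costs far more than confusing neighbouring ones, which suits ordinal severity scores.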
CV Dec 2, 2024
Class Distance Weighted Cross Entropy Loss for Classification of Disease Severity
Gorkem Polat, Ümit Mert Çağlar, Alptekin Temizel
Assessing disease severity with ordinal classes, where each class reflects increasing severity levels, benefits from loss functions designed for this ordinal structure. Traditional categorical loss functions, like Cross-Entropy (CE), often perform suboptimally in these scenarios. To address this, we propose a novel loss function, Class Distance Weighted Cross-Entropy (CDW-CE), which penalizes misclassifications more severely when the predicted and actual classes are farther apart. We evaluated CDW-CE using various deep architectures, comparing its performance against several categorical and ordinal loss functions. To assess the quality of latent representations, we used t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) visualizations, quantified the clustering quality using the Silhouette Score, and compared Class Activation Maps (CAM) generated by models trained with CDW-CE and CE loss. Feedback from domain experts was incorporated to evaluate how well model attention aligns with expert opinion. Our results show that CDW-CE consistently improves performance in ordinal image classification tasks. It achieves higher Silhouette Scores, indicating better class discrimination capability, and its CAM visualizations show a stronger focus on clinically significant regions, as validated by domain experts. Receiver operating characteristic (ROC) curves and area under the curve (AUC) scores show that CDW-CE outperforms other loss functions, including prominent ordinal loss functions from the literature.
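The idea of penalizing distant misclassifications more heavily can be sketched as follows. The abstract does not state the formula, so the specific form used here (a log penalty on the probability given to each wrong class, weighted by class distance raised to a power `alpha`) and the default `alpha=2.0` are assumptions for illustration.

```python
import math


def cdw_ce(probs: list[float], true_class: int, alpha: float = 2.0, eps: float = 1e-12) -> float:
    """Distance-weighted penalty on probability mass placed on wrong classes.

    Assumed form: -sum_i log(1 - p_i) * |i - true_class| ** alpha.
    The true class has distance 0, so it contributes nothing directly.
    """
    loss = 0.0
    for i, p in enumerate(probs):
        distance_weight = abs(i - true_class) ** alpha
        # Clamp to eps so that p_i == 1 does not produce log(0).
        loss -= math.log(max(1.0 - p, eps)) * distance_weight
    return loss


# Same confidence in a wrong class, at distance 1 versus distance 3.
near_miss = cdw_ce([0.1, 0.7, 0.1, 0.1], true_class=0)
far_miss = cdw_ce([0.1, 0.1, 0.1, 0.7], true_class=0)
print(near_miss < far_miss)  # True
```

In this sketch, a larger `alpha` widens the gap between near and far mistakes, so it acts as a knob for how strongly the ordinal structure is enforced.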
SP Mar 12, 2024
Exploring Challenges in Deep Learning of Single-Station Ground Motion Records
Ümit Mert Çağlar, Baris Yilmaz, Melek Türkmen et al.
Contemporary deep learning models have demonstrated promising results across various applications within seismology and earthquake engineering. These models rely primarily on ground motion records for tasks such as earthquake event classification, localization, earthquake early warning systems, and structural health monitoring. However, the extent to which these models truly extract meaningful patterns from these complex time-series signals remains underexplored. In this study, we evaluate the degree to which auxiliary information, such as seismic phase arrival times or seismic station distribution within a network, dominates the process of deep learning from ground motion records, potentially hindering its effectiveness. Our experimental results reveal a strong dependence on the highly correlated Primary (P) and Secondary (S) phase arrival times. These findings expose a critical gap in the current research landscape, highlighting the lack of robust methodologies for deep learning from single-station ground motion recordings that do not rely on auxiliary inputs.