Muhammad Shahzad

h-index48

20papers

2,048citations

Novelty37%

AI Score48

Ranked #54,554 of 201,326 authors (top 27%)#20,201 in CV (top 34%)

20 Papers

IRMay 9

Global-local Spatial-temporal Aware Graph Attention Network for Network Traffic Forecasting

Jinming Xing, Guoheng Sun, Hui Sun et al.

Spatial-temporal network traffic forecasting is a challenging task due to the complex spatial relationships and dynamic temporal patterns present in each node. Traditional regression methods are not directly applicable to such graph data. Recently, Graph Neural Networks (GNNs) have been widely used to model spatial-temporal dependencies. However, existing methods face several limitations: (1) They rely solely on a predefined spatial adjacency matrix, overlooking hidden low-level temporal information. (2) They model spatial and temporal information separately, which inevitably leads to a loss of joint dependencies, or they capture only global or local dependencies. To address these issues, we propose the \textbf{G}lobal-\textbf{L}ocal \textbf{S}patial-\textbf{T}emporal \textbf{a}ware \textbf{G}raph \textbf{AT}tention Network (GLSTaGAT). Specifically, we adopt a data-driven spatial-temporal fusion graph that incorporates low-level spatial and temporal information, serving as the foundation for further graph convolutions. The GLSTaGAT block and its pooling variant are proposed to simultaneously capture local and global spatial-temporal dependencies. Additionally, we introduce node normalization to mitigate covariance shifts, enabling a smoother training process. An encoder-only transformer is utilized to model high-level joint dependencies, and a multi-head attention prediction layer is designed for final information aggregation and prediction. Experimental results on real-world datasets demonstrate that GLSTaGAT outperforms the baselines by 32.14\% (MAE), 28.30\% (RMSE), and 20.47\% (SMAPE) on average.

CVAug 26, 2024Code

NimbleD: Enhancing Self-supervised Monocular Depth Estimation with Pseudo-labels and Large-scale Video Pre-training

Albert Luginov, Muhammad Shahzad

We introduce NimbleD, an efficient self-supervised monocular depth estimation learning framework that incorporates supervision from pseudo-labels generated by a large vision model. This framework does not require camera intrinsics, enabling large-scale pre-training on publicly available videos. Our straightforward yet effective learning strategy significantly enhances the performance of fast and lightweight models without introducing any overhead, allowing them to achieve performance comparable to state-of-the-art self-supervised monocular depth estimation models. This advancement is particularly beneficial for virtual and augmented reality applications requiring low latency inference. The source code, model weights, and acknowledgments are available at https://github.com/xapaxca/nimbled .

CVJul 25, 2022

Deep dual stream residual network with contextual attention for pansharpening of remote sensing images

Syeda Roshana Ali, Anis Ur Rahman, Muhammad Shahzad

Pansharpening enhances spatial details of high spectral resolution multispectral images using features of high spatial resolution panchromatic image. There are a number of traditional pansharpening approaches but producing an image exhibiting high spectral and spatial fidelity is still an open problem. Recently, deep learning has been used to produce promising pansharpened images; however, most of these approaches apply similar treatment to both multispectral and panchromatic images by using the same network for feature extraction. In this work, we present present a novel dual attention-based two-stream network. It starts with feature extraction using two separate networks for both images, an encoder with attention mechanism to recalibrate the extracted features. This is followed by fusion of the features forming a compact representation fed into an image reconstruction network to produce a pansharpened image. The experimental results on the Pléiades dataset using standard quantitative evaluation metrics and visual inspection demonstrates that the proposed approach performs better than other approaches in terms of pansharpened image quality.

IRMar 20

CO-EVOLVE: Bidirectional Co-Evolution of Graph Structure and Semantics for Heterophilous Learning

Jinming Xing, Muhammad Shahzad

The integration of Large Language Models (LLMs) and Graph Neural Networks (GNNs) promises to unify semantic understanding with structural reasoning, yet existing methods typically rely on static, unidirectional pipelines. These approaches suffer from fundamental limitations: (1) Bidirectional Error Propagation, where semantic hallucinations in LLMs or structural noise in GNNs permanently poison the downstream modality without opportunity for recourse; (2) Semantic-Structural Dissonance, particularly in heterophilous settings where textual similarity contradicts topological reality; (3) a Blind Leading the Blind phenomenon, where indiscriminate alignment forces models to mirror each other's mistakes regardless of uncertainty. To address these challenges, we propose CO-EVOLVE, a dual-view co-evolution framework that treats graph topology and semantic embeddings as dynamic, mutually reinforcing latent variables. By employing a Gauss-Seidel alternating optimization strategy, our framework establishes a cyclic feedback loop: the GNN injects structural context as Soft Prompts to guide the LLM, while the LLM constructs favorable Dynamic Semantic Graphs to rewire the GNN. We introduce three key innovations to stabilize this evolution: (1) a Hard-Structure Conflict-Aware Contrastive Loss that warps the semantic manifold to respect high-order topological boundaries; (2) an Adaptive Node Gating Mechanism that dynamically fuses static and learnable structures to recover missing links; (3) an Uncertainty-Gated Consistency strategy that enables meta-cognitive alignment, ensuring models only learn from the confident view. Finally, an Entropy-Aware Adaptive Fusion integrates predictions during inference. Extensive experiments on public benchmarks demonstrate that CO-EVOLVE significantly outperforms state-of-the-art baselines, achieving average improvements of 9.07% in Accuracy and 7.19% in F1-score.

CVApr 25, 2024

The Third Monocular Depth Estimation Challenge

Jaime Spencer, Fabio Tosi, Matteo Poggi et al.

This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 submissions outperforming the baseline on the test set: 10 among them submitted a report describing their approach, highlighting a diffused use of foundational models such as Depth Anything at the core of their method. The challenge winners drastically improved 3D F-Score performance, from 17.51% to 23.72%.

LGMar 7, 2025

MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification

Yang Mu, Muhammad Shahzad, Xiao Xiang Zhu

Multivariate Time Series Classification (MTSC) is crucial in extensive practical applications, such as environmental monitoring, medical EEG analysis, and action recognition. Real-world time series datasets typically exhibit complex dynamics. To capture this complexity, RNN-based, CNN-based, Transformer-based, and hybrid models have been proposed. Unfortunately, current deep learning-based methods often neglect the simultaneous construction of local features and global dependencies at different time scales, lacking sufficient feature extraction capabilities to achieve satisfactory classification accuracy. To address these challenges, we propose a novel Multiscale Periodic Time Series Network (MPTSNet), which integrates multiscale local patterns and global correlations to fully exploit the inherent information in time series. Recognizing the multi-periodicity and complex variable correlations in time series, we use the Fourier transform to extract primary periods, enabling us to decompose data into multiscale periodic segments. Leveraging the inherent strengths of CNN and attention mechanism, we introduce the PeriodicBlock, which adaptively captures local patterns and global dependencies while offering enhanced interpretability through attention integration across different periodic scales. The experiments on UEA benchmark datasets demonstrate that the proposed MPTSNet outperforms 21 existing advanced baselines in the MTSC tasks.

CVDec 24, 2024

Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation

Faraz Waseem, Muhammad Shahzad

An image may convey a thousand words, but a video composed of hundreds or thousands of image frames tells a more intricate story. Despite significant progress in multimodal large language models (MLLMs), generating extended videos remains a formidable challenge. As of this writing, OpenAI's Sora, the current state-of-the-art system, is still limited to producing videos that are up to one minute in length. This limitation stems from the complexity of long video generation, which requires more than generative AI techniques for approximating density functions essential aspects such as planning, story development, and maintaining spatial and temporal consistency present additional hurdles. Integrating generative AI with a divide-and-conquer approach could improve scalability for longer videos while offering greater control. In this survey, we examine the current landscape of long video generation, covering foundational techniques like GANs and diffusion models, video generation strategies, large-scale training datasets, quality metrics for evaluating long videos, and future research areas to address the limitations of the existing video generation capabilities. We believe it would serve as a comprehensive foundation, offering extensive information to guide future advancements and research in the field of long video generation.

CVApr 24, 2025

The Fourth Monocular Depth Estimation Challenge

Anton Obukhov, Matteo Poggi, Fabio Tosi et al.

This paper presents the results of the fourth edition of the Monocular Depth Estimation Challenge (MDEC), which focuses on zero-shot generalization to the SYNS-Patches benchmark, a dataset featuring challenging environments in both natural and indoor settings. In this edition, we revised the evaluation protocol to use least-squares alignment with two degrees of freedom to support disparity and affine-invariant predictions. We also revised the baselines and included popular off-the-shelf methods: Depth Anything v2 and Marigold. The challenge received a total of 24 submissions that outperformed the baselines on the test set; 10 of these included a report describing their approach, with most leading methods relying on affine-invariant predictions. The challenge winners improved the 3D F-Score over the previous edition's best result, raising it from 22.58% to 23.05%.

LGSep 2, 2025

Structured Basis Function Networks: Loss-Centric Multi-Hypothesis Ensembles with Controllable Diversity

Alejandro Rodriguez Dominguez, Muhammad Shahzad, Xia Hong

Existing approaches to predictive uncertainty rely either on multi-hypothesis prediction, which promotes diversity but lacks principled aggregation, or on ensemble learning, which improves accuracy but rarely captures the structured ambiguity. This implicitly means that a unified framework consistent with the loss geometry remains absent. The Structured Basis Function Network addresses this gap by linking multi-hypothesis prediction and ensembling through centroidal aggregation induced by Bregman divergences. The formulation applies across regression and classification by aligning predictions with the geometry of the loss, and supports both a closed-form least-squares estimator and a gradient-based procedure for general objectives. A tunable diversity mechanism provides parametric control of the bias-variance-diversity trade-off, connecting multi-hypothesis generalisation with loss-aware ensemble aggregation. Experiments validate this relation and use the mechanism to study the complexity-capacity-diversity trade-off across datasets of increasing difficulty with deep-learning predictors.

CVMay 18, 2025

GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification

Yang Mu, Zhitong Xiong, Yi Wang et al.

Global tree species mapping using remote sensing data is vital for biodiversity monitoring, forest management, and ecological research. However, progress in this field has been constrained by the scarcity of large-scale, labeled datasets. To address this, we introduce GlobalGeoTree, a comprehensive global dataset for tree species classification. GlobalGeoTree comprises 6.3 million geolocated tree occurrences, spanning 275 families, 2,734 genera, and 21,001 species across the hierarchical taxonomic levels. Each sample is paired with Sentinel-2 image time series and 27 auxiliary environmental variables, encompassing bioclimatic, geographic, and soil data. The dataset is partitioned into GlobalGeoTree-6M for model pretraining and curated evaluation subsets, primarily GlobalGeoTree-10kEval for zero-shot and few-shot benchmarking. To demonstrate the utility of the dataset, we introduce a baseline model, GeoTreeCLIP, which leverages paired remote sensing data and taxonomic text labels within a vision-language framework pretrained on GlobalGeoTree-6M. Experimental results show that GeoTreeCLIP achieves substantial improvements in zero- and few-shot classification on GlobalGeoTree-10kEval over existing advanced models. By making the dataset, models, and code publicly available, we aim to establish a benchmark to advance tree species classification and foster innovation in biodiversity research and ecological applications.

LGDec 9, 2024

How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning

Yuanyuan Wang, Qian Song, Dawood Wasif et al.

Uncertainty quantification (UQ) is essential for assessing the reliability of Earth observation (EO) products. However, the extensive use of machine learning models in EO introduces an additional layer of complexity, as those models themselves are inherently uncertain. While various UQ methods do exist for machine learning models, their performance on EO datasets remains largely unevaluated. A key challenge in the community is the absence of the ground truth for uncertainty, i.e. how certain the uncertainty estimates are, apart from the labels for the image/signal. This article fills this gap by introducing three benchmark datasets specifically designed for UQ in EO machine learning models. These datasets address three common problem types in EO: regression, image segmentation, and scene classification. They enable a transparent comparison of different UQ methods for EO machine learning models. We describe the creation and characteristics of each dataset, including data sources, preprocessing steps, and label generation, with a particular focus on calculating the reference uncertainty. We also showcase baseline performance of several machine learning models on each dataset, highlighting the utility of these benchmarks for model development and comparison. Overall, this article offers a valuable resource for researchers and practitioners working in artificial intelligence for EO, promoting a more accurate and reliable quality measure of the outputs of machine learning models. The dataset and code are accessible via https://gitlab.lrz.de/ai4eo/WG_Uncertainty.

LGSep 2, 2023

Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction

Alejandro Rodriguez Dominguez, Muhammad Shahzad, Xia Hong

Multi-modal problems can be effectively addressed using multiple hypothesis frameworks, but integrating these frameworks into learning models poses significant challenges. This paper introduces a Structured Radial Basis Function Network (s-RBFN) as an ensemble of multiple hypothesis predictors for regression. During the training of the predictors, first the centroidal Voronoi tessellations are formed based on their losses and the true labels, representing geometrically the set of multiple hypotheses. Then, the trained predictors are used to compute a structured dataset with their predictions, including centers and scales for the basis functions. A radial basis function network, with each basis function focused on a particular hypothesis, is subsequently trained using this structured dataset for multiple hypotheses prediction. The s-RBFN is designed to train efficiently while controlling diversity in ensemble learning parametrically. The least-squares approach for training the structured ensemble model provides a closed-form solution for multiple hypotheses and structured predictions. During the formation of the structured dataset, a parameter is employed to avoid mode collapse by controlling tessellation shapes. This parameter provides a mechanism to balance diversity and generalization performance for the s-RBFN. The empirical validation on two multivariate prediction datasets-air quality and energy appliance predictions-demonstrates the superior generalization performance and computational efficiency of the structured ensemble model compared to other models and their single-hypothesis counterparts.

CVFeb 26, 2022

Person Re-identification: A Retrospective on Domain Specific Open Challenges and Future Trends

Asmat Zahra, Nazia Perwaiz, Muhammad Shahzad et al.

Person re-identification (Re-ID) is one of the primary components of an automated visual surveillance system. It aims to automatically identify/search persons in a multi-camera network having non-overlapping field-of-views. Owing to its potential in various applications and research significance, a plethora of deep learning based re-Id approaches have been proposed in the recent years. However, there exist several vision related challenges, e.g., occlusion, pose scale \& viewpoint variance, background clutter, person misalignment and cross-domain generalization across camera modalities, which makes the problem of re-Id still far from being solved. Majority of the proposed approaches directly or indirectly aim to solve one or multiple of these existing challenges. In this context, a comprehensive review of current re-ID approaches in solving theses challenges is needed to analyze and focus on particular aspects for further advancements. At present, such a focused review does not exist and henceforth in this paper, we have presented a systematic challenge-specific literature survey of 230+ papers between the years of 2015-21. For the first time a survey of this type have been presented where the person re-Id approaches are reviewed in such solution-oriented perspective. Moreover, we have presented several diversified prominent developing trends in the respective research domain which will provide a visionary perspective regarding ongoing person re-Id research and eventually help to develop practical real world solutions.

IVFeb 9, 2022

Semantic Segmentation of Anaemic RBCs Using Multilevel Deep Convolutional Encoder-Decoder Network

Muhammad Shahzad, Arif Iqbal Umar, Syed Hamad Shirazi et al.

Pixel-level analysis of blood images plays a pivotal role in diagnosing blood-related diseases, especially Anaemia. These analyses mainly rely on an accurate diagnosis of morphological deformities like shape, size, and precise pixel counting. In traditional segmentation approaches, instance or object-based approaches have been adopted that are not feasible for pixel-level analysis. The convolutional neural network (CNN) model required a large dataset with detailed pixel-level information for the semantic segmentation of red blood cells in the deep learning domain. In current research work, we address these problems by proposing a multi-level deep convolutional encoder-decoder network along with two state-of-the-art healthy and Anaemic-RBC datasets. The proposed multi-level CNN model preserved pixel-level semantic information extracted in one layer and then passed to the next layer to choose relevant features. This phenomenon helps to precise pixel-level counting of healthy and anaemic-RBC elements along with morphological analysis. For experimental purposes, we proposed two state-of-the-art RBC datasets, i.e., Healthy-RBCs and Anaemic-RBCs dataset. Each dataset contains 1000 images, ground truth masks, relevant, complete blood count (CBC), and morphology reports for performance evaluation. The proposed model results were evaluated using crossmatch analysis with ground truth mask by finding IoU, individual training, validation, testing accuracies, and global accuracies using a 05-fold training procedure. This model got training, validation, and testing accuracies as 0.9856, 0.9760, and 0.9720 on the Healthy-RBC dataset and 0.9736, 0.9696, and 0.9591 on an Anaemic-RBC dataset. The IoU and BFScore of the proposed model were 0.9311, 0.9138, and 0.9032, 0.8978 on healthy and anaemic datasets, respectively.

CVDec 4, 2021

Orientation Aware Weapons Detection In Visual Data : A Benchmark Dataset

Nazeef Ul Haq, Muhammad Moazam Fraz, Tufail Sajjad Shah Hashmi et al.

Automatic detection of weapons is significant for improving security and well being of individuals, nonetheless, it is a difficult task due to large variety of size, shape and appearance of weapons. View point variations and occlusion also are reasons which makes this task more difficult. Further, the current object detection algorithms process rectangular areas, however a slender and long rifle may really cover just a little portion of area and the rest may contain unessential details. To overcome these problem, we propose a CNN architecture for Orientation Aware Weapons Detection, which provides oriented bounding box with improved weapons detection performance. The proposed model provides orientation not only using angle as classification problem by dividing angle into eight classes but also angle as regression problem. For training our model for weapon detection a new dataset comprising of total 6400 weapons images is gathered from the web and then manually annotated with position oriented bounding boxes. Our dataset provides not only oriented bounding box as ground truth but also horizontal bounding box. We also provide our dataset in multiple formats of modern object detectors for further research in this area. The proposed model is evaluated on this dataset, and the comparative analysis with off-the shelf object detectors yields superior performance of proposed model, measured with standard evaluation strategies. The dataset and the model implementation are made publicly available at this link: https://bit.ly/2TyZICF.

CVJul 9, 2021

Segmentation of VHR EO Images using Unsupervised Learning

Sudipan Saha, Lichao Mou, Muhammad Shahzad et al.

Semantic segmentation is a crucial step in many Earth observation tasks. Large quantity of pixel-level annotation is required to train deep networks for semantic segmentation. Earth observation techniques are applied to varieties of applications and since classes vary widely depending on the applications, therefore, domain knowledge is often required to label Earth observation images, impeding availability of labeled training data in many Earth observation applications. To tackle these challenges, in this paper we propose an unsupervised semantic segmentation method that can be trained using just a single unlabeled scene. Remote sensing scenes are generally large. The proposed method exploits this property to sample smaller patches from the larger scene and uses deep clustering and contrastive learning to refine the weights of a lightweight deep model composed of a series of the convolution layers along with an embedded channel attention. After unsupervised training on the target image/scene, the model automatically segregates the major classes present in the scene and produces the segmentation map. Experimental results on the Vaihingen dataset demonstrate the efficacy of the proposed method.

LGJul 7, 2021

A Survey of Uncertainty in Deep Neural Networks

Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali et al.

Due to their increasing spread, confidence in neural network predictions became more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over or under confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and not reducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensemble of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty, approaches for the calibration of neural networks and give an overview of existing baselines and implementations. Different examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainties in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.

CVNov 29, 2020

Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene Understanding

Saqib Ali Khan, Yilei Shi, Muhammad Shahzad et al.

Semantic segmentation of raw 3D point clouds is an essential component in 3D scene analysis, but it poses several challenges, primarily due to the non-Euclidean nature of 3D point clouds. Although, several deep learning based approaches have been proposed to address this task, but almost all of them emphasized on using the latent (global) feature representations from traditional convolutional neural networks (CNN), resulting in severe loss of spatial information, thus failing to model the geometry of the underlying 3D objects, that plays an important role in remote sensing 3D scenes. In this letter, we have proposed an alternative approach to overcome the limitations of CNN based approaches by encoding the spatial features of raw 3D point clouds into undirected symmetrical graph models. These encodings are then combined with a high-dimensional feature vector extracted from a traditional CNN into a localized graph convolution operator that outputs the required 3D segmentation map. We have performed experiments on two standard benchmark datasets (including an outdoor aerial remote sensing dataset and an indoor synthetic dataset). The proposed method achieves on par state-of-the-art accuracy with improved training time and model stability thus indicating strong potential for further research towards a generalized state-of-the-art method for 3D scene understanding.

IVJan 28, 2020

Robust Method for Semantic Segmentation of Whole-Slide Blood Cell Microscopic Image

Muhammad Shahzad, Arif Iqbal Umar, Muazzam A. Khan et al.

Previous works on segmentation of SEM (scanning electron microscope) blood cell image ignore the semantic segmentation approach of whole-slide blood cell segmentation. In the proposed work, we address the problem of whole-slide blood cell segmentation using the semantic segmentation approach. We design a novel convolutional encoder-decoder framework along with VGG-16 as the pixel-level feature extraction model. -e proposed framework comprises 3 main steps: First, all the original images along with manually generated ground truth masks of each blood cell type are passed through the preprocessing stage. In the preprocessing stage, pixel-level labeling, RGB to grayscale conversion of masked image and pixel fusing, and unity mask generation are performed. After that, VGG16 is loaded into the system, which acts as a pretrained pixel-level feature extraction model. In the third step, the training process is initiated on the proposed model. We have evaluated our network performance on three evaluation metrics. We obtained outstanding results with respect to classwise, as well as global and mean accuracies. Our system achieved classwise accuracies of 97.45%, 93.34%, and 85.11% for RBCs, WBCs, and platelets, respectively, while global and mean accuracies remain 97.18% and 91.96%, respectively.

IVAug 14, 2018

Buildings Detection in VHR SAR Images Using Fully Convolution Neural Networks

Muhammad Shahzad, Michael Maurer, Friedrich Fraundorfer et al.

This paper addresses the highly challenging problem of automatically detecting man-made structures especially buildings in very high resolution (VHR) synthetic aperture radar (SAR) images. In this context, the paper has two major contributions: Firstly, it presents a novel and generic workflow that initially classifies the spaceborne TomoSAR point clouds $ - $ generated by processing VHR SAR image stacks using advanced interferometric techniques known as SAR tomography (TomoSAR) $ - $ into buildings and non-buildings with the aid of auxiliary information (i.e., either using openly available 2-D building footprints or adopting an optical image classification scheme) and later back project the extracted building points onto the SAR imaging coordinates to produce automatic large-scale benchmark labelled (buildings/non-buildings) SAR datasets. Secondly, these labelled datasets (i.e., building masks) have been utilized to construct and train the state-of-the-art deep Fully Convolution Neural Networks with an additional Conditional Random Field represented as a Recurrent Neural Network to detect building regions in a single VHR SAR image. Such a cascaded formation has been successfully employed in computer vision and remote sensing fields for optical image classification but, to our knowledge, has not been applied to SAR images. The results of the building detection are illustrated and validated over a TerraSAR-X VHR spotlight SAR image covering approximately 39 km$ ^2 $ $ - $ almost the whole city of Berlin $ - $ with mean pixel accuracies of around 93.84%