Murtaza Taj

CV
h-index5
22papers
88citations
Novelty48%
AI Score44

22 Papers

CVNov 22, 2022Code
Multi-task Learning for Camera Calibration

Talha Hanif Butt, Murtaza Taj

For a number of tasks, such as 3D reconstruction, robotic interface, autonomous driving, etc., camera calibration is essential. In this study, we present a unique method for predicting intrinsic (principal point offset and focal length) and extrinsic (baseline, pitch, and translation) properties from a pair of images. We suggested a novel method where camera model equations are represented as a neural network in a multi-task learning framework, in contrast to existing methods, which build a comprehensive solution. By reconstructing the 3D points using a camera model neural network and then using the loss in reconstruction to obtain the camera specifications, this innovative camera projection loss (CPL) method allows us that the desired parameters should be estimated. As far as we are aware, our approach is the first one that uses an approach to multi-task learning that includes mathematical formulas in a framework for learning to estimate camera parameters to predict both the extrinsic and intrinsic parameters jointly. Additionally, we provided a new dataset named as CVGL Camera Calibration Dataset [1] which has been collected using the CARLA Simulator [2]. Actually, we show that our suggested strategy out performs both conventional methods and methods based on deep learning on 6 out of 10 parameters that were assessed using both real and synthetic data. Our code and generated dataset are available at https://github.com/thanif/Camera-Calibration-through-Camera-Projection-Loss.

CVMar 25, 2023
Spatio-Temporal driven Attention Graph Neural Network with Block Adjacency matrix (STAG-NN-BA) for Remote Land-use Change Detection

Usman Nazir, Wadood Islam, Sara Khalid et al.

Land-use monitoring is fundamental for spatial planning, particularly in view of compound impacts of growing global populations and climate change. Despite existing applications of deep learning in land use monitoring, standard convolutional kernels in deep neural networks limit the applications of these networks to the Euclidean domain only. Considering the geodesic nature of the measurement of the earth's surface, remote sensing is one such area that can benefit from non-Euclidean and spherical domains. For this purpose, we designed a novel Graph Neural Network architecture for spatial and spatio-temporal classification using satellite imagery to acquire insights into socio-economic indicators. We propose a hybrid attention method to learn the relative importance of irregular neighbors in remote sensing data. Instead of classifying each pixel, we propose a method based on Simple Linear Iterative Clustering (SLIC) image segmentation and Graph Attention Network. The superpixels obtained from SLIC become the nodes of our Graph Convolution Network (GCN). A region adjacency graph (RAG) is then constructed where each superpixel is connected to every other adjacent superpixel in the image, enabling information to propagate globally. Finally, we propose a Spatially driven Attention Graph Neural Network (SAG-NN) to classify each RAG. We also propose an extension to our SAG-NN for spatio-temporal data. Unlike regular grids of pixels in images, superpixels are irregular in nature and cannot be used to create spatio-temporal graphs. We introduce temporal bias by combining unconnected RAGs from each image into one supergraph. This is achieved by introducing block adjacency matrices resulting in novel Spatio-Temporal driven Attention Graph Neural Network with Block Adjacency matrix (STAG-NN-BA). SAG-NN and STAG-NN-BA outperform graph and non-graph baselines on Asia14 and C2D2 datasets efficiently.

LGOct 4, 2023
Mending of Spatio-Temporal Dependencies in Block Adjacency Matrix

Osama Ahmad, Omer Abdul Jalil, Usman Nazir et al.

In the realm of applications where data dynamically evolves across spatial and temporal dimensions, Graph Neural Networks (GNNs) are often complemented by sequence modeling architectures, such as RNNs and transformers, to effectively model temporal changes. These hybrid models typically arrange the spatial and temporal learning components in series. A pioneering effort to jointly model the spatio-temporal dependencies using only GNNs was the introduction of the Block Adjacency Matrix \(\mathbf{A_B}\) \cite{1}, which was constructed by diagonally concatenating adjacency matrices from graphs at different time steps. This approach resulted in a single graph encompassing complete spatio-temporal data; however, the graphs from different time steps remained disconnected, limiting GNN message-passing to spatially connected nodes only. Addressing this critical challenge, we propose a novel end-to-end learning architecture specifically designed to mend the temporal dependencies, resulting in a well-connected graph. Thus, we provide a framework for the learnable representation of spatio-temporal data as graphs. Our methodology demonstrates superior performance on benchmark datasets, such as SurgVisDom and C2D2, surpassing existing state-of-the-art graph models in terms of accuracy. Our model also achieves significantly lower computational complexity, having far fewer parameters than methods reliant on CLIP and 3D CNN architectures.

CVMar 21, 2023
Mitigating climate and health impact of small-scale kiln industry using multi-spectral classifier and deep learning

Usman Nazir, Murtaza Taj, Momin Uppal et al.

Industrial air pollution has a direct health impact and is a major contributor to climate change. Small scale industries particularly bull-trench brick kilns are one of the key sources of air pollution in South Asia often creating hazardous levels of smog that is injurious to human health. To mitigate the climate and health impact of the kiln industry, fine-grained kiln localization at different geographic locations is needed. Kiln localization using multi-spectral remote sensing data such as vegetation indices can result in a noisy estimates whereas relying solely on high-resolution imagery is infeasible due to cost and compute complexities. This paper proposes a fusion of spatio-temporal multi-spectral data with high-resolution imagery for detection of brick kilns within the "Brick-Kiln-Belt" of South Asia. We first perform classification using low-resolution spatio-temporal multi-spectral data from Sentinel-2 imagery by combining vegetation, burn, build up and moisture indices. Next, orientation aware object detector YOLOv3 (with theta value) is implemented for removal of false detections and fine-grained localization. Our proposed technique, when compared with other benchmarks, results in a 21 times improvement in speed with comparable or higher accuracy when tested over multiple countries.

CVFeb 13, 2024Code
Camera Calibration through Geometric Constraints from Rotation and Projection Matrices

Muhammad Waleed, Abdul Rauf, Murtaza Taj

The process of camera calibration involves estimating the intrinsic and extrinsic parameters, which are essential for accurately performing tasks such as 3D reconstruction, object tracking and augmented reality. In this work, we propose a novel constraints-based loss for measuring the intrinsic (focal length: $(f_x, f_y)$ and principal point: $(p_x, p_y)$) and extrinsic (baseline: ($b$), disparity: ($d$), translation: $(t_x, t_y, t_z)$, and rotation specifically pitch: $(θ_p)$) camera parameters. Our novel constraints are based on geometric properties inherent in the camera model, including the anatomy of the projection matrix (vanishing points, image of world origin, axis planes) and the orthonormality of the rotation matrix. Thus we proposed a novel Unsupervised Geometric Constraint Loss (UGCL) via a multitask learning framework. Our methodology is a hybrid approach that employs the learning power of a neural network to estimate the desired parameters along with the underlying mathematical properties inherent in the camera projection matrix. This distinctive approach not only enhances the interpretability of the model but also facilitates a more informed learning process. Additionally, we introduce a new CVGL Camera Calibration dataset, featuring over 900 configurations of camera parameters, incorporating 63,600 image pairs that closely mirror real-world conditions. By training and testing on both synthetic and real-world datasets, our proposed approach demonstrates improvements across all parameters when compared to the state-of-the-art (SOTA) benchmarks. The code and the updated dataset can be found here: https://github.com/CVLABLUMS/CVGL-Camera-Calibration

CVOct 7, 2021Code
Camera Calibration through Camera Projection Loss

Talha Hanif Butt, Murtaza Taj

Camera calibration is a necessity in various tasks including 3D reconstruction, hand-eye coordination for a robotic interaction, autonomous driving, etc. In this work we propose a novel method to predict extrinsic (baseline, pitch, and translation), intrinsic (focal length and principal point offset) parameters using an image pair. Unlike existing methods, instead of designing an end-to-end solution, we proposed a new representation that incorporates camera model equations as a neural network in multi-task learning framework. We estimate the desired parameters via novel camera projection loss (CPL) that uses the camera model neural network to reconstruct the 3D points and uses the reconstruction loss to estimate the camera parameters. To the best of our knowledge, ours is the first method to jointly estimate both the intrinsic and extrinsic parameters via a multi-task learning methodology that combines analytical equations in learning framework for the estimation of camera parameters. We also proposed a novel dataset using CARLA Simulator. Empirically, we demonstrate that our proposed approach achieves better performance with respect to both deep learning-based and traditional methods on 8 out of 10 parameters evaluated using both synthetic and real data. Our code and generated dataset are available at https://github.com/thanif/Camera-Calibration-through-Camera-Projection-Loss.

CVDec 8, 2025
Liver Fibrosis Quantification and Analysis: The LiQA Dataset and Baseline Method

Yuanye Liu, Hanxiao Zhang, Nannan Shi et al.

Liver fibrosis represents a significant global health burden, necessitating accurate staging for effective clinical management. This report introduces the LiQA (Liver Fibrosis Quantification and Analysis) dataset, established as part of the CARE 2024 challenge. Comprising $440$ patients with multi-phase, multi-center MRI scans, the dataset is curated to benchmark algorithms for Liver Segmentation (LiSeg) and Liver Fibrosis Staging (LiFS) under complex real-world conditions, including domain shifts, missing modalities, and spatial misalignment. We further describe the challenge's top-performing methodology, which integrates a semi-supervised learning framework with external data for robust segmentation, and utilizes a multi-view consensus approach with Class Activation Map (CAM)-based regularization for staging. Evaluation of this baseline demonstrates that leveraging multi-source data and anatomical constraints significantly enhances model robustness in clinical settings.

CVNov 25, 2025
Exploring State-of-the-art models for Early Detection of Forest Fires

Sharjeel Ahmed, Daim Armaghan, Fatima Naweed et al.

There have been many recent developments in the use of Deep Learning Neural Networks for fire detection. In this paper, we explore an early warning system for detection of forest fires. Due to the lack of sizeable datasets and models tuned for this task, existing methods suffer from missed detection. In this work, we first propose a dataset for early identification of forest fires through visual analysis. Unlike existing image corpuses that contain images of wide-spread fire, our dataset consists of multiple instances of smoke plumes and fire that indicates the initiation of fire. We obtained this dataset synthetically by utilising game simulators such as Red Dead Redemption 2. We also combined our dataset with already published images to obtain a more comprehensive set. Finally, we compared image classification and localisation methods on the proposed dataset. More specifically we used YOLOv7 (You Only Look Once) and different models of detection transformer.

CVJul 15, 2025
CATVis: Context-Aware Thought Visualization

Tariq Mehmood, Hamza Ahmad, Muhammad Haroon Shakeel et al.

EEG-based brain-computer interfaces (BCIs) have shown promise in various applications, such as motor imagery and cognitive state monitoring. However, decoding visual representations from EEG signals remains a significant challenge due to their complex and noisy nature. We thus propose a novel 5-stage framework for decoding visual representations from EEG signals: (1) an EEG encoder for concept classification, (2) cross-modal alignment of EEG and text embeddings in CLIP feature space, (3) caption refinement via re-ranking, (4) weighted interpolation of concept and caption embeddings for richer semantics, and (5) image generation using a pre-trained Stable Diffusion model. We enable context-aware EEG-to-image generation through cross-modal alignment and re-ranking. Experimental results demonstrate that our method generates high-quality images aligned with visual stimuli, outperforming SOTA approaches by 13.43% in Classification Accuracy, 15.21% in Generation Accuracy and reducing Fréchet Inception Distance by 36.61%, indicating superior semantic alignment and image quality.

CVNov 14, 2024
Building Height Estimation Using Shadow Length in Satellite Imagery

Mahd Qureshi, Shayaan Chaudhry, Sana Jabba et al.

Estimating building height from satellite imagery poses significant challenges, especially when monocular images are employed, resulting in a loss of essential 3D information during imaging. This loss of spatial depth further complicates the height estimation process. We addressed this issue by using shadow length as an additional cue to compensate for the loss of building height estimation using single-view imagery. We proposed a novel method that first localized a building and its shadow in the given satellite image. After localization, the shadow length is estimated using a regression model. To estimate the final height of each building, we utilize the principles of photogrammetry, specifically considering the relationship between the solar elevation angle, the vertical edge length of the building, and the length of the building's shadow. For the localization of buildings in our model, we utilized a modified YOLOv7 detector, and to regress the shadow length for each building we utilized the ResNet18 as backbone architecture. Finally, we estimated the associated building height using solar elevation with shadow length through analytical formulation. We evaluated our method on 42 different cities and the results showed that the proposed framework surpasses the state-of-the-art methods with a suitable margin.

CVOct 16, 2021
Neural Network Pruning Through Constrained Reinforcement Learning

Shehryar Malik, Muhammad Umair Haider, Omer Iqbal et al.

Network pruning reduces the size of neural networks by removing (pruning) neurons such that the performance drop is minimal. Traditional pruning approaches focus on designing metrics to quantify the usefulness of a neuron which is often quite tedious and sub-optimal. More recent approaches have instead focused on training auxiliary networks to automatically learn how useful each neuron is however, they often do not take computational limitations into account. In this work, we propose a general methodology for pruning neural networks. Our proposed methodology can prune neural networks to respect pre-defined computational budgets on arbitrary, possibly non-differentiable, functions. Furthermore, we only assume the ability to be able to evaluate these functions for different inputs, and hence they do not need to be fully specified beforehand. We achieve this by proposing a novel pruning strategy via constrained reinforcement learning algorithms. We prove the effectiveness of our approach via comparison with state-of-the-art methods on standard image classification datasets. Specifically, we reduce 83-92.90 of total parameters on various variants of VGG while achieving comparable or better performance than that of original networks. We also achieved 75.09 reduction in parameters on ResNet18 without incurring any loss in accuracy.

LGJun 11, 2021
Survey of Image Based Graph Neural Networks

Usman Nazir, He Wang, Murtaza Taj

In this survey paper, we analyze image based graph neural networks and propose a three-step classification approach. We first convert the image into superpixels using the Quickshift algorithm so as to reduce 30% of the input data. The superpixels are subsequently used to generate a region adjacency graph. Finally, the graph is passed through a state-of-art graph convolutional neural network to get classification scores. We also analyze the spatial and spectral convolution filtering techniques in graph neural networks. Spectral-based models perform better than spatial-based models and classical CNN with lesser compute cost.

CVMar 18, 2021
Spatio-temporal Crop Classification On Volumetric Data

Muhammad Usman Qadeer, Salar Saeed, Murtaza Taj et al.

Large-area crop classification using multi-spectral imagery is a widely studied problem for several decades and is generally addressed using classical Random Forest classifier. Recently, deep convolutional neural networks (DCNN) have been proposed. However, these methods only achieved results comparable with Random Forest. In this work, we present a novel CNN based architecture for large-area crop classification. Our methodology combines both spatio-temporal analysis via 3D CNN as well as temporal analysis via 1D CNN. We evaluated the efficacy of our approach on Yolo and Imperial county benchmark datasets. Our combined strategy outperforms both classical as well as recent DCNN based methods in terms of classification accuracy by 2% while maintaining a minimum number of parameters and the lowest inference time.

CVOct 6, 2020
Comprehensive Online Network Pruning via Learnable Scaling Factors

Muhammad Umair Haider, Murtaza Taj

One of the major challenges in deploying deep neural network architectures is their size which has an adverse effect on their inference time and memory requirements. Deep CNNs can either be pruned width-wise by removing filters based on their importance or depth-wise by removing layers and blocks. Width wise pruning (filter pruning) is commonly performed via learnable gates or switches and sparsity regularizers whereas pruning of layers has so far been performed arbitrarily by manually designing a smaller network usually referred to as a student network. We propose a comprehensive pruning strategy that can perform both width-wise as well as depth-wise pruning. This is achieved by introducing gates at different granularities (neuron, filter, layer, block) which are then controlled via an objective function that simultaneously performs pruning at different granularity during each forward pass. Our approach is applicable to wide-variety of architectures without any constraints on spatial dimensions or connection type (sequential, residual, parallel or inception). Our method has resulted in a compression ratio of 70% to 90% without noticeable loss in accuracy when evaluated on benchmark datasets.

CVJul 9, 2020
Attention Neural Network for Trash Detection on Water Channels

Mohbat Tharani, Abdul Wahab Amin, Mohammad Maaz et al.

Rivers and canals flowing through cities are often used illegally for dumping the trash. This contaminates freshwater channels as well as causes blockage in sewerage resulting in urban flooding. When this contaminated water reaches agricultural fields, it results in degradation of soil and poses critical environmental as well as economic threats. The dumped trash is often found floating on the water surface. The trash could be disfigured, partially submerged, decomposed into smaller pieces, clumped together with other objects which obscure its shape and creates a challenging detection problem. This paper proposes a method for the detection of visible trash floating on the water surface of the canals in urban areas. We also provide a large dataset, first of its kind, trash in water channels that contains object-level annotations. A novel attention layer is proposed that improves the detection of smaller objects. Towards the end of this paper, we provide a detailed comparison of our method with state-of-the-art object detectors and show that our method significantly improves the detection of smaller objects. The dataset will be made publicly available.

CVMay 2, 2020
Cross-View Image Retrieval -- Ground to Aerial Image Retrieval through Deep Learning

Numan Khurshid, Talha Hanif, Mohbat Tharani et al.

Cross-modal retrieval aims to measure the content similarity between different types of data. The idea has been previously applied to visual, text, and speech data. In this paper, we present a novel cross-modal retrieval method specifically for multi-view images, called Cross-view Image Retrieval CVIR. Our approach aims to find a feature space as well as an embedding space in which samples from street-view images are compared directly to satellite-view images (and vice-versa). For this comparison, a novel deep metric learning based solution "DeepCVIR" has been proposed. Previous cross-view image datasets are deficient in that they (1) lack class information; (2) were originally collected for cross-view image geolocalization task with coupled images; (3) do not include any images from off-street locations. To train, compare, and evaluate the performance of cross-view image retrieval, we present a new 6 class cross-view image dataset termed as CrossViewRet which comprises of images including freeway, mountain, palace, river, ship, and stadium with 700 high-resolution dual-view images for each class. Results show that the proposed DeepCVIR outperforms conventional matching approaches on the CVIR task for the given dataset and would also serve as the baseline for future research.

CVApr 9, 2020
Early Disease Diagnosis for Rice Crop

M. Hammad Masood, Habiba Saim, Murtaza Taj et al.

Many existing techniques provide automatic estimation of crop damage due to various diseases. However, early detection can prevent or reduce the extend of damage itself. The limited performance of existing techniques in early detection is lack of localized information. We instead propose a dataset with annotations for each diseased segment in each image. Unlike existing approaches, instead of classifying images into either healthy or diseased, we propose to provide localized classification for each segment of an images. Our method is based on Mask RCNN and provides location as well as extend of infected regions on the plant. Thus the extend of damage on the crop can be estimated. Our method has obtained overall 87.6% accuracy on the proposed dataset as compared to 58.4% obtained without incorporating localized information.

LGApr 7, 2020
Teacher-Class Network: A Neural Network Compression Mechanism

Shaiq Munir Malik, Muhammad Umair Haider, Mohbat Tharani et al.

To reduce the overwhelming size of Deep Neural Networks (DNN) teacher-student methodology tries to transfer knowledge from a complex teacher network to a simple student network. We instead propose a novel method called the teacher-class network consisting of a single teacher and multiple student networks (i.e. class of students). Instead of transferring knowledge to one student only, the proposed method transfers a chunk of knowledge to each student. Our students are not trained for problem-specific logits, they are trained to mimic knowledge (dense representation) learned by the teacher network thus the combined knowledge learned by the class of students can be used to solve other problems as well. The proposed teacher-class architecture is evaluated on several benchmark datasets such as MNIST, Fashion MNIST, IMDB Movie Reviews, CAMVid, CIFAR-10 and ImageNet on multiple tasks including image classification, sentiment classification and segmentation. Our approach outperforms the state of-the-art single student approach in terms of accuracy as well as computational cost while achieving 10-30 times reduction in parameters.

IVDec 22, 2019
Patch-based Generative Adversarial Network Towards Retinal Vessel Segmentation

Waseem Abbas, Muhammad Haroon Shakeel, Numan Khurshid et al.

Retinal blood vessels are considered to be the reliable diagnostic biomarkers of ophthalmologic and diabetic retinopathy. Monitoring and diagnosis totally depends on expert analysis of both thin and thick retinal vessels which has recently been carried out by various artificial intelligent techniques. Existing deep learning methods attempt to segment retinal vessels using a unified loss function optimized for both thin and thick vessels with equal importance. Due to variable thickness, biased distribution, and difference in spatial features of thin and thick vessels, unified loss function are more influential towards identification of thick vessels resulting in weak segmentation. To address this problem, a conditional patch-based generative adversarial network is proposed which utilizes a generator network and a patch-based discriminator network conditioned on the sample data with an additional loss function to learn both thin and thick vessels. Experiments are conducted on publicly available STARE and DRIVE datasets which show that the proposed model outperforms the state-of-the-art methods.

CVJul 12, 2019
Tiny-Inception-ResNet-v2: Using Deep Learning for Eliminating Bonded Labors of Brick Kilns in South Asia

Usman Nazir, Numan Khurshid, Muhammad Ahmed Bhimra et al.

This paper proposes to employ a Inception-ResNet inspired deep learning architecture called Tiny-Inception-ResNet-v2 to eliminate bonded labor by identifying brick kilns within "Brick-Kiln-Belt" of South Asia. The framework is developed by training a network on the satellite imagery consisting of 11 different classes of South Asian region. The dataset developed during the process includes the geo-referenced images of brick kilns, houses, roads, tennis courts, farms, sparse trees, dense trees, orchards, parking lots, parks and barren lands. The dataset is made publicly available for further research. Our proposed network architecture with very fewer learning parameters outperforms all state-of-the-art architectures employed for recognition of brick kilns. Our proposed solution would enable regional monitoring and evaluation mechanisms for the Sustainable Development Goals.

CVOct 15, 2018
Unsupervised Deep Features for Remote Sensing Image Matching via Discriminator Network

Mohbat Tharani, Numan Khurshid, Murtaza Taj

The advent of deep perceptual networks brought about a paradigm shift in machine vision and image perception. Image apprehension lately carried out by hand-crafted features in the latent space have been replaced by deep features acquired from supervised networks for improved understanding. However, such deep networks require strict supervision with a substantial amount of the labeled data for authentic training process. These methods perform poorly in domains lacking labeled data especially in case of remote sensing image retrieval. Resolving this, we propose an unsupervised encoder-decoder feature for remote sensing image matching (RSIM). Moreover, we replace the conventional distance metrics with a deep discriminator network to identify the similarity of the image pairs. To the best of our knowledge, discriminator network has never been used before for solving RSIM problem. Results have been validated with two publicly available benchmark remote sensing image datasets. The technique has also been investigated for content-based remote sensing image retrieval (CBRSIR); one of the widely used applications of RSIM. Results demonstrate that our technique supersedes the state-of-the-art methods used for unsupervised image matching with mean average precision (mAP) of 81%, and image retrieval with an overall improvement in mAP score of about 12%.

CVJun 22, 2018
Point cloud segmentation using hierarchical tree for architectural models

Omair Hassaan, Abeera Shamail, Zain Butt et al.

Recent developments in the 3D scanning technologies have made the generation of highly accurate 3D point clouds relatively easy but the segmentation of these point clouds remains a challenging area. A number of techniques have set precedent of either planar or primitive based segmentation in literature. In this work, we present a novel and an effective primitive based point cloud segmentation algorithm. The primary focus, i.e. the main technical contribution of our method is a hierarchical tree which iteratively divides the point cloud into segments. This tree uses an exclusive energy function and a 3D convolutional neural network, HollowNets to classify the segments. We test the efficacy of our proposed approach using both real and synthetic data obtaining an accuracy greater than 90% for domes and minarets.