Arslan Munir

CV
h-index30
19papers
635citations
Novelty45%
AI Score45

19 Papers

CVAug 16, 2022
ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos

James Wensel, Hayat Ullah, Arslan Munir

Human activity recognition is an emerging and important area in computer vision which seeks to determine the activity an individual or group of individuals are performing. The applications of this field ranges from generating highlight videos in sports, to intelligent surveillance and gesture recognition. Most activity recognition systems rely on a combination of convolutional neural networks (CNNs) to perform feature extraction from the data and recurrent neural networks (RNNs) to determine the time dependent nature of the data. This paper proposes and designs two transformer neural networks for human activity recognition: a recurrent transformer (ReT), a specialized neural network used to make predictions on sequences of data, as well as a vision transformer (ViT), a transformer optimized for extracting salient features from images, to improve speed and scalability of activity recognition. We have provided an extensive comparison of the proposed transformer neural networks with the contemporary CNN and RNN-based human activity recognition models in terms of speed and accuracy.

CVAug 9, 2022
Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework

Hayat Ullah, Arslan Munir

Vision-based human activity recognition has emerged as one of the essential research areas in video analytics domain. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown impressive performance for the human activity recognition task. However, these newly introduced methods either exclusively focus on model performance or the effectiveness of these models in terms of computational efficiency and robustness, resulting in a biased tradeoff in their proposals to deal with challenging human activity recognition problem. To overcome the limitations of contemporary deep learning models for human activity recognition, this paper presents a computationally efficient yet generic spatial-temporal cascaded framework that exploits the deep discriminative spatial and temporal features for human activity recognition. For efficient representation of human actions, we have proposed an efficient dual attentional convolutional neural network (CNN) architecture that leverages a unified channel-spatial attention mechanism to extract human-centric salient features in video frames. The dual channel-spatial attention layers together with the convolutional layers learn to be more attentive in the spatial receptive fields having objects over the number of feature maps. The extracted discriminative salient features are then forwarded to stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward and backward pass gradient learning. Extensive experiments are conducted, where the obtained results show that the proposed framework attains an improvement in execution time up to 167 times in terms of frames per second as compared to most of the contemporary action recognition methods.

AINov 14, 2018Code
TrolleyMod v1.0: An Open-Source Simulation and Data-Collection Platform for Ethical Decision Making in Autonomous Vehicles

Vahid Behzadan, James Minton, Arslan Munir

This paper presents TrolleyMod v1.0, an open-source platform based on the CARLA simulator for the collection of ethical decision-making data for autonomous vehicles. This platform is designed to facilitate experiments aiming to observe and record human decisions and actions in high-fidelity simulations of ethical dilemmas that occur in the context of driving. Targeting experiments in the class of trolley problems, TrolleyMod provides a seamless approach to creating new experimental settings and environments with the realistic physics-engine and the high-quality graphical capabilities of CARLA and the Unreal Engine. Also, TrolleyMod provides a straightforward interface between the CARLA environment and Python to enable the implementation of custom controllers, such as deep reinforcement learning agents. The results of such experiments can be used for sociological analyses, as well as the training and tuning of value-aligned autonomous vehicles based on social values that are inferred from observations.

CVJul 28, 2025
Improving Adversarial Robustness Through Adaptive Learning-Driven Multi-Teacher Knowledge Distillation

Hayat Ullah, Syed Muhammad Talha Zaidi, Arslan Munir

Convolutional neural networks (CNNs) excel in computer vision but are susceptible to adversarial attacks, crafted perturbations designed to mislead predictions. Despite advances in adversarial training, a gap persists between model accuracy and robustness. To mitigate this issue, in this paper, we present a multi-teacher adversarial robustness distillation using an adaptive learning strategy. Specifically, our proposed method first trained multiple clones of a baseline CNN model using an adversarial training strategy on a pool of perturbed data acquired through different adversarial attacks. Once trained, these adversarially trained models are used as teacher models to supervise the learning of a student model on clean data using multi-teacher knowledge distillation. To ensure an effective robustness distillation, we design an adaptive learning strategy that controls the knowledge contribution of each model by assigning weights as per their prediction precision. Distilling knowledge from adversarially pre-trained teacher models not only enhances the learning capabilities of the student model but also empowers it with the capacity to withstand different adversarial attacks, despite having no exposure to adversarial data. To verify our claims, we extensively evaluated our proposed method on MNIST-Digits and Fashion-MNIST datasets across diverse experimental settings. The obtained results exhibit the efficacy of our multi-teacher adversarial distillation and adaptive learning strategy, enhancing CNNs' adversarial robustness against various adversarial attacks.

CVJul 16, 2025
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition

Hayat Ullah, Muhammad Ali Shafique, Abbas Khan et al.

The landscape of video recognition has evolved significantly, shifting from traditional Convolutional Neural Networks (CNNs) to Transformer-based architectures for improved accuracy. While 3D CNNs have been effective at capturing spatiotemporal dynamics, recent Transformer models leverage self-attention to model long-range spatial and temporal dependencies. Despite achieving state-of-the-art performance on major benchmarks, Transformers remain computationally expensive, particularly with dense video data. To address this, we propose a lightweight Video Focal Modulation Network, DVFL-Net, which distills spatiotemporal knowledge from a large pre-trained teacher into a compact nano student model, enabling efficient on-device deployment. DVFL-Net utilizes knowledge distillation and spatial-temporal feature modulation to significantly reduce computation while preserving high recognition performance. We employ forward Kullback-Leibler (KL) divergence alongside spatio-temporal focal modulation to effectively transfer both local and global context from the Video-FocalNet Base (teacher) to the proposed VFL-Net (student). We evaluate DVFL-Net on UCF50, UCF101, HMDB51, SSV2, and Kinetics-400, benchmarking it against recent state-of-the-art methods in Human Action Recognition (HAR). Additionally, we conduct a detailed ablation study analyzing the impact of forward KL divergence. The results confirm the superiority of DVFL-Net in achieving an optimal balance between performance and efficiency, demonstrating lower memory usage, reduced GFLOPs, and strong accuracy, making it a practical solution for real-time HAR applications.

CVJul 16, 2025
OD-VIRAT: A Large-Scale Benchmark for Object Detection in Realistic Surveillance Environments

Hayat Ullah, Abbas Khan, Arslan Munir et al.

Realistic human surveillance datasets are crucial for training and evaluating computer vision models under real-world conditions, facilitating the development of robust algorithms for human and human-interacting object detection in complex environments. These datasets need to offer diverse and challenging data to enable a comprehensive assessment of model performance and the creation of more reliable surveillance systems for public safety. To this end, we present two visual object detection benchmarks named OD-VIRAT Large and OD-VIRAT Tiny, aiming at advancing visual understanding tasks in surveillance imagery. The video sequences in both benchmarks cover 10 different scenes of human surveillance recorded from significant height and distance. The proposed benchmarks offer rich annotations of bounding boxes and categories, where OD-VIRAT Large has 8.7 million annotated instances in 599,996 images and OD-VIRAT Tiny has 288,901 annotated instances in 19,860 images. This work also focuses on benchmarking state-of-the-art object detection architectures, including RETMDET, YOLOX, RetinaNet, DETR, and Deformable-DETR on this object detection-specific variant of VIRAT dataset. To the best of our knowledge, it is the first work to examine the performance of these recently published state-of-the-art object detection architectures on realistic surveillance imagery under challenging conditions such as complex backgrounds, occluded objects, and small-scale objects. The proposed benchmarking and experimental settings will help in providing insights concerning the performance of selected object detection models and set the base for developing more efficient and robust object detection architectures.

CVJul 8, 2025
Hierarchical Multi-Stage Transformer Architecture for Context-Aware Temporal Action Localization

Hayat Ullah, Arslan Munir, Oliver Nina

Inspired by the recent success of transformers and multi-stage architectures in video recognition and object detection domains. We thoroughly explore the rich spatio-temporal properties of transformers within a multi-stage architecture paradigm for the temporal action localization (TAL) task. This exploration led to the development of a hierarchical multi-stage transformer architecture called PCL-Former, where each subtask is handled by a dedicated transformer module with a specialized loss function. Specifically, the Proposal-Former identifies candidate segments in an untrimmed video that may contain actions, the Classification-Former classifies the action categories within those segments, and the Localization-Former precisely predicts the temporal boundaries (i.e., start and end) of the action instances. To evaluate the performance of our method, we have conducted extensive experiments on three challenging benchmark datasets: THUMOS-14, ActivityNet-1.3, and HACS Segments. We also conducted detailed ablation experiments to assess the impact of each individual module of our PCL-Former. The obtained quantitative results validate the effectiveness of the proposed PCL-Former, outperforming state-of-the-art TAL approaches by 2.8%, 1.2%, and 4.8% on THUMOS14, ActivityNet-1.3, and HACS datasets, respectively.

ARNov 9, 2021
Phantom: A High-Performance Computational Core for Sparse Convolutional Neural Networks

Mahmood Azhar Qureshi, Arslan Munir

Sparse convolutional neural networks (CNNs) have gained significant traction over the past few years as sparse CNNs can drastically decrease the model size and computations, if exploited befittingly, as compared to their dense counterparts. Sparse CNNs often introduce variations in the layer shapes and sizes, which can prevent dense accelerators from performing well on sparse CNN models. Recently proposed sparse accelerators like SCNN, Eyeriss v2, and SparTen, actively exploit the two-sided or full sparsity, that is, sparsity in both weights and activations, for performance gains. These accelerators, however, either have inefficient micro-architecture, which limits their performance, have no support for non-unit stride convolutions and fully-connected (FC) layers, or suffer massively from systematic load imbalance. To circumvent these issues and support both sparse and dense models, we propose Phantom, a multi-threaded, dynamic, and flexible neural computational core. Phantom uses sparse binary mask representation to actively lookahead into sparse computations, and dynamically schedule its computational threads to maximize the thread utilization and throughput. We also generate a two-dimensional (2D) mesh architecture of Phantom neural computational cores, which we refer to as Phantom-2D accelerator, and propose a novel dataflow that supports all layers of a CNN, including unit and non-unit stride convolutions, and FC layers. In addition, Phantom-2D uses a two-level load balancing strategy to minimize the computational idling, thereby, further improving the hardware utilization. To show support for different types of layers, we evaluate the performance of the Phantom architecture on VGG16 and MobileNet. Our simulations show that the Phantom-2D accelerator attains a performance gain of 12x, 4.1x, 1.98x, and 2.36x, over dense architectures, SCNN, SparTen, and Eyeriss v2, respectively.

CRJul 19, 2020
PUF-RLA: A PUF-based Reliable and Lightweight Authentication Protocol employing Binary String Shuffling

Mahmood Azhar Qureshi, Arslan Munir

Physically unclonable functions (PUFs) can be employed for device identification, authentication, secret key storage, and other security tasks. However, PUFs are susceptible to modeling attacks if a number of PUFs' challenge-response pairs (CRPs) are exposed to the adversary. Furthermore, many of the embedded devices requiring authentication have stringent resource constraints and thus require a lightweight authentication mechanism. We propose PUF-RLA, a PUF-based lightweight, highly reliable authentication scheme employing binary string shuffling. The proposed scheme enhances the reliability of PUF as well as alleviates the resource constraints by employing error correction in the server instead of the device without compromising the security. The proposed PUF-RLA is robust against brute force, replay, and modeling attacks. In PUF-RLA, we introduce an inexpensive yet secure stream authentication scheme inside the device which authenticates the server before the underlying PUF can be invoked. This prevents an adversary from brute forcing the device's PUF to acquire CRPs essentially locking out the device from unauthorized model generation. Additionally, we also introduce a lightweight CRP obfuscation mechanism involving XOR and shuffle operations. Results and security analysis verify that the PUF-RLA is secure against brute force, replay, and modeling attacks, and provides ~99% reliable authentication. In addition, PUF-RLA provides a reduction of 63% and 74% for look-up tables (LUTs) and register count, respectively, in FPGA compared to a recently proposed approach while providing additional authentication advantages.

ARJul 19, 2020
NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks

Mahmood Azhar Qureshi, Arslan Munir

Convolutional neural networks (CNNs) require high throughput hardware accelerators for real time applications owing to their huge computational cost. Most traditional CNN accelerators rely on single core, linear processing elements (PEs) in conjunction with 1D dataflows for accelerating convolution operations. This limits the maximum achievable ratio of peak throughput per PE count to unity. Most of the past works optimize their dataflows to attain close to a 100% hardware utilization to reach this ratio. In this paper, we introduce a high throughput, multi-threaded, log-based PE core. The designed core provides a 200% increase in peak throughput per PE count while only incurring a 6% increase in area overhead compared to a single, linear multiplier PE core with same output bit precision. We also present a 2D weight broadcast dataflow which exploits the multi-threaded nature of the PE cores to achieve a high hardware utilization per layer for various CNNs. The entire architecture, which we refer to as NeuroMAX, is implemented on Xilinx Zynq 7020 SoC at 200 MHz processing clock. Detailed analysis is performed on throughput, hardware utilization, area and power breakdown, and latency to show performance improvement compared to previous FPGA and ASIC designs.

LGNov 14, 2018
Emergence of Addictive Behaviors in Reinforcement Learning Agents

Vahid Behzadan, Roman V. Yampolskiy, Arslan Munir

This paper presents a novel approach to the technical analysis of wireheading in intelligent agents. Inspired by the natural analogues of wireheading and their prevalent manifestations, we propose the modeling of such phenomenon in Reinforcement Learning (RL) agents as psychological disorders. In a preliminary step towards evaluating this proposal, we study the feasibility and dynamics of emergent addictive policies in Q-learning agents in the tractable environment of the game of Snake. We consider a slightly modified settings for this game, in which the environment provides a "drug" seed alongside the original "healthy" seed for the consumption of the snake. We adopt and extend an RL-based model of natural addiction to Q-learning agents in this settings, and derive sufficient parametric conditions for the emergence of addictive behaviors in such agents. Furthermore, we evaluate our theoretical analysis with three sets of simulation-based experiments. The results demonstrate the feasibility of addictive wireheading in RL agents, and provide promising venues of further research on the psychopathological modeling of complex AI safety problems.

LGOct 23, 2018
The Faults in Our Pi Stars: Security Issues and Open Challenges in Deep Reinforcement Learning

Vahid Behzadan, Arslan Munir

Since the inception of Deep Reinforcement Learning (DRL) algorithms, there has been a growing interest in both research and industrial communities in the promising potentials of this paradigm. The list of current and envisioned applications of deep RL ranges from autonomous navigation and robotics to control applications in the critical infrastructure, air traffic control, defense technologies, and cybersecurity. While the landscape of opportunities and the advantages of deep RL algorithms are justifiably vast, the security risks and issues in such algorithms remain largely unexplored. To facilitate and motivate further research on these critical challenges, this paper presents a foundational treatment of the security problem in DRL. We formulate the security requirements of DRL, and provide a high-level threat model through the classification and identification of vulnerabilities, attack vectors, and adversarial capabilities. Furthermore, we present a review of current literature on security of deep RL from both offensive and defensive perspectives. Lastly, we enumerate critical research venues and open problems in mitigation and prevention of intentional attacks against deep RL as a roadmap for further research in this area.

LGJun 4, 2018
Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise

Vahid Behzadan, Arslan Munir

Recent developments have established the vulnerability of deep reinforcement learning to policy manipulation attacks via intentionally perturbed inputs, known as adversarial examples. In this work, we propose a technique for mitigation of such attacks based on addition of noise to the parameter space of deep reinforcement learners during training. We experimentally verify the effect of parameter-space noise in reducing the transferability of adversarial examples, and demonstrate the promising performance of this technique in mitigating the impact of whitebox and blackbox attacks at both test and training times.

ROJun 4, 2018
Adversarial Reinforcement Learning Framework for Benchmarking Collision Avoidance Mechanisms in Autonomous Vehicles

Vahid Behzadan, Arslan Munir

With the rapidly growing interest in autonomous navigation, the body of research on motion planning and collision avoidance techniques has enjoyed an accelerating rate of novel proposals and developments. However, the complexity of new techniques and their safety requirements render the bulk of current benchmarking frameworks inadequate, thus leaving the need for efficient comparison techniques unanswered. This work proposes a novel framework based on deep reinforcement learning for benchmarking the behavior of collision avoidance mechanisms under the worst-case scenario of dealing with an optimal adversarial agent, trained to drive the system into unsafe states. We describe the architecture and flow of this framework as a benchmarking solution, and demonstrate its efficacy via a practical case study of comparing the reliability of two collision avoidance mechanisms in response to intentional collision attempts.

AIMay 23, 2018
A Psychopathological Approach to Safety Engineering in AI and AGI

Vahid Behzadan, Arslan Munir, Roman V. Yampolskiy

The complexity of dynamics in AI techniques is already approaching that of complex adaptive systems, thus curtailing the feasibility of formal controllability and reachability analysis in the context of AI safety. It follows that the envisioned instances of Artificial General Intelligence (AGI) will also suffer from challenges of complexity. To tackle such issues, we propose the modeling of deleterious behaviors in AI and AGI as psychological disorders, thereby enabling the employment of psychopathological approaches to analysis and control of misbehaviors. Accordingly, we present a discussion on the feasibility of the psychopathological approaches to AI safety, and propose general directions for research on modeling, diagnosis, and treatment of psychological disorders in AGI.

AIDec 23, 2017
Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger

Vahid Behzadan, Arslan Munir

Recent developments have established the vulnerability of deep Reinforcement Learning (RL) to policy manipulation attacks via adversarial perturbations. In this paper, we investigate the robustness and resilience of deep RL to training-time and test-time attacks. Through experimental results, we demonstrate that under noncontiguous training-time attacks, Deep Q-Network (DQN) agents can recover and adapt to the adversarial conditions by reactively adjusting the policy. Our results also show that policies learned under adversarial perturbations are more robust to test-time attacks. Furthermore, we compare the performance of $ε$-greedy and parameter-space noise exploration methods in terms of robustness and resilience against adversarial perturbations.

DBSep 18, 2017
A Comparative Quantitative Analysis of Contemporary Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry

Avishek Bose, Arslan Munir, Neda Shabani

The hospitality industry is one of the data-rich industries that receives huge Volumes of data streaming at high Velocity with considerably Variety, Veracity, and Variability. These properties make the data analysis in the hospitality industry a big data problem. Meeting the customers' expectations is a key factor in the hospitality industry to grasp the customers' loyalty. To achieve this goal, marketing professionals in this industry actively look for ways to utilize their data in the best possible manner and advance their data analytic solutions, such as identifying a unique market segmentation clustering and developing a recommendation system. In this paper, we present a comprehensive literature review of existing big data clustering algorithms and their advantages and disadvantages for various use cases. We implement the existing big data clustering algorithms and provide a quantitative comparison of the performance of different clustering algorithms for different scenarios. We also present our insights and recommendations regarding the suitability of different big data clustering algorithms for different use cases. These recommendations will be helpful for hoteliers in selecting the appropriate market segmentation clustering algorithm for different clustering datasets to improve the customer experience and maximize the hotel revenue.

SYSep 13, 2017
Models and Framework for Adversarial Attacks on Complex Adaptive Systems

Vahid Behzadan, Arslan Munir

We introduce the paradigm of adversarial attacks that target the dynamics of Complex Adaptive Systems (CAS). To facilitate the analysis of such attacks, we present multiple approaches to the modeling of CAS as dynamical, data-driven, and game-theoretic systems, and develop quantitative definitions of attack, vulnerability, and resilience in the context of CAS security. Furthermore, we propose a comprehensive set of schemes for classification of attacks and attack surfaces in CAS, complemented with examples of practical attacks. Building on this foundation, we propose a framework based on reinforcement learning for simulation and analysis of attacks on CAS, and demonstrate its performance through three real-world case studies of targeting power grids, destabilization of terrorist organizations, and manipulation of machine learning agents. We also discuss potential mitigation techniques, and remark on future research directions in analysis and design of secure complex adaptive systems.

LGJan 16, 2017
Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks

Vahid Behzadan, Arslan Munir

Deep learning classifiers are known to be inherently vulnerable to manipulation by intentionally perturbed inputs, named adversarial examples. In this work, we establish that reinforcement learning techniques based on Deep Q-Networks (DQNs) are also vulnerable to adversarial input perturbations, and verify the transferability of adversarial examples across different DQN models. Furthermore, we present a novel class of attacks based on this vulnerability that enable policy manipulation and induction in the learning process of DQNs. We propose an attack mechanism that exploits the transferability of adversarial examples to implement policy induction attacks on DQNs, and demonstrate its efficacy and impact through experimental study of a game-learning scenario.