CV | Oct 4, 2023
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition
Hamid Mohammadi, Ehsan Nazerfard, Tahereh Firoozi
Video violence recognition based on deep learning concerns accurate yet scalable human violence recognition. Currently, most state-of-the-art video violence recognition studies use CNN-based models to represent and categorize videos. However, recent studies suggest that pre-trained transformers are more accurate than CNN-based models on various video analysis benchmarks. Yet these models are not thoroughly evaluated for video violence recognition. This paper introduces a novel transformer-based Mixture of Experts (MoE) video violence recognition system. Through an intelligent combination of large vision transformers and efficient transformer architectures, the proposed system not only takes advantage of the vision transformer architecture but also reduces the cost of utilizing large vision transformers. The proposed architecture maximizes violence recognition system accuracy while actively reducing computational costs through a reinforcement learning-based router. The empirical results show the proposed MoE architecture's superiority over CNN-based models by achieving 92.4% accuracy on the RWF dataset.
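The cost-aware routing idea in this abstract can be illustrated with a toy sketch. It is not the paper's router: it is a tabular epsilon-greedy bandit, and the two experts, the "easy"/"hard" clip types, the cost and accuracy numbers, and the reward form (correctness minus weighted compute cost) are all assumptions made up for illustration.

```python
import random
from collections import defaultdict

# Hypothetical relative compute cost of each expert (not from the paper).
EXPERT_COST = {"small": 0.1, "large": 1.0}

# Hypothetical probability that each expert classifies a clip type correctly.
ACCURACY = {
    ("easy", "small"): 0.95, ("easy", "large"): 0.97,
    ("hard", "small"): 0.55, ("hard", "large"): 0.92,
}


def reward(correct, expert, cost_weight=0.2):
    """Assumed reward: 1 for a correct prediction, minus a weighted compute cost."""
    return float(correct) - cost_weight * EXPERT_COST[expert]


def route(value, state):
    """Pick the expert with the highest estimated reward for this clip type."""
    return max(EXPERT_COST, key=lambda e: value[(state, e)])


def train_router(steps=20000, eps=0.1, seed=0):
    """Estimate the mean reward of each (clip type, expert) pair by trial and error."""
    rng = random.Random(seed)
    value = defaultdict(float)  # running mean reward per (state, expert)
    count = defaultdict(int)    # number of times each pair was tried
    for _ in range(steps):
        state = rng.choice(["easy", "hard"])
        if rng.random() < eps:
            expert = rng.choice(list(EXPERT_COST))  # explore
        else:
            expert = route(value, state)            # exploit
        correct = rng.random() < ACCURACY[(state, expert)]
        key = (state, expert)
        count[key] += 1
        value[key] += (reward(correct, expert) - value[key]) / count[key]
    return value


router = train_router()
print(route(router, "easy"), route(router, "hard"))
```

With these made-up numbers, the expected reward favours the small expert on easy clips (0.93 versus 0.77) and the large expert on hard clips (0.72 versus 0.53), so the router learns to spend compute only where it pays off.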
CV | Oct 12, 2023
Visual Self-supervised Learning Scheme for Dense Prediction Tasks on X-ray Images
Shervin Halat, Mohammad Rahmati, Ehsan Nazerfard
Recently, significant advancements in artificial intelligence have been attributed to the integration of self-supervised learning (SSL) schemes. While SSL has shown impressive achievements in natural language processing (NLP), its progress in computer vision has comparatively lagged behind. However, the incorporation of contrastive learning into existing visual SSL models has led to considerable progress, often surpassing supervised counterparts. Nonetheless, these improvements have been mostly limited to classification tasks. Moreover, few studies have evaluated visual SSL models in real-world scenarios, as most have focused on datasets with class-wise portrait images, notably ImageNet. Here, we focus on dense prediction tasks using security inspection X-ray images to evaluate our proposed model, Segment Localization (SegLoc). Based upon the Instance Localization (InsLoc) model, SegLoc addresses one of the key challenges of contrastive learning, i.e., false negative pairs of query embeddings. Our pre-training dataset is synthesized by cutting, transforming, and pasting labeled segments from an existing labeled dataset (PIDray) as foregrounds onto instances from an unlabeled dataset (SIXray) as backgrounds. Furthermore, we fully leverage the labeled data by incorporating a one-queue-per-class concept into the MoCo-v2 memory bank, thereby avoiding false negative pairs. In our experiments, SegLoc outperformed random initialization by 3% to 6% while underperforming supervised initialization, in terms of AR and AP metrics across different IoU values over 20 to 30 pre-training epochs.
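The one-queue-per-class idea can be sketched as a small data structure. This is a minimal illustration under assumptions, not the SegLoc implementation: the class name `PerClassQueues`, its methods, and the string stand-ins for key embeddings are hypothetical. The point it shows is that negatives for a query are drawn only from queues of other classes, so a key of the same class is never treated as a negative.

```python
from collections import deque


class PerClassQueues:
    """Memory bank with one FIFO queue of key embeddings per class (hypothetical API)."""

    def __init__(self, num_classes, queue_size):
        self.queues = [deque(maxlen=queue_size) for _ in range(num_classes)]

    def enqueue(self, key, label):
        """Store a key in its class queue; the oldest key is evicted when the queue is full."""
        self.queues[label].append(key)

    def negatives(self, query_label):
        """Return keys from every queue except the query's own class."""
        return [key for label, queue in enumerate(self.queues)
                if label != query_label for key in queue]


bank = PerClassQueues(num_classes=3, queue_size=2)
# Strings stand in for embedding vectors to keep the sketch readable.
for key, label in [("a0", 0), ("a1", 0), ("b0", 1), ("c0", 2), ("a2", 0)]:
    bank.enqueue(key, label)
print(bank.negatives(query_label=0))  # keys from classes 1 and 2 only
```

A single shared queue, by contrast, would mix same-class keys into the negative set whenever two instances of one class appear in the bank at the same time.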
LG | Mar 24, 2025
Anchor-based oversampling for imbalanced tabular data via contrastive and adversarial learning
Hadi Mohammadi, Ehsan Nazerfard, Mostafa Haghir Chehreghani
Imbalanced data follow a distribution in which one class (the majority) occurs more frequently than the other (the minority). This phenomenon occurs across various domains, such as security, medical care and human activity. In imbalanced learning, classification algorithms are typically inclined to classify the majority class accurately, resulting in artificially high accuracy rates. As a result, many minority samples are mistakenly labelled as majority-class instances, producing a bias that benefits the majority class. This study presents a framework based on boundary anchor samples to tackle the imbalanced learning challenge. First, we select and use anchor samples to train a multilayer perceptron (MLP) classifier, which acts as a prior knowledge model and aids the adversarial and contrastive learning procedures. Then, we design a novel deep generative model called Anchor Stabilized Conditional Generative Adversarial Network, or Anch-SCGAN for short. Anch-SCGAN is equipped with two generators, one for the minority class and one for the majority class, and a discriminator incorporating additional class-specific information from the pre-trained feature extractor MLP. In addition, we facilitate the generators' training procedure in two ways. First, we define a new generator loss function based on reprocessed anchor samples and contrastive learning. Second, we apply a scoring strategy to stabilize the adversarial training part in the generators. We train Anch-SCGAN and further fine-tune it with anchor samples to improve the precision of the generated samples. Our experiments on 16 real-world imbalanced datasets show that Anch-SCGAN outperforms well-known methods in imbalanced learning.
CV | Feb 4, 2022
Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model
Hamid Mohammadi, Ehsan Nazerfard
The significant growth of surveillance camera networks necessitates scalable AI solutions to efficiently analyze the large amount of video data produced by these networks. As a typical analysis performed on surveillance footage, video violence detection has recently received considerable attention. Most research has focused on improving existing methods through supervised learning, with little, if any, attention to semi-supervised approaches. In this study, a reinforcement learning model is introduced that can outperform existing models through a semi-supervised approach. The main novelty of the proposed method lies in the introduction of a semi-supervised hard attention mechanism. Using hard attention, the essential regions of videos are identified and separated from the non-informative parts of the data. The model's accuracy is improved by removing redundant data and focusing on useful visual information at a higher resolution. Implementing hard attention mechanisms using semi-supervised reinforcement learning algorithms eliminates the need for attention annotations in video violence datasets, thus making them readily applicable. The proposed model utilizes a pre-trained I3D backbone to accelerate and stabilize the training process. The proposed model achieved state-of-the-art accuracy of 90.4% and 98.7% on the RWF and Hockey datasets, respectively.
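What distinguishes hard attention from soft attention is a discrete choice: one region is kept and the rest is discarded, rather than every region being reweighted. The sketch below shows only that selection-and-crop step on a toy frame. The grid layout, the `scores` input (standing in for whatever a learned policy would produce), and the function names are assumptions for illustration; the abstract does not specify how the model's regions are parameterized.

```python
def select_region(scores):
    """Return the (row, col) of the highest-scoring grid cell, a discrete (hard) choice."""
    best = max((s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row))
    return best[1], best[2]


def hard_attention_crop(frame, scores):
    """Keep only the pixels of the selected grid cell and drop everything else."""
    rows, cols = len(scores), len(scores[0])
    cell_h, cell_w = len(frame) // rows, len(frame[0]) // cols
    r, c = select_region(scores)
    top, left = r * cell_h, c * cell_w
    return [line[left:left + cell_w] for line in frame[top:top + cell_h]]


# Toy 4x4 "frame" with pixel values 0..15 and a 2x2 grid of hypothetical region scores.
frame = [[4 * y + x for x in range(4)] for y in range(4)]
scores = [[0.1, 0.2],
          [0.9, 0.3]]
crop = hard_attention_crop(frame, scores)
print(crop)
```

Because the crop is smaller than the frame, it can be passed to a downstream classifier at a higher effective resolution for the same compute budget, which is the benefit the abstract describes.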
LG | Mar 29, 2019
Cross-Subject Transfer Learning in Human Activity Recognition Systems using Generative Adversarial Networks
Elnaz Soleimani, Ehsan Nazerfard
The application of intelligent systems, especially in smart homes and health-related topics, has been drawing more attention in recent decades. Training Human Activity Recognition (HAR) models, a major module of such systems, requires a fair amount of labeled data. Despite training with large datasets, most existing models face a dramatic performance drop when they are tested against unseen data from new users. Moreover, recording enough data for each new user is not viable due to the limitations and challenges of working with human users. Transfer learning techniques aim to transfer the knowledge learned from the source domain (subject) to the target domain in order to decrease the models' performance loss in the target domain. This paper presents a novel method of adversarial knowledge transfer named SA-GAN (Subject Adaptor GAN), which uses the Generative Adversarial Network framework to perform cross-subject transfer learning in the domain of wearable sensor-based Human Activity Recognition. SA-GAN outperformed other state-of-the-art methods in more than 66% of experiments and showed the second best performance in the remaining 25% of experiments. In some cases, it reached up to 90% of the accuracy that can be obtained by supervised training on data from the same domain.
LG | Mar 12, 2019
Online Human Activity Recognition Employing Hierarchical Hidden Markov Models
Parviz Asghari, Elnaz Soelimani, Ehsan Nazerfard
In the last few years there has been a growing interest in Human Activity Recognition (HAR). Sensor-based HAR approaches, in particular, have been gaining popularity owing to their privacy-preserving nature. Furthermore, due to the widespread accessibility of the internet, a broad range of streaming-based applications, such as online HAR, has emerged over the past decades. However, proposing a sufficiently robust online activity recognition approach for smart environments is still considered a considerable challenge. This paper presents a novel online application of the Hierarchical Hidden Markov Model to detect the current activity on a live stream of sensor events. Our method consists of two phases. In the first phase, the data stream is segmented based on the beginning and ending of the activity patterns, and the ongoing activity is reported with every received observation. This phase is implemented using Hierarchical Hidden Markov Models. The second phase is devoted to correcting the label provided for the segmented data stream based on statistical features. The proposed model can also discover activities that happen during another activity, so-called interrupted activities. After detecting the activity pane, the predicted label is corrected using statistical features such as the time of day at which the activity happened and the duration of the activity. We validated our proposed method by testing it against two different smart home datasets and demonstrated its effectiveness, which is competitive with state-of-the-art methods.
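Reporting the ongoing activity with every received observation amounts to updating a belief over activities one sensor event at a time. The sketch below shows that update for a plain (flat) hidden Markov model, which is a simplification: the paper uses a hierarchical model and adds a label-correction phase, neither of which is reproduced here. The two activities, the two sensors, and every probability are made-up values for illustration.

```python
def filter_step(belief, obs, trans, emit):
    """One forward-filtering update: predict with the transition model, then weight by the emission."""
    n = len(belief)
    predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
    unnorm = [predicted[j] * emit[j][obs] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def online_labels(events, prior, trans, emit, activities):
    """Report the most probable activity after each incoming sensor event."""
    belief = list(prior)
    labels = []
    for obs in events:
        belief = filter_step(belief, obs, trans, emit)
        labels.append(activities[belief.index(max(belief))])
    return labels


# Hypothetical smart-home setup: sensor 0 is in the kitchen, sensor 1 in the bedroom.
activities = ["cooking", "sleeping"]
prior = [0.5, 0.5]
trans = [[0.9, 0.1],   # trans[i][j]: probability of moving from activity i to activity j
         [0.1, 0.9]]
emit = [[0.8, 0.2],    # emit[j][o]: probability that activity j fires sensor o
        [0.2, 0.8]]
labels = online_labels([0, 0, 1, 1, 1], prior, trans, emit, activities)
print(labels)
```

Note that the label does not flip on the first bedroom event: the transition model makes the filter wait for a second consistent observation, which is one reason a separate correction phase over segmented data can be useful.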
LG | Oct 4, 2018
Activity Recognition using Hierarchical Hidden Markov Models on Streaming Sensor Data
Parviz Asghari, Ehsan Nazerfard
Activity recognition from sensor data faces various challenges, such as overlapping activities, activity labeling, and activity detection. Although each of these challenges is important, the most important one is online activity recognition. The present study uses an online hierarchical hidden Markov model to detect activities in a stream of sensor data, predicting the activity in the environment with each sensor event. The activity recognition samples were labeled using statistical features such as the duration of the activity. Tests of the proposed method on two real-world smart home datasets showed an improvement of 4% on one dataset, reaching 59%, while results on the other dataset reached 64.6% using the best methods.