Debi Prosad Dogra

h-index32

18papers

355citations

Novelty40%

AI Score40

Ranked #71,624 of 194,257 authors (top 37%)#24,342 in CV (top 41%)

18 Papers

16.3CVNov 2, 2022

DyAnNet: A Scene Dynamicity Guided Self-Trained Video Anomaly Detection Network

Kamalakar Thakare, Yash Raghuwanshi, Debi Prosad Dogra et al.

Unsupervised approaches for video anomaly detection may not perform as good as supervised approaches. However, learning unknown types of anomalies using an unsupervised approach is more practical than a supervised approach as annotation is an extra burden. In this paper, we use isolation tree-based unsupervised clustering to partition the deep feature space of the video segments. The RGB- stream generates a pseudo anomaly score and the flow stream generates a pseudo dynamicity score of a video segment. These scores are then fused using a majority voting scheme to generate preliminary bags of positive and negative segments. However, these bags may not be accurate as the scores are generated only using the current segment which does not represent the global behavior of a typical anomalous event. We then use a refinement strategy based on a cross-branch feed-forward network designed using a popular I3D network to refine both scores. The bags are then refined through a segment re-mapping strategy. The intuition of adding the dynamicity score of a segment with the anomaly score is to enhance the quality of the evidence. The method has been evaluated on three popular video anomaly datasets, i.e., UCF-Crime, CCTV-Fights, and UBI-Fights. Experimental results reveal that the proposed framework achieves competitive accuracy as compared to the state-of-the-art video anomaly detection methods.

2.0LGJul 29, 2023

Feature Reweighting for EEG-based Motor Imagery Classification

Taveena Lotey, Prateek Keserwani, Debi Prosad Dogra et al.

Classification of motor imagery (MI) using non-invasive electroencephalographic (EEG) signals is a critical objective as it is used to predict the intention of limb movements of a subject. In recent research, convolutional neural network (CNN) based methods have been widely utilized for MI-EEG classification. The challenges of training neural networks for MI-EEG signals classification include low signal-to-noise ratio, non-stationarity, non-linearity, and high complexity of EEG signals. The features computed by CNN-based networks on the highly noisy MI-EEG signals contain irrelevant information. Subsequently, the feature maps of the CNN-based network computed from the noisy and irrelevant features contain irrelevant information. Thus, many non-contributing features often mislead the neural network training and degrade the classification performance. Hence, a novel feature reweighting approach is proposed to address this issue. The proposed method gives a noise reduction mechanism named feature reweighting module that suppresses irrelevant temporal and channel feature maps. The feature reweighting module of the proposed method generates scores that reweight the feature maps to reduce the impact of irrelevant information. Experimental results show that the proposed method significantly improved the classification of MI-EEG signals of Physionet EEG-MMIDB and BCI Competition IV 2a datasets by a margin of 9.34% and 3.82%, respectively, compared to the state-of-the-art methods.

6.7AINov 19, 2023

A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and Applications

Sudhanshu Kumar, Partha Pratim Roy, Debi Prosad Dogra et al.

Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in a piece of text over different social media platforms. Social media plays an essential role in knowing the customer mindset towards a product, services, and the latest market trends. Most organizations depend on the customer's response and feedback to upgrade their offered products and services. SA or opinion mining seems to be a promising research area for various domains. It plays a vital role in analyzing big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed in the paper. \keywords{Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing}

4.7CVMay 25

An Analysis Focused on Womens Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset?

Sangeeta, Maddikuntla Sai Prajwal, Debi Prosad Dogra et al.

Women's safety and security are paramount for a modern society. Crimes against women occur in daylight as well as in low-light conditions. Often, such events are captured through real-world surveillance cameras that operate at lower resolutions. Despite substantial progress in CV-related research, video anomaly detection (VAD) focused on women's safety has not yet been adequately addressed. Existing video anomaly datasets contain well-lit, high-resolution, close-shot videos, and fail to represent women-centric anomalies such as chain snatching, stalking, inappropriate touch, and other subtle forms of crime against women. To address these problems, we propose the ExtrAnom dataset, a new multi-modal benchmark containing 1001 videos with textual descriptions, 500 normal and 501 anomalous, classified into 5 different types of women-centric crimes. The dataset comprises low-light (8%), low-resolution videos (13%), long-shot (15%), along with daylight (64%) anomalous videos. And it covers anomalous events like stalking (3.9%), chain snatching (17.6%), kidnapping (7.3%), assassinations (2.3%), harassment (18.9%), and normal (50%). Each video is supplemented with 4 textual annotations, including one human-generated and three LLM-generated descriptions, enabling cross-modal and VLM-based validations. The aim of creating a women-centric dataset is to accurately detect the women-centric anomaly patterns, which are possible to observe visually. The dataset supplements the VLMs to accurately generate video-level descriptions. ExtrAnom has been benchmarked against popular unimodal and multi-modal VAD datasets (e.g., XD-Violence, UCF-Crime, and UCA) and SOTA methods. Experiments reveal that the existing datasets are insufficient to train models for detecting women-centric anomalies.

6.6SPFeb 8, 2022

Efficacy of Transformer Networks for Classification of Raw EEG Data

Gourav Siddhad, Anmol Gupta, Debi Prosad Dogra et al.

With the unprecedented success of transformer networks in natural language processing (NLP), recently, they have been successfully adapted to areas like computer vision, generative adversarial networks (GAN), and reinforcement learning. Classifying electroencephalogram (EEG) data has been challenging and researchers have been overly dependent on pre-processing and hand-crafted feature extraction. Despite having achieved automated feature extraction in several other domains, deep learning has not yet been accomplished for EEG. In this paper, the efficacy of the transformer network for the classification of raw EEG data (cleaned and pre-processed) is explored. The performance of transformer networks was evaluated on a local (age and gender data) and a public dataset (STEW). First, a classifier using a transformer network is built to classify the age and gender of a person with raw resting-state EEG data. Second, the classifier is tuned for mental workload classification with open access raw multi-tasking mental workload EEG data (STEW). The network achieves an accuracy comparable to state-of-the-art accuracy on both the local (Age and Gender dataset; 94.53% (gender) and 87.79% (age)) and the public (STEW dataset; 95.28% (two workload levels) and 88.72% (three workload levels)) dataset. The accuracy values have been achieved using raw EEG data without feature extraction. Results indicate that the transformer-based deep learning models can successfully abate the need for heavy feature-extraction of EEG data for successful classification.

2.3CVMar 12, 2020

Understanding Crowd Flow Movements Using Active-Langevin Model

Shreetam Behera, Debi Prosad Dogra, Malay Kumar Bandyopadhyay et al.

Crowd flow describes the elementary group behavior of crowds. Understanding the dynamics behind these movements can help to identify various abnormalities in crowds. However, developing a crowd model describing these flows is a challenging task. In this paper, a physics-based model is proposed to describe the movements in dense crowds. The crowd model is based on active Langevin equation where the motion points are assumed to be similar to active colloidal particles in fluids. The model is further augmented with computer-vision techniques to segment both linear and non-linear motion flows in a dense crowd. The evaluation of the active Langevin equation-based crowd segmentation has been done on publicly available crowd videos and on our own videos. The proposed method is able to segment the flow with lesser optical flow error and better accuracy in comparison to existing state-of-the-art methods.

1.8CVApr 15, 2019

Estimation of Linear Motion in Dense Crowd Videos using Langevin Model

Shreetam Behera, Debi Prosad Dogra, Malay Kumar Bandyopadhyay et al.

Crowd gatherings at social and cultural events are increasing in leaps and bounds with the increase in population. Surveillance through computer vision and expert decision making systems can help to understand the crowd phenomena at large gatherings. Understanding crowd phenomena can be helpful in early identification of unwanted incidents and their prevention. Motion flow is one of the important crowd phenomena that can be instrumental in describing the crowd behavior. Flows can be useful in understanding instabilities in the crowd. However, extracting motion flows is a challenging task due to randomness in crowd movement and limitations of the sensing device. Moreover, low-level features such as optical flow can be misleading if the randomness is high. In this paper, we propose a new model based on Langevin equation to analyze the linear dominant flows in videos of densely crowded scenarios. We assume a force model with three components, namely external force, confinement/drift force, and disturbance force. These forces are found to be sufficient to describe the linear or near-linear motion in dense crowd videos. The method is significantly faster as compared to existing popular crowd segmentation methods. The evaluation of the proposed model has been carried out on publicly available datasets as well as using our dataset. It has been observed that the proposed method is able to estimate and segment the linear flows in the dense crowd with better accuracy as compared to state-of-the-art techniques with substantial decrease in the computational overhead.

3.4CVFeb 13, 2019

Can We Automate Diagrammatic Reasoning?

Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar et al.

Learning to solve diagrammatic reasoning (DR) can be a challenging but interesting problem to the computer vision research community. It is believed that next generation pattern recognition applications should be able to simulate human brain to understand and analyze reasoning of images. However, due to the lack of benchmarks of diagrammatic reasoning, the present research primarily focuses on visual reasoning that can be applied to real-world objects. In this paper, we present a diagrammatic reasoning dataset that provides a large variety of DR problems. In addition, we also propose a Knowledge-based Long Short Term Memory (KLSTM) to solve diagrammatic reasoning problems. Our proposed analysis is arguably the first work in this research area. Several state-of-the-art learning frameworks have been used to compare with the proposed KLSTM framework in the present context. Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research with several challenging avenues.

2.6CVFeb 13, 2019

Person Re-identification in Videos by Analyzing Spatio-Temporal Tubes

Sk. Arif Ahmed, Debi Prosad Dogra, Heeseung Choi et al.

Typical person re-identification frameworks search for k best matches in a gallery of images that are often collected in varying conditions. The gallery may contain image sequences when re-identification is done on videos. However, such a process is time consuming as re-identification has to be carried out multiple times. In this paper, we extract spatio-temporal sequences of frames (referred to as tubes) of moving persons and apply a multi-stage processing to match a given query tube with a gallery of stored tubes recorded through other cameras. Initially, we apply a binary classifier to remove noisy images from the input query tube. In the next step, we use a key-pose detection-based query minimization. This reduces the length of the query tube by removing redundant frames. Finally, a 3-stage hierarchical re-identification framework is used to rank the output tubes as per the matching scores. Experiments with publicly available video re-identification datasets reveal that our framework is better than state-of-the-art methods. It ranks the tubes with an increased CMC accuracy of 6-8% across multiple datasets. Also, our method significantly reduces the number of false positives. A new video re-identification dataset, named Tube-based Reidentification Video Dataset (TRiViD), has been prepared with an aim to help the re-identification research community

1.7CVDec 18, 2018

Video Trajectory Classification and Anomaly Detection Using Hybrid CNN-VAE

Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy et al.

Classifying time series data using neural networks is a challenging problem when the length of the data varies. Video object trajectories, which are key to many of the visual surveillance applications, are often found to be of varying length. If such trajectories are used to understand the behavior (normal or anomalous) of moving objects, they need to be represented correctly. In this paper, we propose video object trajectory classification and anomaly detection using a hybrid Convolutional Neural Network (CNN) and Variational Autoencoder (VAE) architecture. First, we introduce a high level representation of object trajectories using color gradient form. In the next stage, a semi-supervised way to annotate moving object trajectories extracted using Temporal Unknown Incremental Clustering (TUIC), has been applied for trajectory class labeling. Anomalous trajectories are separated using t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, a hybrid CNN-VAE architecture has been used for trajectory classification and anomaly detection. The results obtained using publicly available surveillance video datasets reveal that the proposed method can successfully identify some of the important traffic anomalies such as vehicles not following lane driving, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in wrong directions. The proposed method is able to detect above anomalies at higher accuracy as compared to existing anomaly detection methods.

5.4HCOct 22, 2018

Visual Rendering of Shapes on 2D Display Devices Guided by Hand Gestures

Abhik Singla, Partha Pratim Roy, Debi Prosad Dogra

Designing of touchless user interface is gaining popularity in various contexts. Using such interfaces, users can interact with electronic devices even when the hands are dirty or non-conductive. Also, user with partial physical disability can interact with electronic devices using such systems. Research in this direction has got major boost because of the emergence of low-cost sensors such as Leap Motion, Kinect or RealSense devices. In this paper, we propose a Leap Motion controller-based methodology to facilitate rendering of 2D and 3D shapes on display devices. The proposed method tracks finger movements while users perform natural gestures within the field of view of the sensor. In the next phase, trajectories are analyzed to extract extended Npen++ features in 3D. These features represent finger movements during the gestures and they are fed to unidirectional left-to-right Hidden Markov Model (HMM) for training. A one-to-one mapping between gestures and shapes is proposed. Finally, shapes corresponding to these gestures are rendered over the display using MuPad interface. We have created a dataset of 5400 samples recorded by 10 volunteers. Our dataset contains 18 geometric and 18 non-geometric shapes such as "circle", "rectangle", "flower", "cone", "sphere" etc. The proposed methodology achieves an accuracy of 92.87% when evaluated using 5-fold cross validation method. Our experiments revel that the extended 3D features perform better than existing 3D features in the context of shape representation and classification. The method can be used for developing useful HCI applications for smart display devices.

5.8CVSep 9, 2018

Fingertip Detection and Tracking for Recognition of Air-Writing in Videos

Sohom Mukherjee, Arif Ahmed, Debi Prosad Dogra et al.

Air-writing is the process of writing characters or words in free space using finger or hand movements without the aid of any hand-held device. In this work, we address the problem of mid-air finger writing using web-cam video as input. In spite of recent advances in object detection and tracking, accurate and robust detection and tracking of the fingertip remains a challenging task, primarily due to small dimension of the fingertip. Moreover, the initialization and termination of mid-air finger writing is also challenging due to the absence of any standard delimiting criterion. To solve these problems, we propose a new writing hand pose detection algorithm for initialization of air-writing using the Faster R-CNN framework for accurate hand detection followed by hand segmentation and finally counting the number of raised fingers based on geometrical properties of the hand. Further, we propose a robust fingertip detection and tracking approach using a new signature function called distance-weighted curvature entropy. Finally, a fingertip velocity-based termination criterion is used as a delimiter to mark the completion of the air-writing gesture. Experiments show the superiority of the proposed fingertip detection and tracking algorithm over state-of-the-art approaches giving a mean precision of 73.1 % while achieving real-time performance at 18.5 fps, a condition which is of vital importance to air-writing. Character recognition experiments give a mean accuracy of 96.11 % using the proposed air-writing system, a result which is comparable to that of existing handwritten character recognition systems.

3.0HCSep 2, 2018

PlutoAR: An Inexpensive, Interactive And Portable Augmented Reality Based Interpreter For K-10 Curriculum

Shourya Pratap Singh, Ankit Kumar Panda, Susobhit Panigrahi et al.

The regular K-10 curriculums often do not get the necessary of affordable technology involving interactive ways of teaching the prescribed curriculum with effective analytical skill building. In this paper, we present "PlutoAR", a paper-based augmented reality interpreter which is scalable, affordable, portable and can be used as a platform for skill building for the kids. PlutoAR manages to overcome the conventional albeit non-interactive ways of teaching by incorporating augmented reality (AR) through an interactive toolkit to provide students with the best of both worlds. Students cut out paper "tiles" and place these tiles one by one on a larger paper surface called "Launchpad" and use the PlutoAR mobile application which runs on any Android device with a camera and uses augmented reality to output each step of the program like an interpreter. PlutoAR has inbuilt AR experiences like stories, maze solving using conditional loops, simple elementary mathematics and the intuition of gravity.

2.5CVApr 18, 2018

Temporal Unknown Incremental Clustering (TUIC) Model for Analysis of Traffic Surveillance Videos

Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy

Optimized scene representation is an important characteristic of a framework for detecting abnormalities on live videos. One of the challenges for detecting abnormalities in live videos is real-time detection of objects in a non-parametric way. Another challenge is to efficiently represent the state of objects temporally across frames. In this paper, a Gibbs sampling based heuristic model referred to as Temporal Unknown Incremental Clustering (TUIC) has been proposed to cluster pixels with motion. Pixel motion is first detected using optical flow and a Bayesian algorithm has been applied to associate pixels belonging to similar cluster in subsequent frames. The algorithm is fast and produces accurate results in $Θ(kn)$ time, where $k$ is the number of clusters and $n$ the number of pixels. Our experimental validation with publicly available datasets reveals that the proposed framework has good potential to open-up new opportunities for real-time traffic analysis.

3.9CVMar 18, 2018

Trajectory-based Scene Understanding using Dirichlet Process Mixture Model

Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy et al.

Appropriate modeling of a surveillance scene is essential for detection of anomalies in road traffic. Learning usual paths can provide valuable insight into road traffic conditions and thus can help in identifying unusual routes taken by commuters/vehicles. If usual traffic paths are learned in a nonparametric way, manual interventions in road marking road can be avoided. In this paper, we propose an unsupervised and nonparametric method to learn frequently used paths from the tracks of moving objects in $Θ(kn)$ time, where $k$ denotes the number of paths and $n$ represents the number of tracks. In the proposed method, temporal dependencies of the moving objects are considered to make the clustering meaningful using Temporally Incremental Gravity Model (TIGM). In addition, the distance-based scene learning makes it intuitive to estimate the model parameters. Further, we have extended TIGM hierarchically as Dynamically Evolving Model (DEM) to represent notable traffic dynamics of a scene. Experimental validation reveals that the proposed method can learn a scene quickly without prior knowledge about the number of paths ($k$). We have compared the results with various state-of-the-art methods. We have also highlighted the advantages of the proposed method over existing techniques popularly used for designing traffic monitoring applications. It can be used for administrative decision making to control traffic at junctions or crowded places and generate alarm signals, if necessary.

1.7CVMar 17, 2018

Queuing Theory Guided Intelligent Traffic Scheduling through Video Analysis using Dirichlet Process Mixture Model

Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy

Accurate prediction of traffic signal duration for roadway junction is a challenging problem due to the dynamic nature of traffic flows. Though supervised learning can be used, parameters may vary across roadway junctions. In this paper, we present a computer vision guided expert system that can learn the departure rate of a given traffic junction modeled using traditional queuing theory. First, we temporally group the optical flow of the moving vehicles using Dirichlet Process Mixture Model (DPMM). These groups are referred to as tracklets or temporal clusters. Tracklet features are then used to learn the dynamic behavior of a traffic junction, especially during on/off cycles of a signal. The proposed queuing theory based approach can predict the signal open duration for the next cycle with higher accuracy when compared with other popular features used for tracking. The hypothesis has been verified on two publicly available video datasets. The results reveal that the DPMM based features are better than existing tracking frameworks to estimate $μ$. Thus, signal duration prediction is more accurate when tested on these datasets.The method can be used for designing intelligent operator-independent traffic control systems for roadway junctions at cities and highways.

0.9CVDec 5, 2017

Recognizing Gender from Human Facial Regions using Genetic Algorithm

Avirup Bhattacharyya, Rajkumar Saini, Partha Pratim Roy et al.

Recently, recognition of gender from facial images has gained a lot of importance. There exist a handful of research work that focus on feature extraction to obtain gender specific information from facial images. However, analyzing different facial regions and their fusion help in deciding the gender of a person from facial images. In this paper, we propose a new approach to identify gender from frontal facial images that is robust to background, illumination, intensity, and facial expression. In our framework, first the frontal face image is divided into a number of distinct regions based on facial landmark points that are obtained by the Chehra model proposed by Asthana et al. The model provides 49 facial landmark points covering different regions of the face, e.g. forehead, left eye, right eye, lips. Next, a face image is segmented into facial regions using landmark points and features are extracted from each region. The Compass LBP feature, a variant of LBP feature, has been used in our framework to obtain discriminative gender-specific information. Following this, a Support Vector Machine based classifier has been used to compute the probability scores from each facial region. Finally, the classification scores obtained from individual regions are combined with a genetic algorithm based learning to improve the overall classification accuracy. The experiments have been performed on popular face image datasets such as Adience, cFERET (color FERET), LFW and two sketch datasets, namely CUFS and CUFSF. Through experiments, we have observed that, the proposed method outperforms existing approaches.

0.9CVSep 15, 2017

Video Synopsis Generation Using Spatio-Temporal Groups

A. Ahmed, D. P. Dogra, S. Kar et al.

Millions of surveillance cameras operate at 24x7 generating huge amount of visual data for processing. However, retrieval of important activities from such a large data can be time consuming. Thus, researchers are working on finding solutions to present hours of visual data in a compressed, but meaningful way. Video synopsis is one of the ways to represent activities using relatively shorter duration clips. So far, two main approaches have been used by researchers to address this problem, namely synopsis by tracking moving objects and synopsis by clustering moving objects. Synopses outputs, mainly depend on tracking, segmenting, and shifting of moving objects temporally as well as spatially. In many situations, tracking fails, thus produces multiple trajectories of the same object. Due to this, the object may appear and disappear multiple times within the same synopsis output, which is misleading. This also leads to discontinuity and often can be confusing to the viewer of the synopsis. In this paper, we present a new approach for generating compressed video synopsis by grouping tracklets of moving objects. Grouping helps to generate a synopsis where chronologically related objects appear together with meaningful spatio-temporal relation. Our proposed method produces continuous, but a less confusing synopses when tested on publicly available dataset videos as well as in-house dataset videos.