Aurobinda Routray

CV
h-index41
40papers
1,419citations
Novelty32%
AI Score39

40 Papers

CVSep 2, 2024Code
SINET: Sparsity-driven Interpretable Neural Network for Underwater Image Enhancement

Gargi Panda, Soumitra Kundu, Saumik Bhattacharya et al.

Improving the quality of underwater images is essential for advancing marine research and technology. This work introduces a sparsity-driven interpretable neural network (SINET) for the underwater image enhancement (UIE) task. Unlike pure deep learning methods, our network architecture is based on a novel channel-specific convolutional sparse coding (CCSC) model, ensuring good interpretability of the underlying image enhancement process. The key feature of SINET is that it estimates the salient features from the three color channels using three sparse feature estimation blocks (SFEBs). The architecture of SFEB is designed by unrolling an iterative algorithm for solving the $\ell_1$ regularized convolutional sparse coding (CSC) problem. Our experiments show that SINET surpasses state-of-the-art PSNR value by $1.05$ dB with $3873$ times lower computational complexity. Code can be found at: https://github.com/gargi884/SINET-UIE/tree/main.

LGSep 13, 2024Code
INN-PAR: Invertible Neural Network for PPG to ABP Reconstruction

Soumitra Kundu, Gargi Panda, Saumik Bhattacharya et al.

Non-invasive and continuous blood pressure (BP) monitoring is essential for the early prevention of many cardiovascular diseases. Estimating arterial blood pressure (ABP) from photoplethysmography (PPG) has emerged as a promising solution. However, existing deep learning approaches for PPG-to-ABP reconstruction (PAR) encounter certain information loss, impacting the precision of the reconstructed signal. To overcome this limitation, we introduce an invertible neural network for PPG to ABP reconstruction (INN-PAR), which employs a series of invertible blocks to jointly learn the mapping between PPG and its gradient with the ABP signal and its gradient. INN-PAR efficiently captures both forward and inverse mappings simultaneously, thereby preventing information loss. By integrating signal gradients into the learning process, INN-PAR enhances the network's ability to capture essential high-frequency details, leading to more accurate signal reconstruction. Moreover, we propose a multi-scale convolution module (MSCM) within the invertible block, enabling the model to learn features across multiple scales effectively. We have experimented on two benchmark datasets, which show that INN-PAR significantly outperforms the state-of-the-art methods in both waveform reconstruction and BP measurement accuracy. Codes can be found at: https://github.com/soumitra1992/INNPAR-PPG2ABP.

IVNov 18, 2023
LATIS: Lambda Abstraction-based Thermal Image Super-resolution

Gargi Panda, Soumitra Kundu, Saumik Bhattacharya et al.

Single image super-resolution (SISR) is an effective technique to improve the quality of low-resolution thermal images. Recently, transformer-based methods have achieved significant performance in SISR. However, in the SR task, only a small number of pixels are involved in the transformers self-attention (SA) mechanism due to the computational complexity of the attention mechanism. The lambda abstraction is a promising alternative to SA in modeling long-range interactions while being computationally more efficient. This paper presents lambda abstraction-based thermal image super-resolution (LATIS), a novel lightweight architecture for SISR of thermal images. LATIS sequentially captures local and global information using the local and global feature block (LGFB). In LGFB, we introduce a global feature extraction (GFE) module based on the lambda abstraction mechanism, channel-shuffle and convolution (CSConv) layer to encode local context. Besides, to improve the performance further, we propose a differentiable patch-wise histogram-based loss function. Experimental results demonstrate that our LATIS, with the least model parameters and complexity, achieves better or comparable performance with state-of-the-art methods across multiple datasets.

CVOct 31, 2025
LifWavNet: Lifting Wavelet-based Network for Non-contact ECG Reconstruction from Radar

Soumitra Kundu, Gargi Panda, Saumik Bhattacharya et al.

Non-contact electrocardiogram (ECG) reconstruction from radar signals offers a promising approach for unobtrusive cardiac monitoring. We present LifWavNet, a lifting wavelet network based on a multi-resolution analysis and synthesis (MRAS) model for radar-to-ECG reconstruction. Unlike prior models that use fixed wavelet approaches, LifWavNet employs learnable lifting wavelets with lifting and inverse lifting units to adaptively capture radar signal features and synthesize physiologically meaningful ECG waveforms. To improve reconstruction fidelity, we introduce a multi-resolution short-time Fourier transform (STFT) loss, that enforces consistency with the ground-truth ECG in both temporal and spectral domains. Evaluations on two public datasets demonstrate that LifWavNet outperforms state-of-the-art methods in ECG reconstruction and downstream vital sign estimation (heart rate and heart rate variability). Furthermore, intermediate feature visualization highlights the interpretability of multi-resolution decomposition and synthesis in radar-to-ECG reconstruction. These results establish LifWavNet as a robust framework for radar-based non-contact ECG measurement.

CVDec 5, 2025
Hyperspectral Unmixing with 3D Convolutional Sparse Coding and Projected Simplex Volume Maximization

Gargi Panda, Soumitra Kundu, Saumik Bhattacharya et al.

Hyperspectral unmixing (HSU) aims to separate each pixel into its constituent endmembers and estimate their corresponding abundance fractions. This work presents an algorithm-unrolling-based network for the HSU task, named the 3D Convolutional Sparse Coding Network (3D-CSCNet), built upon a 3D CSC model. Unlike existing unrolling-based networks, our 3D-CSCNet is designed within the powerful autoencoder (AE) framework. Specifically, to solve the 3D CSC problem, we propose a 3D CSC block (3D-CSCB) derived through deep algorithm unrolling. Given a hyperspectral image (HSI), 3D-CSCNet employs the 3D-CSCB to estimate the abundance matrix. The use of 3D CSC enables joint learning of spectral and spatial relationships in the 3D HSI data cube. The estimated abundance matrix is then passed to the AE decoder to reconstruct the HSI, and the decoder weights are extracted as the endmember matrix. Additionally, we propose a projected simplex volume maximization (PSVM) algorithm for endmember estimation, and the resulting endmembers are used to initialize the decoder weights of 3D-CSCNet. Extensive experiments on three real datasets and one simulated dataset with three different signal-to-noise ratio (SNR) levels demonstrate that our 3D-CSCNet outperforms state-of-the-art methods.

CVMar 4, 2025
SSNet: Saliency Prior and State Space Model-based Network for Salient Object Detection in RGB-D Images

Gargi Panda, Soumitra Kundu, Saumik Bhattacharya et al.

Salient object detection (SOD) in RGB-D images is an essential task in computer vision, enabling applications in scene understanding, robotics, and augmented reality. However, existing methods struggle to capture global dependency across modalities, lack comprehensive saliency priors from both RGB and depth data, and are ineffective in handling low-quality depth maps. To address these challenges, we propose SSNet, a saliency-prior and state space model (SSM)-based network for the RGB-D SOD task. Unlike existing convolution- or transformer-based approaches, SSNet introduces an SSM-based multi-modal multi-scale decoder module to efficiently capture both intra- and inter-modal global dependency with linear complexity. Specifically, we propose a cross-modal selective scan SSM (CM-S6) mechanism, which effectively captures global dependency between different modalities. Furthermore, we introduce a saliency enhancement module (SEM) that integrates three saliency priors with deep features to refine feature representation and improve the localization of salient objects. To further address the issue of low-quality depth maps, we propose an adaptive contrast enhancement technique that dynamically refines depth maps, making them more suitable for the RGB-D SOD task. Extensive quantitative and qualitative experiments on seven benchmark datasets demonstrate that SSNet outperforms state-of-the-art methods.

CVNov 7, 2024
l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion

Gargi Panda, Soumitra Kundu, Saumik Bhattacharya et al.

Multi-modal image fusion (MMIF) enhances the information content of the fused image by combining the unique as well as common features obtained from different modality sensor images, improving visualization, object detection, and many more tasks. In this work, we introduce an interpretable network for the MMIF task, named FNet, based on an l0-regularized multi-modal convolutional sparse coding (MCSC) model. Specifically, for solving the l0-regularized CSC problem, we develop an algorithm unrolling-based l0-regularized sparse coding (LZSC) block. Given different modality source images, FNet first separates the unique and common features from them using the LZSC block and then these features are combined to generate the final fused image. Additionally, we propose an l0-regularized MCSC model for the inverse fusion process. Based on this model, we introduce an interpretable inverse fusion network named IFNet, which is utilized during FNet's training. Extensive experiments show that FNet achieves high-quality fusion results across five different MMIF tasks. Furthermore, we show that FNet enhances downstream object detection in visible-thermal image pairs. We have also visualized the intermediate results of FNet, which demonstrates the good interpretability of our network.

CVFeb 23, 2019
Illumination-invariant Face recognition by fusing thermal and visual images via gradient transfer

Sumit Agarwal, Harshit S. Sikchi, Suparna Rooj et al.

Face recognition in real life situations like low illumination condition is still an open challenge in biometric security. It is well established that the state-of-the-art methods in face recognition provide low accuracy in the case of poor illumination. In this work, we propose an algorithm for a more robust illumination invariant face recognition using a multi-modal approach. We propose a new dataset consisting of aligned faces of thermal and visual images of a hundred subjects. We then apply face detection on thermal images using the biggest blob extraction method and apply them for fusing images of different modalities for the purpose of face recognition. An algorithm is proposed to implement fusion of thermal and visual images. We reason for why relying on only one modality can give erroneous results. We use a lighter and faster CNN model called MobileNet for the purpose of face recognition with faster inferencing and to be able to be use it in real time biometric systems. We test our proposed method on our own created dataset to show that real-time face recognition on fused images shows far better results than using visual or thermal images separately.

CVDec 7, 2018
Spatial-Spectral Regularized Local Scaling Cut for Dimensionality Reduction in Hyperspectral Image Classification

Ramanarayan Mohanty, S L Happy, Aurobinda Routray

Dimensionality reduction (DR) methods have attracted extensive attention to provide discriminative information and reduce the computational burden of the hyperspectral image (HSI) classification. However, the DR methods face many challenges due to limited training samples with high dimensional spectra. To address this issue, a graph-based spatial and spectral regularized local scaling cut (SSRLSC) for DR of HSI data is proposed. The underlying idea of the proposed method is to utilize the information from both the spectral and spatial domains to achieve better classification accuracy than its spectral domain counterpart. In SSRLSC, a guided filter is initially used to smoothen and homogenize the pixels of the HSI data in order to preserve the pixel consistency. This is followed by generation of between-class and within-class dissimilarity matrices in both spectral and spatial domains by regularized local scaling cut (RLSC) and neighboring pixel local scaling cut (NPLSC) respectively. Finally, we obtain the projection matrix by optimizing the updated spatial-spectral between-class and total-class dissimilarity. The effectiveness of the proposed DR algorithm is illustrated with two popular real-world HSI datasets.

LGNov 20, 2018
A Semi-supervised Spatial Spectral Regularized Manifold Local Scaling Cut With HGF for Dimensionality Reduction of Hyperspectral Images

Ramanarayan Mohanty, SL Happy, Aurobinda Routray

Hyperspectral images (HSI) contain a wealth of information over hundreds of contiguous spectral bands, making it possible to classify materials through subtle spectral discrepancies. However, the classification of this rich spectral information is accompanied by the challenges like high dimensionality, singularity, limited training samples, lack of labeled data samples, heteroscedasticity and nonlinearity. To address these challenges, we propose a semi-supervised graph based dimensionality reduction method named `semi-supervised spatial spectral regularized manifold local scaling cut' (S3RMLSC). The underlying idea of the proposed method is to exploit the limited labeled information from both the spectral and spatial domains along with the abundant unlabeled samples to facilitate the classification task by retaining the original distribution of the data. In S3RMLSC, a hierarchical guided filter (HGF) is initially used to smoothen the pixels of the HSI data to preserve the spatial pixel consistency. This step is followed by the construction of linear patches from the nonlinear manifold by using the maximal linear patch (MLP) criterion. Then the inter-patch and intra-patch dissimilarity matrices are constructed in both spectral and spatial domains by regularized manifold local scaling cut (RMLSC) and neighboring pixel manifold local scaling cut (NPMLSC) respectively. Finally, we obtain the projection matrix by optimizing the updated semi-supervised spatial-spectral between-patch and total-patch dissimilarity. The effectiveness of the proposed DR algorithm is illustrated with publicly available real-world HSI datasets.

CVJul 27, 2018
ESCaF: Pupil Centre Localization Algorithm with Candidate Filtering

Anjith George, Aurobinda Routray

Algorithms for accurate localization of pupil centre is essential for gaze tracking in real world conditions. Most of the algorithms fail in real world conditions like illumination variations, contact lenses, glasses, eye makeup, motion blur, noise, etc. We propose a new algorithm which improves the detection rate in real world conditions. The proposed algorithm uses both edges as well as intensity information along with a candidate filtering approach to identify the best pupil candidate. A simple tracking scheme has also been added which improves the processing speed. The algorithm has been evaluated in Labelled Pupil in the Wild (LPW) dataset, largest in its class which contains real world conditions. The proposed algorithm outperformed the state of the art algorithms while achieving real-time performance.

CVJul 22, 2018
A Trace Lasso Regularized L1-norm Graph Cut for Highly Correlated Noisy Hyperspectral Image

Ramanarayan Mohanty, S L Happy, Nilesh Suthar et al.

This work proposes an adaptive trace lasso regularized L1-norm based graph cut method for dimensionality reduction of Hyperspectral images, called as `Trace Lasso-L1 Graph Cut' (TL-L1GC). The underlying idea of this method is to generate the optimal projection matrix by considering both the sparsity as well as the correlation of the data samples. The conventional L2-norm used in the objective function is sensitive to noise and outliers. Therefore, in this work L1-norm is utilized as a robust alternative to L2-norm. Besides, for further improvement of the results, we use a penalty function of trace lasso with the L1GC method. It adaptively balances the L2-norm and L1-norm simultaneously by considering the data correlation along with the sparsity. We obtain the optimal projection matrix by maximizing the ratio of between-class dispersion to within-class dispersion using L1-norm with trace lasso as the penalty. Furthermore, an iterative procedure for this TL-L1GC method is proposed to solve the optimization function. The effectiveness of this proposed method is evaluated on two benchmark HSI datasets.

LGJul 7, 2018
A Supervised Geometry-Aware Mapping Approach for Classification of Hyperspectral Images

Ramanarayan Mohanty, S L Happy, Aurobinda Routray

The lack of proper class discrimination among the Hyperspectral (HS) data points poses a potential challenge in HS classification. To address this issue, this paper proposes an optimal geometry-aware transformation for enhancing the classification accuracy. The underlying idea of this method is to obtain a linear projection matrix by solving a nonlinear objective function based on the intrinsic geometrical structure of the data. The objective function is constructed to quantify the discrimination between the points from dissimilar classes on the projected data space. Then the obtained projection matrix is used to linearly map the data to more discriminative space. The effectiveness of the proposed transformation is illustrated with three benchmark real-world HS data sets. The experiments reveal that the classification and dimensionality reduction methods on the projected discriminative space outperform their counterpart in the original space.

CVMay 18, 2018
Recognition of Activities from Eye Gaze and Egocentric Video

Anjith George, Aurobinda Routray

This paper presents a framework for recognition of human activity from egocentric video and eye tracking data obtained from a head-mounted eye tracker. Three channels of information such as eye movement, ego-motion, and visual features are combined for the classification of activities. Image features were extracted using a pre-trained convolutional neural network. Eye and ego-motion are quantized, and the windowed histograms are used as the features. The combination of features obtains better accuracy for activity classification as compared to individual features.

CVSep 9, 2017
Graph Scaling Cut with L1-Norm for Classification of Hyperspectral Images

Ramanarayan Mohanty, S L Happy, Aurobinda Routray

In this paper, we propose an L1 normalized graph based dimensionality reduction method for Hyperspectral images, called as L1-Scaling Cut (L1-SC). The underlying idea of this method is to generate the optimal projection matrix by retaining the original distribution of the data. Though L2-norm is generally preferred for computation, it is sensitive to noise and outliers. However, L1-norm is robust to them. Therefore, we obtain the optimal projection matrix by maximizing the ratio of between-class dispersion to within-class dispersion using L1-norm. Furthermore, an iterative algorithm is described to solve the optimization problem. The experimental results of the HSI classification confirm the effectiveness of the proposed L1-SC method on both noisy and noiseless data.

CVAug 8, 2017
An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data

S L Happy, Ramanarayan Mohanty, Aurobinda Routray

Feature selection has been studied widely in the literature. However, the efficacy of the selection criteria for low sample size applications is neglected in most cases. Most of the existing feature selection criteria are based on the sample similarity. However, the distance measures become insignificant for high dimensional low sample size (HDLSS) data. Moreover, the variance of a feature with a few samples is pointless unless it represents the data distribution efficiently. Instead of looking at the samples in groups, we evaluate their efficiency based on pairwise fashion. In our investigation, we noticed that considering a pair of samples at a time and selecting the features that bring them closer or put them far away is a better choice for feature selection. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method with low sample size, which outperforms many other state-of-the-art feature selection methods.

CVFeb 17, 2017
An Unsupervised Approach for Overlapping Cervical Cell Cytoplasm Segmentation

Pranav Kumar, S L Happy, Swarnadip Chatterjee et al.

The poor contrast and the overlapping of cervical cell cytoplasm are the major issues in the accurate segmentation of cervical cell cytoplasm. This paper presents an automated unsupervised cytoplasm segmentation approach which can effectively find the cytoplasm boundaries in overlapping cells. The proposed approach first segments the cell clumps from the cervical smear image and detects the nuclei in each cell clump. A modified Otsu method with prior class probability is proposed for accurate segmentation of nuclei from the cell clumps. Using distance regularized level set evolution, the contour around each nucleus is evolved until it reaches the cytoplasm boundaries. Promising results were obtained by experimenting on ISBI 2015 challenge dataset.

LGDec 2, 2016
A Novel Framework based on SVDD to Classify Water Saturation from Seismic Attributes

Soumi Chaki, Akhilesh Kumar Verma, Aurobinda Routray et al.

Water saturation is an important property in reservoir engineering domain. Thus, satisfactory classification of water saturation from seismic attributes is beneficial for reservoir characterization. However, diverse and non-linear nature of subsurface attributes makes the classification task difficult. In this context, this paper proposes a generalized Support Vector Data Description (SVDD) based novel classification framework to classify water saturation into two classes (Class high and Class low) from three seismic attributes seismic impedance, amplitude envelop, and seismic sweetness. G-metric means and program execution time are used to quantify the performance of the proposed framework along with established supervised classifiers. The documented results imply that the proposed framework is superior to existing classifiers. The present study is envisioned to contribute in further reservoir modeling.

LGDec 2, 2016
Development of a hybrid learning system based on SVM, ANFIS and domain knowledge: DKFIS

Soumi Chaki, Aurobinda Routray, William K. Mohanty et al.

This paper presents the development of a hybrid learning system based on Support Vector Machines (SVM), Adaptive Neuro-Fuzzy Inference System (ANFIS) and domain knowledge to solve prediction problem. The proposed two-stage Domain Knowledge based Fuzzy Information System (DKFIS) improves the prediction accuracy attained by ANFIS alone. The proposed framework has been implemented on a noisy and incomplete dataset acquired from a hydrocarbon field located at western part of India. Here, oil saturation has been predicted from four different well logs i.e. gamma ray, resistivity, density, and clay volume. In the first stage, depending on zero or near zero and non-zero oil saturation levels the input vector is classified into two classes (Class 0 and Class 1) using SVM. The classification results have been further fine-tuned applying expert knowledge based on the relationship among predictor variables i.e. well logs and target variable - oil saturation. Second, an ANFIS is designed to predict non-zero (Class 1) oil saturation values from predictor logs. The predicted output has been further refined based on expert knowledge. It is apparent from the experimental results that the expert intervention with qualitative judgment at each stage has rendered the prediction into the feasible and realistic ranges. The performance analysis of the prediction in terms of four performance metrics such as correlation coefficient (CC), root mean square error (RMSE), and absolute error mean (AEM), scatter index (SI) has established DKFIS as a useful tool for reservoir characterization.

LGDec 2, 2016
A novel multiclassSVM based framework to classify lithology from well logs: a real-world application

Soumi Chaki, Aurobinda Routray, William K. Mohanty et al.

Support vector machines (SVMs) have been recognized as a potential tool for supervised classification analyses in different domains of research. In essence, SVM is a binary classifier. Therefore, in case of a multiclass problem, the problem is divided into a series of binary problems which are solved by binary classifiers, and finally the classification results are combined following either the one-against-one or one-against-all strategies. In this paper, an attempt has been made to classify lithology using a multiclass SVM based framework using well logs as predictor variables. Here, the lithology is classified into four classes such as sand, shaly sand, sandy shale and shale based on the relative values of sand and shale fractions as suggested by an expert geologist. The available dataset consisting well logs (gamma ray, neutron porosity, density, and P-sonic) and class information from four closely spaced wells from an onshore hydrocarbon field is divided into training and testing sets. We have used one-against-all strategy to combine the results of multiple binary classifiers. The reported results established the superiority of multiclass SVM compared to other classifiers in terms of classification accuracy. The selection of kernel function and associated parameters has also been investigated here. It can be envisaged from the results achieved in this study that the proposed framework based on multiclass SVM can further be used to solve classification problems. In future research endeavor, seismic attributes can be introduced in the framework to classify the lithology throughout a study area from seismic inputs.

LGDec 2, 2016
A One class Classifier based Framework using SVDD : Application to an Imbalanced Geological Dataset

Soumi Chaki, Akhilesh Kumar Verma, Aurobinda Routray et al.

Evaluation of hydrocarbon reservoir requires classification of petrophysical properties from available dataset. However, characterization of reservoir attributes is difficult due to the nonlinear and heterogeneous nature of the subsurface physical properties. In this context, present study proposes a generalized one class classification framework based on Support Vector Data Description (SVDD) to classify a reservoir characteristic water saturation into two classes (Class high and Class low) from four logs namely gamma ray, neutron porosity, bulk density, and P sonic using an imbalanced dataset. A comparison is carried out among proposed framework and different supervised classification algorithms in terms of g metric means and execution time. Experimental results show that proposed framework has outperformed other classifiers in terms of these performance evaluators. It is envisaged that the classification analysis performed in this study will be useful in further reservoir modeling.

CVMay 17, 2016
Fast and Accurate Algorithm for Eye Localization for Gaze Tracking in Low Resolution Images

Anjith George, Aurobinda Routray

Iris centre localization in low-resolution visible images is a challenging problem in computer vision community due to noise, shadows, occlusions, pose variations, eye blinks, etc. This paper proposes an efficient method for determining iris centre in low-resolution images in the visible spectrum. Even low-cost consumer-grade webcams can be used for gaze tracking without any additional hardware. A two-stage algorithm is proposed for iris centre localization. The proposed method uses geometrical characteristics of the eye. In the first stage, a fast convolution based approach is used for obtaining the coarse location of iris centre (IC). The IC location is further refined in the second stage using boundary tracing and ellipse fitting. The algorithm has been evaluated in public databases like BioID, Gi4E and is found to outperform the state of the art methods.

CVMay 17, 2016
Real-time Eye Gaze Direction Classification Using Convolutional Neural Network

Anjith George, Aurobinda Routray

Estimation eye gaze direction is useful in various human-computer interaction tasks. Knowledge of gaze direction can give valuable information regarding users point of attention. Certain patterns of eye movements known as eye accessing cues are reported to be related to the cognitive processes in the human brain. We propose a real-time framework for the classification of eye gaze direction and estimation of eye accessing cues. In the first stage, the algorithm detects faces using a modified version of the Viola-Jones algorithm. A rough eye region is obtained using geometric relations and facial landmarks. The eye region obtained is used in the subsequent stage to classify the eye gaze direction. A convolutional neural network is employed in this work for the classification of eye gaze direction. The proposed algorithm was tested on Eye Chimera database and found to outperform state of the art methods. The computational complexity of the algorithm is very less in the testing phase. The algorithm achieved an average frame rate of 24 fps in the desktop environment.

CVJan 13, 2016
A Score-level Fusion Method for Eye Movement Biometrics

Anjith George, Aurobinda Routray

This paper proposes a novel framework for the use of eye movement patterns for biometric applications. Eye movements contain abundant information about cognitive brain functions, neural pathways, etc. In the proposed method, eye movement data is classified into fixations and saccades. Features extracted from fixations and saccades are used by a Gaussian Radial Basis Function Network (GRBFN) based method for biometric authentication. A score fusion approach is adopted to classify the data in the output layer. In the evaluation stage, the algorithm has been tested using two types of stimuli: random dot following on a screen and text reading. The results indicate the strength of eye movement pattern as a biometric modality. The algorithm has been evaluated on BioEye 2015 database and found to outperform all the other methods. Eye movements are generated by a complex oculomotor plant which is very hard to spoof by mechanical replicas. Use of eye movement dynamics along with iris recognition technology may lead to a robust counterfeit-resistant person identification system.

CVDec 3, 2015
The Indian Spontaneous Expression Database for Emotion Recognition

S L Happy, Priyadarshi Patnaik, Aurobinda Routray et al.

Automatic recognition of spontaneous facial expressions is a major challenge in the field of affective computing. Head rotation, face pose, illumination variation, occlusion etc. are the attributes that increase the complexity of recognition of spontaneous expressions in practical applications. Effective recognition of expressions depends significantly on the quality of the database used. Most well-known facial expression databases consist of posed expressions. However, currently there is a huge demand for spontaneous expression databases for the pragmatic implementation of the facial expression recognition algorithms. In this paper, we propose and establish a new facial expression database containing spontaneous expressions of both male and female participants of Indian origin. The database consists of 428 segmented video clips of the spontaneous facial expressions of 50 participants. In our experiment, emotions were induced among the participants by using emotional videos and simultaneously their self-ratings were collected for each experienced emotion. Facial expression clips were annotated carefully by four trained decoders, which were further validated by the nature of stimuli used and self-report of emotions. An extensive analysis was carried out on the database using several machine learning algorithms and the results are provided for future reference. Such a spontaneous database will help in the development and validation of algorithms for recognition of spontaneous expressions.

NESep 23, 2015
Well Tops Guided Prediction of Reservoir Properties using Modular Neural Network Concept A Case Study from Western Onshore, India

Soumi Chaki, Akhilesh K Verma, Aurobinda Routray et al.

This paper proposes a complete framework consisting pre-processing, modeling, and post-processing stages to carry out well tops guided prediction of a reservoir property (sand fraction) from three seismic attributes (seismic impedance, instantaneous amplitude, and instantaneous frequency) using the concept of modular artificial neural network (MANN). The data set used in this study comprising three seismic attributes and well log data from eight wells, is acquired from a western onshore hydrocarbon field of India. Firstly, the acquired data set is integrated and normalized. Then, well log analysis and segmentation of the total depth range into three different units (zones) separated by well tops are carried out. Secondly, three different networks are trained corresponding to three different zones using combined data set of seven wells and then trained networks are validated using the remaining test well. The target property of the test well is predicted using three different tuned networks corresponding to three zones; and then the estimated values obtained from three different networks are concatenated to represent the predicted log along the complete depth range of the testing well. The application of multiple simpler networks instead of a single one improves the prediction accuracy in terms of performance metrics such as correlation coefficient, root mean square error, absolute error mean and program execution time.

CESep 23, 2015
Quantification of sand fraction from seismic attributes using Neuro-Fuzzy approach

Akhilesh K Verma, Soumi Chaki, Aurobinda Routray et al.

In this paper, we illustrate the modeling of a reservoir property (sand fraction) from seismic attributes namely seismic impedance, seismic amplitude, and instantaneous frequency using Neuro-Fuzzy (NF) approach. Input dataset includes 3D post-stacked seismic attributes and six well logs acquired from a hydrocarbon field located in the western coast of India. Presence of thin sand and shale layers in the basin area makes the modeling of reservoir characteristic a challenging task. Though seismic data is helpful in extrapolation of reservoir properties away from boreholes; yet, it could be challenging to delineate thin sand and shale reservoirs using seismic data due to its limited resolvability. Therefore, it is important to develop state-of-art intelligent methods for calibrating a nonlinear mapping between seismic data and target reservoir variables. Neural networks have shown its potential to model such nonlinear mappings; however, uncertainties associated with the model and datasets are still a concern. Hence, introduction of Fuzzy Logic (FL) is beneficial for handling these uncertainties. More specifically, hybrid variants of Artificial Neural Network (ANN) and fuzzy logic, i.e., NF methods, are capable for the modeling reservoir characteristics by integrating the explicit knowledge representation power of FL with the learning ability of neural networks. The documented results in this study demonstrate acceptable resemblance between target and predicted variables, and hence, encourage the application of integrated machine learning approaches such as Neuro-Fuzzy in reservoir characterization domain. Furthermore, visualization of the variation of sand probability in the study area would assist in identifying placement of potential wells for future drilling operations.

CESep 23, 2015
A Novel Pre-processing Scheme to Improve the Prediction of Sand Fraction from Seismic Attributes using Neural Networks

Soumi Chaki, Aurobinda Routray, William K. Mohanty

This paper presents a novel pre-processing scheme to improve the prediction of sand fraction from multiple seismic attributes such as seismic impedance, amplitude and frequency using machine learning and information filtering. The available well logs along with the 3-D seismic data have been used to benchmark the proposed pre-processing stage using a methodology which primarily consists of three steps: pre-processing, training and post-processing. An Artificial Neural Network (ANN) with conjugate-gradient learning algorithm has been used to model the sand fraction. The available sand fraction data from the high resolution well logs has far more information content than the low resolution seismic attributes. Therefore, regularization schemes based on Fourier Transform (FT), Wavelet Decomposition (WD) and Empirical Mode Decomposition (EMD) have been proposed to shape the high resolution sand fraction data for effective machine learning. The input data sets have been segregated into training, testing and validation sets. The test results are primarily used to check different network structures and activation function performances. Once the network passes the testing phase with an acceptable performance in terms of the selected evaluators, the validation phase follows. In the validation stage, the prediction model is tested against unseen data. The network yielding satisfactory performance in the validation stage is used to predict lithological properties from seismic attributes throughout a given volume. Finally, a post-processing scheme using 3-D spatial filtering is implemented for smoothing the sand fraction in the volume. Prediction of lithological properties using this framework is helpful for Reservoir Characterization.

CVSep 16, 2015
An Improved Algorithm for Eye Corner Detection

Anirban Dasgupta, Anshit Mandloi, Anjith George et al.

In this paper, a modified algorithm for the detection of nasal and temporal eye corners is presented. The algorithm is a modification of the Santos and Proenka Method. In the first step, we detect the face and the eyes using classifiers based on Haar-like features. We then segment out the sclera, from the detected eye region. From the segmented sclera, we segment out an approximate eyelid contour. Eye corner candidates are obtained using Harris and Stephens corner detector. We introduce a post-pruning of the Eye corner candidates to locate the eye corners, finally. The algorithm has been tested on Yale, JAFFE databases as well as our created database.

CVSep 16, 2015
SPECFACE - A Dataset of Human Faces Wearing Spectacles

Anirban Dasgupta, Shubhobrata Bhattacharya, Aurobinda Routray

This paper presents a database of human faces for persons wearing spectacles. The database consists of images of faces having significant variations with respect to illumination, head pose, skin color, facial expressions and sizes, and nature of spectacles. The database contains data of 60 subjects. This database is expected to be a precious resource for the development and evaluation of algorithms for face detection, eye detection, head tracking, eye gaze tracking, etc., for subjects wearing spectacles. As such, this can be a valuable contribution to the computer vision community.

CVJun 16, 2015
Evaluation of Denoising Techniques for EOG signals based on SNR Estimation

Anirban Dasgupta, Suvodip Chakrborty, Aritra Chaudhuri et al.

This paper evaluates four algorithms for denoising raw Electrooculography (EOG) data based on the Signal to Noise Ratio (SNR). The SNR is computed using the eigenvalue method. The filtering algorithms are a) Finite Impulse Response (FIR) bandpass filters, b) Stationary Wavelet Transform, c) Empirical Mode Decomposition (EMD) d) FIR Median Hybrid Filters. An EOG dataset has been prepared where the subject is asked to perform letter cancelation test on 20 subjects.

CVMay 29, 2015
Fast Computation of PERCLOS and Saccadic Ratio

Anirban Dasgupta, Aurobinda Routray

This thesis describes the development of fast algorithms for the computation of PERcentage CLOSure of eyes (PERCLOS) and Saccadic Ratio (SR). PERCLOS and SR are two ocular parameters reported to be measures of alertness levels in human beings. PERCLOS is the percentage of time in which at least 80% of the eyelid remains closed over the pupil. Saccades are fast and simultaneous movement of both the eyes in the same direction. SR is the ratio of peak saccadic velocity to the saccadic duration. This thesis addresses the issues of image based estimation of PERCLOS and SR, prevailing in the literature such as illumination variation, poor illumination conditions, head rotations etc. In this work, algorithms for real-time PERCLOS computation has been developed and implemented on an embedded platform. The platform has been used as a case study for assessment of loss of attention in automotive drivers. The SR estimation has been carried out offline as real-time implementation requires high frame rates of processing which is difficult to achieve due to hardware limitations. The accuracy in estimation of the loss of attention using PERCLOS and SR has been validated using brain signals, which are reported to be an authentic cue for estimating the state of alertness in human beings. The major contributions of this thesis include database creation, design and implementation of fast algorithms for estimating PERCLOS and SR on embedded computing platforms.

CVMay 22, 2015
Design and Implementation of Real-time Algorithms for Eye Tracking and PERCLOS Measurement for on board Estimation of Alertness of Drivers

Anjith George, Aurobinda Routray

The alertness level of drivers can be estimated with the use of computer vision based methods. The level of fatigue can be found from the value of PERCLOS. It is the ratio of closed eye frames to the total frames processed. The main objective of the thesis is the design and implementation of real-time algorithms for measurement of PERCLOS. In this work we have developed a real-time system which is able to process the video onboard and to alarm the driver in case the driver is in alert. For accurate estimation of PERCLOS the frame rate should be greater than 4 and accuracy should be greater than 90%. For eye detection we have used mainly two approaches Haar classifier based method and Principal Component Analysis (PCA) based method for day time. During night time active Near Infra Red (NIR) illumination is used. Local Binary Pattern (LBP) histogram based method is used for the detection of eyes at night time. The accuracy rate of the algorithms was found to be more than 90% at frame rates more than 5 fps which was suitable for the application.

CVMay 15, 2015
A Real Time Facial Expression Classification System Using Local Binary Patterns

S. L. Happy, Anjith George, Aurobinda Routray

Facial expression analysis is one of the popular fields of research in human computer interaction (HCI). It has several applications in next generation user interfaces, human emotion analysis, behavior and cognitive modeling. In this paper, a facial expression classification algorithm is proposed which uses Haar classifier for face detection purpose, Local Binary Patterns (LBP) histogram of different block sizes of a face image as feature vectors and classifies various facial expressions using Principal Component Analysis (PCA). The algorithm is implemented in real time for expression classification since the computational complexity of the algorithm is small. A customizable approach is proposed for facial expression analysis, since the various expressions and intensity of expressions vary from person to person. The system uses grayscale frontal face images of a person to classify six basic emotions namely happiness, sadness, disgust, fear, surprise and anger.

CVMay 15, 2015
A Video Database of Human Faces under Near Infra-Red Illumination for Human Computer Interaction Aplications

S L Happy, Anirban Dasgupta, Anjith George et al.

Human Computer Interaction (HCI) is an evolving area of research for coherent communication between computers and human beings. Some of the important applications of HCI as reported in literature are face detection, face pose estimation, face tracking and eye gaze estimation. Development of algorithms for these applications is an active field of research. However, availability of standard database to validate such algorithms is insufficient. This paper discusses the creation of such a database created under Near Infra-Red (NIR) illumination. NIR illumination has gained its popularity for night mode applications since prolonged exposure to Infra-Red (IR) lighting may lead to many health issues. The database contains NIR videos of 60 subjects in different head orientations and with different facial expressions, facial occlusions and illumination variation. This new database can be a very valuable resource for development and evaluation of algorithms on face detection, eye detection, head tracking, eye gaze tracking etc. in NIR lighting.

CVMay 15, 2015
Robust Facial Expression Classification Using Shape and Appearance Features

S. L. Happy, Aurobinda Routray

Facial expression recognition has many potential applications which has attracted the attention of researchers in the last decade. Feature extraction is one important step in expression analysis which contributes toward fast and accurate expression recognition. This paper represents an approach of combining the shape and appearance features to form a hybrid feature vector. We have extracted Pyramid of Histogram of Gradients (PHOG) as shape descriptors and Local Binary Patterns (LBP) as appearance features. The proposed framework involves a novel approach of extracting hybrid features from active facial patches. The active facial patches are located on the face regions which undergo a major change during different expressions. After detection of facial landmarks, the active patches are localized and hybrid features are calculated from these patches. The use of small parts of face instead of the whole face for extracting features reduces the computational cost and prevents the over-fitting of the features for classification. By using linear discriminant analysis, the dimensionality of the feature is reduced which is further classified by using the support vector machine (SVM). The experimental results on two publicly available databases show promising accuracy in recognizing all expression classes.

CVMay 15, 2015
Automatic Facial Expression Recognition Using Features of Salient Facial Patches

S L Happy, Aurobinda Routray

Extraction of discriminative features from salient facial patches plays a vital role in effective facial expression recognition. The accurate detection of facial landmarks improves the localization of the salient patches on face images. This paper proposes a novel framework for expression recognition by using appearance features of selected facial patches. A few prominent facial patches, depending on the position of facial landmarks, are extracted which are active during emotion elicitation. These active patches are further processed to obtain the salient patches which contain discriminative features for classification of each pair of expressions, thereby selecting different facial patches as salient for different pair of expression classes. One-against-one classification method is adopted using these features. In addition, an automated learning-free facial landmark detection technique has been proposed, which achieves similar performances as that of other state-of-art landmark detection methods, yet requires significantly less execution time. The proposed method is found to perform well consistently in different resolutions, hence, providing a solution for expression recognition in low resolution images. Experiments on CK+ and JAFFE facial expression databases show the effectiveness of the proposed system.

CVMay 13, 2015
A Vision Based System for Monitoring the Loss of Attention in Automotive Drivers

Anirban Dasgupta, Anjith George, S. L. Happy et al.

On board monitoring of the alertness level of an automotive driver has been a challenging research in transportation safety and management. In this paper, we propose a robust real time embedded platform to monitor the loss of attention of the driver during day as well as night driving conditions. The PERcentage of eye CLOSure (PERCLOS) has been used as the indicator of the alertness level. In this approach, the face is detected using Haar like features and tracked using a Kalman Filter. The Eyes are detected using Principal Component Analysis (PCA) during day time and the block Local Binary Pattern (LBP) features during night. Finally the eye state is classified as open or closed using Support Vector Machines(SVM). In plane and off plane rotations of the drivers face have been compensated using Affine and Perspective Transformation respectively. Compensation in illumination variation is carried out using Bi Histogram Equalization (BHE). The algorithm has been cross validated using brain signals and finally been implemented on a Single Board Computer (SBC) having Intel Atom processor, 1 GB RAM, 1.66 GHz clock, x86 architecture, Windows Embedded XP operating system. The system is found to be robust under actual driving conditions.

CVMay 13, 2015
A Framework for Fast Face and Eye Detection

Anjith George, Anirban Dasgupta, Aurobinda Routray

Face detection is an essential step in many computer vision applications like surveillance, tracking, medical analysis, facial expression analysis etc. Several approaches have been made in the direction of face detection. Among them, Haar-like features based method is a robust method. In spite of the robustness, Haar - like features work with some limitations. However, with some simple modifications in the algorithm, its performance can be made faster and more robust. The present work refers to the increase in speed of operation of the original algorithm by down sampling the frames and its analysis with different scale factors. It also discusses the detection of tilted faces using an affine transformation of the input image.

CVNov 23, 2013
Dynamic Model of Facial Expression Recognition based on Eigen-face Approach

Nikunj Bajaj, Aurobinda Routray, S L Happy

Emotions are best way of communicating information; and sometimes it carry more information than words. Recently, there has been a huge interest in automatic recognition of human emotion because of its wide spread application in security, surveillance, marketing, advertisement, and human-computer interaction. To communicate with a computer in a natural way, it will be desirable to use more natural modes of human communication based on voice, gestures and facial expressions. In this paper, a holistic approach for facial expression recognition is proposed which captures the variation in facial features in temporal domain and classifies the sequence of images in different emotions. The proposed method uses Haar-like features to detect face in an image. The dimensionality of the eigenspace is reduced using Principal Component Analysis (PCA). By projecting the subsequent face images into principal eigen directions, the variation pattern of the obtained weight vector is modeled to classify it into different emotions. Owing to the variations of expressions for different people and its intensity, a person specific method for emotion recognition is followed. Using the gray scale images of the frontal face, the system is able to classify four basic emotions such as happiness, sadness, surprise, and anger.