CVJul 9, 2023
A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language ProcessingAishik Rakshit, Samyak Mehta, Anirban Dasgupta
Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The state-of-the-art methods work well with the OCR with printed text on license plates, shop names, etc. However, applications such as printed textbooks and handwritten texts have limited accuracy with existing techniques. The reason may be attributed to similar-looking characters and variations in handwritten characters. Since these issues are challenging to address with OCR technologies exclusively, we propose a post-processing approach using Natural Language Processing (NLP) tools. This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
27.2LGMay 25
'Si'multaneous 'S'patial-'T'emporal Message Passing for Dynamic Graph Representation LearningShubhajit Roy, Anirban Dasgupta
Dynamic graph neural networks (DGNNs) that operate on snapshot sequences typically fall into one of two categories. \emph{Temporal-first} approaches build per-node temporal embeddings and only afterwards perform spatial aggregation, whereas \emph{Spatial-first} approaches invert this order, feeding the output of a graph convolution into a downstream temporal module. In either case, the rigid sequencing forces the second stage to consume an already-compressed summary produced by the first, ruling out joint reasoning over topology and evolution; concretely, the message-passing operator never gets to weight a neighbor's contribution by that neighbor's \emph{past} trajectory. This paper introduces \textbf{SiST-GNN} (\textbf{Si}multaneous \textbf{S}patial-\textbf{T}emporal \textbf{GNN}), which fuses the two signals inside a single message-passing operation rather than chaining them. Concretely, at each snapshot we maintain a recurrent hidden state per node that summarises its history, pair it with the node's current feature vector, and treat the pair as two nodes joined by a cross-time edge; running a standard graph convolution on this temporally augmented graph yields the updated representation. Our empirical study spans nine public baselines and fourteen model-dataset combinations, covering both fixed-split and live-update evaluation regimes. Across every public benchmark, SiST-GNN sets a new state of the art in link prediction task over the strongest prior method by $109$--$277\%$ in the fixed-split setting and by $68$--$194\%$ in the live-update setting. We additionally construct three dynamic node-classification tasks by discretising the underlying continuous-time event streams; here SiST-GNN beats the leading discrete-time (DTDG) baseline by $7$--$22\%$ and matches continuous-time (CTDG) methods that consume the raw events directly.
DSJan 1
Deterministic Coreset for Lp SubspaceRachit Chhaya, Anirban Dasgupta, Dan Feldman et al.
We introduce the first iterative algorithm for constructing a $\varepsilon$-coreset that guarantees deterministic $\ell_p$ subspace embedding for any $p \in [1,\infty)$ and any $\varepsilon > 0$. For a given full rank matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ where $n \gg d$, $\mathbf{X}' \in \mathbb{R}^{m \times d}$ is an $(\varepsilon,\ell_p)$-subspace embedding of $\mathbf{X}$, if for every $\mathbf{q} \in \mathbb{R}^d$, $(1-\varepsilon)\|\mathbf{Xq}\|_{p}^{p} \leq \|\mathbf{X'q}\|_{p}^{p} \leq (1+\varepsilon)\|\mathbf{Xq}\|_{p}^{p}$. Specifically, in this paper, $\mathbf{X}'$ is a weighted subset of rows of $\mathbf{X}$ which is commonly known in the literature as a coreset. In every iteration, the algorithm ensures that the loss on the maintained set is upper and lower bounded by the loss on the original dataset with appropriate scalings. So, unlike typical coreset guarantees, due to bounded loss, our coreset gives a deterministic guarantee for the $\ell_p$ subspace embedding. For an error parameter $\varepsilon$, our algorithm takes $O(\mathrm{poly}(n,d,\varepsilon^{-1}))$ time and returns a deterministic $\varepsilon$-coreset, for $\ell_p$ subspace embedding whose size is $O\left(\frac{d^{\max\{1,p/2\}}}{\varepsilon^{2}}\right)$. Here, we remove the $\log$ factors in the coreset size, which had been a long-standing open problem. Our coresets are optimal as they are tight with the lower bound. As an application, our coreset can also be used for approximately solving the $\ell_p$ regression problem in a deterministic manner.
LGNov 10, 2025
Private Sketches for Linear RegressionShrutimoy Das, Debanuj Nayak, Anirban Dasgupta
Linear regression is frequently applied in a variety of domains. In order to improve the efficiency of these methods, various methods have been developed that compute summaries or \emph{sketches} of the datasets. Certain domains, however, contain sensitive data which necessitates that the application of these statistical methods does not reveal private information. Differentially private (DP) linear regression methods have been developed for mitigating this problem. These techniques typically involve estimating a noisy version of the parameter vector. Instead, we propose releasing private sketches of the datasets. We present differentially private sketches for the problems of least squares regression, as well as least absolute deviations regression. The availability of these private sketches facilitates the application of commonly available solvers for regression, without the risk of privacy leakage.
LGDec 14, 2024
Linear Programming based Approximation to Individually Fair k-Clustering with OutliersBinita Maity, Shrutimoy Das, Anirban Dasgupta
Individual fairness guarantees are often desirable properties to have, but they become hard to formalize when the dataset contains outliers. Here, we investigate the problem of developing an individually fair $k$-means clustering algorithm for datasets that contain outliers. That is, given $n$ points and $k$ centers, we want that for each point which is not an outlier, there must be a center within the $\frac{n}{k}$ nearest neighbours of the given point. While a few of the recent works have looked into individually fair clustering, this is the first work that explores this problem in the presence of outliers for $k$-means clustering. For this purpose, we define and solve a linear program (LP) that helps us identify the outliers. We exclude these outliers from the dataset and apply a rounding algorithm that computes the $k$ centers, such that the fairness constraint of the remaining points is satisfied. We also provide theoretical guarantees that our method leads to a guaranteed approximation of the fair radius as well as the clustering cost. We also demonstrate our techniques empirically on real-world datasets.
LGOct 19, 2024
FIT-GNN: Faster Inference Time for GNNs that 'FIT' in Memory Using CoarseningShubhajit Roy, Hrriday Ruparel, Kishan Ved et al.
Scalability of Graph Neural Networks (GNNs) remains a significant challenge. To tackle this, methods like coarsening, condensation, and computation trees are used to train on a smaller graph, resulting in faster computation. Nonetheless, prior research has not adequately addressed the computational costs during the inference phase. This paper presents a novel approach to improve the scalability of GNNs by reducing computational burden during the inference phase using graph coarsening. We demonstrate two different methods -- Extra Nodes and Cluster Nodes. Our study extends the application of graph coarsening for graph-level tasks, including graph classification and graph regression. We conduct extensive experiments on multiple benchmark datasets to evaluate the performance of our approach. Our results show that the proposed method achieves orders of magnitude improvements in single-node inference time compared to traditional approaches. Furthermore, it significantly reduces memory consumption for node and graph classification and regression tasks, enabling efficient training and inference on low-resource devices where conventional methods are impractical. Notably, these computational advantages are achieved while maintaining competitive performance relative to baseline models.
LGDec 15, 2023
Simple Weak Coresets for Non-Decomposable Classification MeasuresJayesh Malaviya, Anirban Dasgupta, Rachit Chhaya
While coresets have been growing in terms of their application, barring few exceptions, they have mostly been limited to unsupervised settings. We consider supervised classification problems, and non-decomposable evaluation measures in such settings. We show that stratified uniform sampling based coresets have excellent empirical performance that are backed by theoretical guarantees too. We focus on the F1 score and Matthews Correlation Coefficient, two widely used non-decomposable objective functions that are nontrivial to optimize for and show that uniform coresets attain a lower bound for coreset size, and have good empirical performance, comparable with ``smarter'' coreset construction strategies.
LGMay 31, 2023
Local Fragments, Global Gains: Subgraph Counting using Graph Neural NetworksShubhajit Roy, Shrutimoy Das, Binita Maity et al.
Subgraph counting is a fundamental task for analyzing structural patterns in graph-structured data, with important applications in domains such as computational biology and social network analysis, where recurring motifs reveal functional and organizational properties. In this paper, we propose localized versions of the Weisfeiler-Leman (WL) algorithms to improve both expressivity and computational efficiency for this task. We introduce Local $k$-WL, which we prove to be more expressive than $k$-WL and at most as expressive as $(k+1)$-WL, and provide a characterization of patterns whose subgraph and induced subgraph counts are invariant under Local $k$-WL equivalence. To enhance scalability, we present two variants -- Layer $k$-WL and Recursive $k$-WL -- that achieve greater time and space efficiency compared to applying $k$-WL on the entire graph. Additionally, we propose a novel fragmentation technique that decomposes complex subgraphs into simpler subpatterns, enabling the exact count of all induced subgraphs of size at most $4$ using only $1$-WL, with extensions possible for larger patterns when $k>1$. Building on these ideas, we develop a three-stage differentiable learning framework that combines subpattern counts to compute counts of more complex motifs, bridging combinatorial algorithm design with machine learning approaches. We also compare the expressive power of Local $k$-WL with existing GNN hierarchies and demonstrate that, under bounded time complexity, our methods are more expressive than prior approaches.
LGFeb 27, 2021
Statistical Measures For Defining Curriculum Scoring FunctionVinu Sankar Sadasivan, Anirban Dasgupta
Curriculum learning is a training strategy that sorts the training examples by some measure of their difficulty and gradually exposes them to the learner to improve the network performance. Motivated by our insights from implicit curriculum ordering, we first introduce a simple curriculum learning strategy that uses statistical measures such as standard deviation and entropy values to score the difficulty of data points for real image classification tasks. We empirically show its improvements in performance with convolutional and fully-connected neural networks on multiple real image datasets. We also propose and study the performance of a dynamic curriculum learning algorithm. Our dynamic curriculum algorithm tries to reduce the distance between the network weight and an optimal weight at any training step by greedily sampling examples with gradients that are directed towards the optimal weight. Further, we use our algorithms to discuss why curriculum learning is helpful.
DSDec 11, 2020
Online Coresets for Clustering with Bregman DivergencesRachit Chhaya, Jayesh Choudhari, Anirban Dasgupta et al.
We present algorithms that create coresets in an online setting for clustering problems according to a wide subset of Bregman divergences. Notably, our coresets have a small additive error, similar in magnitude to the lightweight coresets Bachem et. al. 2018, and take update time $O(d)$ for every incoming point where $d$ is dimension of the point. Our first algorithm gives online coresets of size $\tilde{O}(\mbox{poly}(k,d,ε,μ))$ for $k$-clusterings according to any $μ$-similar Bregman divergence. We further extend this algorithm to show existence of a non-parametric coresets, where the coreset size is independent of $k$, the number of clusters, for the same subclass of Bregman divergences. Our non-parametric coresets are larger by a factor of $O(\log n)$ ($n$ is number of points) and have similar (small) additive guarantee. At the same time our coresets also function as lightweight coresets for non-parametric versions of the Bregman clustering like DP-Means. While these coresets provide additive error guarantees, they are also significantly smaller (scaling with $O(\log n)$ as opposed to $O(d^d)$ for points in $R^d$) than the (relative-error) coresets obtained in Bachem et. al. 2015 for DP-Means. While our non-parametric coresets are existential, we give an algorithmic version under certain assumptions.
DSOct 6, 2020
On Additive Approximate SubmodularityFlavio Chierichetti, Anirban Dasgupta, Ravi Kumar
A real-valued set function is (additively) approximately submodular if it satisfies the submodularity conditions with an additive error. Approximate submodularity arises in many settings, especially in machine learning, where the function evaluation might not be exact. In this paper we study how close such approximately submodular functions are to truly submodular functions. We show that an approximately submodular function defined on a ground set of $n$ elements is $O(n^2)$ pointwise-close to a submodular function. This result also provides an algorithmic tool that can be used to adapt existing submodular optimization algorithms to approximately submodular functions. To complement, we show an $Ω(\sqrt{n})$ lower bound on the distance to submodularity. These results stand in contrast to the case of approximate modularity, where the distance to modularity is a constant, and approximate convexity, where the distance to convexity is logarithmic.
LGJun 9, 2020
On Coresets For Regularized RegressionRachit Chhaya, Anirban Dasgupta, Supratim Shit
We study the effect of norm based regularization on the size of coresets for regression problems. Specifically, given a matrix $ \mathbf{A} \in {\mathbb{R}}^{n \times d}$ with $n\gg d$ and a vector $\mathbf{b} \in \mathbb{R} ^ n $ and $λ> 0$, we analyze the size of coresets for regularized versions of regression of the form $\|\mathbf{Ax}-\mathbf{b}\|_p^r + λ\|{\mathbf{x}}\|_q^s$ . Prior work has shown that for ridge regression (where $p,q,r,s=2$) we can obtain a coreset that is smaller than the coreset for the unregularized counterpart i.e. least squares regression (Avron et al). We show that when $r \neq s$, no coreset for regularized regression can have size smaller than the optimal coreset of the unregularized version. The well known lasso problem falls under this category and hence does not allow a coreset smaller than the one for least squares regression. We propose a modified version of the lasso problem and obtain for it a coreset of size smaller than the least square regression. We empirically show that the modified version of lasso also induces sparsity in solution, similar to the original lasso. We also obtain smaller coresets for $\ell_p$ regression with $\ell_p$ regularization. We extend our methods to multi response regularized regression. Finally, we empirically demonstrate the coreset performance for the modified lasso and the $\ell_1$ regression with $\ell_1$ regularization.
LGJun 1, 2020
Streaming Coresets for Symmetric Tensor FactorizationRachit Chhaya, Jayesh Choudhari, Anirban Dasgupta et al.
Factorizing tensors has recently become an important optimization module in a number of machine learning pipelines, especially in latent variable models. We show how to do this efficiently in the streaming setting. Given a set of $n$ vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear number of these vectors as coreset, while guaranteeing that the CP decomposition of the $p$-moment tensor of the coreset approximates the corresponding decomposition of the $p$-moment tensor computed from the full data. We introduce two novel algorithmic techniques: online filtering and kernelization. Using these two, we present six algorithms that achieve different tradeoffs of coreset size, update time and working space, beating or matching various state of the art algorithms. In the case of matrices ($2$-ordered tensor), our online row sampling algorithm guarantees $(1 \pm ε)$ relative error spectral approximation. We show applications of our algorithms in learning single topic modeling.
LGSep 12, 2018
Discovering Topical Interactions in Text-based Cascades using Hidden Markov Hawkes ProcessesSrikanta Bedathur, Indrajit Bhattacharya, Jayesh Choudhari et al.
Social media conversations unfold based on complex interactions between users, topics and time. While recent models have been proposed to capture network strengths between users, users' topical preferences and temporal patterns between posting and response times, interaction patterns between topics has not been studied. We propose the Hidden Markov Hawkes Process (HMHP) that incorporates topical Markov Chains within Hawkes processes to jointly model topical interactions along with user-user and user-topic patterns. We propose a Gibbs sampling algorithm for HMHP that jointly infers the network strengths, diffusion paths, the topics of the posts as well as the topic-topic interactions. We show using experiments on real and semi-synthetic data that HMHP is able to generalize better and recover the network strengths, topics and diffusion paths more accurately than state-of-the-art baselines. More interestingly, HMHP finds insightful interactions between topics in real tweets which no existing model is able to do.
MLNov 30, 2017
Improved Linear Embeddings via Lagrange DualityKshiteej Sheth, Dinesh Garg, Anirban Dasgupta
Near isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimizes the maximum distortion for a given set of points. We formulate the problem as a non convex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create dual relaxation. We also suggest a polynomial time algorithm based on the theory of convex optimization to solve the dual relaxation provably. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of the scalability and the ability to achieve lower distortion.
CVSep 16, 2015
An Improved Algorithm for Eye Corner DetectionAnirban Dasgupta, Anshit Mandloi, Anjith George et al.
In this paper, a modified algorithm for the detection of nasal and temporal eye corners is presented. The algorithm is a modification of the Santos and Proenka Method. In the first step, we detect the face and the eyes using classifiers based on Haar-like features. We then segment out the sclera, from the detected eye region. From the segmented sclera, we segment out an approximate eyelid contour. Eye corner candidates are obtained using Harris and Stephens corner detector. We introduce a post-pruning of the Eye corner candidates to locate the eye corners, finally. The algorithm has been tested on Yale, JAFFE databases as well as our created database.
CVSep 16, 2015
SPECFACE - A Dataset of Human Faces Wearing SpectaclesAnirban Dasgupta, Shubhobrata Bhattacharya, Aurobinda Routray
This paper presents a database of human faces for persons wearing spectacles. The database consists of images of faces having significant variations with respect to illumination, head pose, skin color, facial expressions and sizes, and nature of spectacles. The database contains data of 60 subjects. This database is expected to be a precious resource for the development and evaluation of algorithms for face detection, eye detection, head tracking, eye gaze tracking, etc., for subjects wearing spectacles. As such, this can be a valuable contribution to the computer vision community.
CVJun 16, 2015
Evaluation of Denoising Techniques for EOG signals based on SNR EstimationAnirban Dasgupta, Suvodip Chakrborty, Aritra Chaudhuri et al.
This paper evaluates four algorithms for denoising raw Electrooculography (EOG) data based on the Signal to Noise Ratio (SNR). The SNR is computed using the eigenvalue method. The filtering algorithms are a) Finite Impulse Response (FIR) bandpass filters, b) Stationary Wavelet Transform, c) Empirical Mode Decomposition (EMD) d) FIR Median Hybrid Filters. An EOG dataset has been prepared where the subject is asked to perform letter cancelation test on 20 subjects.
CVMay 29, 2015
Fast Computation of PERCLOS and Saccadic RatioAnirban Dasgupta, Aurobinda Routray
This thesis describes the development of fast algorithms for the computation of PERcentage CLOSure of eyes (PERCLOS) and Saccadic Ratio (SR). PERCLOS and SR are two ocular parameters reported to be measures of alertness levels in human beings. PERCLOS is the percentage of time in which at least 80% of the eyelid remains closed over the pupil. Saccades are fast and simultaneous movement of both the eyes in the same direction. SR is the ratio of peak saccadic velocity to the saccadic duration. This thesis addresses the issues of image based estimation of PERCLOS and SR, prevailing in the literature such as illumination variation, poor illumination conditions, head rotations etc. In this work, algorithms for real-time PERCLOS computation has been developed and implemented on an embedded platform. The platform has been used as a case study for assessment of loss of attention in automotive drivers. The SR estimation has been carried out offline as real-time implementation requires high frame rates of processing which is difficult to achieve due to hardware limitations. The accuracy in estimation of the loss of attention using PERCLOS and SR has been validated using brain signals, which are reported to be an authentic cue for estimating the state of alertness in human beings. The major contributions of this thesis include database creation, design and implementation of fast algorithms for estimating PERCLOS and SR on embedded computing platforms.
CVMay 15, 2015
A Video Database of Human Faces under Near Infra-Red Illumination for Human Computer Interaction AplicationsS L Happy, Anirban Dasgupta, Anjith George et al.
Human Computer Interaction (HCI) is an evolving area of research for coherent communication between computers and human beings. Some of the important applications of HCI as reported in literature are face detection, face pose estimation, face tracking and eye gaze estimation. Development of algorithms for these applications is an active field of research. However, availability of standard database to validate such algorithms is insufficient. This paper discusses the creation of such a database created under Near Infra-Red (NIR) illumination. NIR illumination has gained its popularity for night mode applications since prolonged exposure to Infra-Red (IR) lighting may lead to many health issues. The database contains NIR videos of 60 subjects in different head orientations and with different facial expressions, facial occlusions and illumination variation. This new database can be a very valuable resource for development and evaluation of algorithms on face detection, eye detection, head tracking, eye gaze tracking etc. in NIR lighting.
CVMay 13, 2015
A Vision Based System for Monitoring the Loss of Attention in Automotive DriversAnirban Dasgupta, Anjith George, S. L. Happy et al.
On board monitoring of the alertness level of an automotive driver has been a challenging research in transportation safety and management. In this paper, we propose a robust real time embedded platform to monitor the loss of attention of the driver during day as well as night driving conditions. The PERcentage of eye CLOSure (PERCLOS) has been used as the indicator of the alertness level. In this approach, the face is detected using Haar like features and tracked using a Kalman Filter. The Eyes are detected using Principal Component Analysis (PCA) during day time and the block Local Binary Pattern (LBP) features during night. Finally the eye state is classified as open or closed using Support Vector Machines(SVM). In plane and off plane rotations of the drivers face have been compensated using Affine and Perspective Transformation respectively. Compensation in illumination variation is carried out using Bi Histogram Equalization (BHE). The algorithm has been cross validated using brain signals and finally been implemented on a Single Board Computer (SBC) having Intel Atom processor, 1 GB RAM, 1.66 GHz clock, x86 architecture, Windows Embedded XP operating system. The system is found to be robust under actual driving conditions.
CVMay 13, 2015
A Framework for Fast Face and Eye DetectionAnjith George, Anirban Dasgupta, Aurobinda Routray
Face detection is an essential step in many computer vision applications like surveillance, tracking, medical analysis, facial expression analysis etc. Several approaches have been made in the direction of face detection. Among them, Haar-like features based method is a robust method. In spite of the robustness, Haar - like features work with some limitations. However, with some simple modifications in the algorithm, its performance can be made faster and more robust. The present work refers to the increase in speed of operation of the original algorithm by down sampling the frames and its analysis with different scale factors. It also discusses the detection of tilted faces using an affine transformation of the input image.