LG Apr 13
UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees
Prateek Chanda, Prayas Agrawal, Karthik S. Gurumoorthy et al.
Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes. We present UniPROT, a novel subset selection framework that minimizes the optimal transport (OT) distance between a uniformly weighted prototypical distribution and the target distribution. While intuitive, this formulation leads to a cardinality-constrained maximization of a \emph{super-additive} objective, which is generally intractable to approximate efficiently. To address this, we propose a principled reformulation of the OT marginal constraints, yielding a partial optimal transport-based submodular objective. We prove that this reformulation enables a greedy algorithm with a $(1-1/e)$ approximation guarantee relative to the original super-additive maximization problem. Empirically, we show that enforcing uniform prototype weights in UniPROT consistently improves minority-class representation in imbalanced classification benchmarks without compromising majority-class accuracy. In both finetuning and pretraining regimes for large language models under domain imbalance, UniPROT enforces uniform source contributions, yielding robust performance gains. Our results establish UniPROT as a scalable, theoretically grounded solution for uniform-weighted prototype selection. Our code is publicly available on GitHub at https://github.com/efficiency-learning/UniPROT
CV Feb 8, 2015
A new variational principle for the Euclidean distance function: Linear approach to the non-linear eikonal problem
Karthik S. Gurumoorthy, Anand Rangarajan
We present a fast convolution-based technique for computing an approximate, signed Euclidean distance function $S$ on a set of 2D and 3D grid locations. Instead of solving the non-linear, static Hamilton-Jacobi equation ($\|\nabla S\|=1$), our solution stems from first solving for a scalar field $ϕ$ in a linear differential equation and then deriving the solution for $S$ by taking the negative logarithm. In other words, when $S$ and $ϕ$ are related by $ϕ= \exp \left(-\frac{S}τ \right)$ and $ϕ$ satisfies a specific linear differential equation corresponding to the extremum of a variational problem, we obtain the approximate Euclidean distance function $S = -τ\log(ϕ)$ which converges to the true solution in the limit as $τ\rightarrow 0$. This is in sharp contrast to techniques like the fast marching and fast sweeping methods which directly solve the Hamilton-Jacobi equation by the Godunov upwind discretization scheme. Our linear formulation results in a closed-form solution to the approximate Euclidean distance function expressible as a discrete convolution, and hence efficiently computable using the fast Fourier transform (FFT). Our solution also circumvents the need for spatial discretization of the derivative operator. As $τ\rightarrow0$ we show the convergence of our results to the true solution and also bound the error for a given value of $τ$. The differentiability of our solution allows us to compute---using a set of convolutions---the first and second derivatives of the approximate distance function. In order to determine the sign of the distance function (defined to be positive inside a closed region and negative outside), we compute the winding number in 2D and the topological degree in 3D, whose computations can also be performed via fast convolutions. We demonstrate the efficacy of our method through a set of experimental results.
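The relation $S = -τ\log(ϕ)$ can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the abstract: the kernel is the simple exponential $\exp(-r/τ)$ (a stand-in, not necessarily the paper's closed-form solution), the sources are isolated grid points with unit grid spacing, and only the unsigned distance is computed. The function name `approx_distance` is hypothetical. With $K$ source points, the result is a soft-min that underestimates the true distance by at most $τ\log K$.

```python
import numpy as np

def approx_distance(source_mask, tau):
    """Approximate unsigned distance to the True cells of source_mask (unit grid spacing).

    Assumption: phi is the source indicator convolved with exp(-r / tau).
    """
    n0, n1 = source_mask.shape
    # Sample the kernel on every offset that can occur between two grid cells.
    dy = np.arange(-(n0 - 1), n0)[:, None]
    dx = np.arange(-(n1 - 1), n1)[None, :]
    kernel = np.exp(-np.hypot(dy, dx) / tau)
    # Zero-pad to the full linear-convolution size so the FFT product does not wrap around.
    shape = (3 * n0 - 2, 3 * n1 - 2)
    spectrum = np.fft.rfft2(source_mask.astype(float), s=shape) * np.fft.rfft2(kernel, s=shape)
    full = np.fft.irfft2(spectrum, s=shape)
    # Keep the part of the full convolution that is aligned with the original grid.
    phi = full[n0 - 1:2 * n0 - 1, n1 - 1:2 * n1 - 1]
    # Guard against round-off pushing tiny values to zero or below before the logarithm.
    phi = np.maximum(phi, np.finfo(float).tiny)
    return -tau * np.log(phi)

if __name__ == "__main__":
    mask = np.zeros((32, 32), dtype=bool)
    mask[8, 8] = mask[24, 20] = True
    print(approx_distance(mask, tau=1.0)[8, 8])  # close to 0 at a source point
```

In double precision, very small values of $τ$ make $ϕ$ fall below FFT round-off far from the sources, so this sketch is only meaningful when the largest distance divided by $τ$ stays moderate.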
CV Apr 13, 2023
Signal Reconstruction from Samples at Unknown Locations with Application to 2D Unknown View Tomography
Sheel Shah, Kaishva Shah, Karthik S. Gurumoorthy et al.
It is well known that a band-limited signal can be reconstructed from its uniformly spaced samples if the sampling rate is sufficiently high. More recently, it has been proved that one can reconstruct a 1D band-limited signal even if the exact sample locations are unknown, but given a uniform distribution of the sample locations and their ordering in 1D. In this work, we extend the analytical error bounds in such scenarios for quasi-bandlimited (QBL) signals, and for the case of arbitrary but known sampling distributions. We also prove that such reconstruction methods are resilient to a certain proportion of errors in the specification of the sample location ordering. We then express the problem of tomographic reconstruction of 2D images from 1D Radon projections under unknown angles (2D UVT) with known angle distribution, as a special case for reconstruction of QBL signals from samples at unknown locations with known distribution. Building upon our theoretical background, we present asymptotic bounds for 2D QBL image reconstruction from 1D Radon projections in the unknown angles setting, and present an extensive set of simulations to verify these bounds in varied parameter regimes. To the best of our knowledge, this is the first piece of work to perform such an analysis for 2D UVT and explicitly relate it to advances in sampling theory, even though the associated reconstruction algorithms have been known for a long time.
LG Apr 18, 2023
Cooperative Multi-Agent Reinforcement Learning for Inventory Management
Madhav Khirwar, Karthik S. Gurumoorthy, Ankit Ajit Jain et al.
With Reinforcement Learning (RL) for inventory management (IM) being a nascent field of research, approaches tend to be limited to simple, linear environments with implementations that are minor modifications of off-the-shelf RL algorithms. Scaling these simplistic environments to a real-world supply chain comes with several challenges: minimizing the computational requirements of the environment, specifying agent configurations that are representative of dynamics at real-world stores and warehouses, and specifying a reward framework that encourages desirable behavior across the whole supply chain. In this work, we present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores, a novel architecture for agent-environment dynamics incorporating enhanced state and action spaces, and a shared reward specification that seeks to optimize for a large retailer's supply chain needs. Each vertex in the supply chain graph is an independent agent that, based on its own inventory, is able to place replenishment orders with the vertex upstream. The warehouse agent, aside from placing orders with the supplier, has the special property of also being able to constrain replenishment to stores downstream, which results in it learning an additional allocation sub-policy. We achieve a system that outperforms standard inventory control policies such as a base-stock policy and other RL-based specifications for a single product, and lay out a future direction of work for multiple products.
LG Mar 3, 2022
Joint Probability Estimation Using Tensor Decomposition and Dictionaries
Shaan ul Haque, Ajit Rajwade, Karthik S. Gurumoorthy
In this work, we study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals, under the assumption that the joint probability can be decomposed and approximated by a mixture of product densities/mass functions. The problem of estimating the joint probability density function (PDF) using semi-parametric techniques such as Gaussian Mixture Models (GMMs) is widely studied. However, such techniques yield poor results when the underlying densities are mixtures of various other families of distributions such as Laplacian, generalized Gaussian, uniform, or Cauchy. Further, GMMs are not the best choice for estimating joint distributions that are hybrid in nature, i.e., some random variables are discrete while others are continuous. We present a novel approach for estimating the PDF using ideas from dictionary representations in signal processing coupled with low-rank tensor decompositions. To the best of our knowledge, this is the first work on estimating joint PDFs employing dictionaries alongside tensor decompositions. We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture. Our approach can naturally handle hybrid $N$-dimensional distributions. We test our approach on a variety of synthetic and real datasets to demonstrate its effectiveness in terms of better classification rates and lower error rates, when compared to state-of-the-art estimators.
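The mixture-of-products assumption can be made concrete for the purely discrete case with a short Python sketch. It does not implement the paper's dictionary-based estimator; it only shows, under the stated low-rank model, how a joint PMF tensor is assembled from mixture weights and per-variable factors, and why its 2D marginals have the form $A_i \,\mathrm{diag}(λ)\, A_j^T$. The names `joint_from_factors` and `pairwise_marginal` are hypothetical.

```python
import numpy as np

def joint_from_factors(weights, factors):
    """Joint PMF: sum_r weights[r] * outer(factors[0][:, r], ..., factors[N-1][:, r]).

    weights: shape (R,), non-negative, sums to 1 (mixture weights).
    factors: list of arrays of shape (n_states_n, R); each column is a PMF.
    """
    joint = np.zeros([f.shape[0] for f in factors])
    for r in range(len(weights)):
        component = np.asarray(weights[r])
        for f in factors:
            # Grow the rank-one term by one variable at a time.
            component = np.multiply.outer(component, f[:, r])
        joint += component
    return joint

def pairwise_marginal(weights, factors, i, j):
    """2D marginal of variables i and j implied by the same factors."""
    return factors[i] @ np.diag(weights) @ factors[j].T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.random(2)
    weights /= weights.sum()
    factors = []
    for n_states in (3, 4, 5):
        f = rng.random((n_states, 2))
        factors.append(f / f.sum(axis=0))  # normalize each column to a PMF
    joint = joint_from_factors(weights, factors)
    # Summing out the third variable should reproduce the (0, 1) marginal.
    gap = np.abs(joint.sum(axis=2) - pairwise_marginal(weights, factors, 0, 1)).max()
    print(joint.sum(), gap)
```

Estimation goes in the opposite direction: given empirical marginals, one searches for weights and factors that reproduce them.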
ML Apr 18, 2023
Estimating Joint Probability Distribution With Low-Rank Tensor Decomposition, Radon Transforms and Dictionaries
Pranava Singhal, Waqar Mirza, Ajit Rajwade et al.
In this paper, we describe a method for estimating the joint probability density from data samples by assuming that the underlying distribution can be decomposed as a mixture of product densities with few mixture components. Prior works have used such a decomposition to estimate the joint density from lower-dimensional marginals, which can be estimated more reliably with the same number of samples. We combine two key ideas: dictionaries to represent 1-D densities, and random projections to estimate the joint distribution from 1-D marginals, explored separately in prior work. Our algorithm benefits from improved sample complexity over the previous dictionary-based approach by using 1-D marginals for reconstruction. We evaluate the performance of our method on estimating synthetic probability densities and compare it with the previous dictionary-based approach and Gaussian Mixture Models (GMMs). Our algorithm outperforms these other approaches in all the experimental settings.
CV Jul 20, 2025
3-Dimensional CryoEM Pose Estimation and Shift Correction Pipeline
Kaishva Chintan Shah, Virajith Boddapati, Karthik S. Gurumoorthy et al.
Accurate pose estimation and shift correction are key challenges in cryo-EM due to the very low SNR, which directly impacts the fidelity of 3D reconstructions. We present an approach for pose estimation in cryo-EM that leverages multi-dimensional scaling (MDS) techniques in a robust manner to estimate the 3D rotation matrix of each particle from pairs of dihedral angles. We express the rotation matrix in the form of an axis of rotation and a unit vector in the plane perpendicular to the axis. The technique leverages the concept of common lines in 3D reconstruction from projections. However, common line estimation is ridden with large errors due to the very low SNR of cryo-EM projection images. To address this challenge, we introduce two complementary components: (i) a robust joint optimization framework for pose estimation based on an $\ell_1$-norm objective or a similar robust norm, which simultaneously estimates rotation axes and in-plane vectors while exactly enforcing unit norm and orthogonality constraints via projected coordinate descent; and (ii) an iterative shift correction algorithm that estimates consistent in-plane translations through a global least-squares formulation. While prior approaches have leveraged such embeddings and common-line geometry for orientation recovery, existing formulations typically rely on $\ell_2$-based objectives that are sensitive to noise, and enforce geometric constraints only approximately. These choices, combined with a sequential pipeline structure, can lead to compounding errors and suboptimal reconstructions in low-SNR regimes. Our pipeline consistently outperforms prior methods in both Euler angle accuracy and reconstruction fidelity, as measured by the Fourier Shell Correlation (FSC).
CV Jan 6, 2025
Two-Dimensional Unknown View Tomography from Unknown Angle Distributions
Kaishva Chintan Shah, Karthik S. Gurumoorthy, Ajit Rajwade
This study presents a technique for 2D tomography under unknown viewing angles when the distribution of the viewing angles is also unknown. Unknown view tomography (UVT) is a problem encountered in cryo-electron microscopy and in the geometric calibration of CT systems. There exists a moderate-sized literature on the 2D UVT problem, but most existing 2D UVT algorithms assume knowledge of the angle distribution, which is usually unavailable. Our proposed methodology formulates the problem as an optimization task based on cross-validation error, to estimate the angle distribution jointly with the underlying 2D structure in an alternating fashion. We explore the algorithm's capabilities for two probability distribution models: a semi-parametric mixture of von Mises densities and a probability mass function model. We evaluate our algorithm's performance under noisy projections using a PCA-based denoising technique and Graph Laplacian Tomography (GLT) driven by order statistics of the estimated distribution, to ensure near-perfect ordering, and compare our algorithm to intuitive baselines.
LG Jun 7, 2024
Submodular Framework for Structured-Sparse Optimal Transport
Piyushi Manupriya, Pratik Jawanpuria, Karthik S. Gurumoorthy et al.
Unbalanced optimal transport (UOT) has recently gained much attention due to its flexible framework for handling un-normalized measures and its robustness properties. In this work, we explore learning (structured) sparse transport plans in the UOT setting, i.e., transport plans have an upper bound on the number of non-sparse entries in each column (structured sparse pattern) or in the whole plan (general sparse pattern). We propose novel sparsity-constrained UOT formulations building on the recently explored maximum mean discrepancy based UOT. We show that the proposed optimization problem is equivalent to the maximization of a weakly submodular function over a uniform matroid or a partition matroid. We develop efficient gradient-based discrete greedy algorithms and provide the corresponding theoretical guarantees. Empirically, we observe that our proposed greedy algorithms select a diverse support set and we illustrate the efficacy of the proposed approach in various applications.
LG Feb 9, 2022
A decision-tree framework to select optimal box-sizes for product shipments
Karthik S. Gurumoorthy, Abhiraj Hinge
In package-handling facilities, boxes of varying sizes are used to ship products. Improperly sized boxes with box dimensions much larger than the product dimensions create wastage and unduly increase the shipping costs. Since it is infeasible to make unique, tailor-made boxes for each of the $N$ products, the fundamental question that confronts e-commerce companies is: How many $K \ll N$ cuboidal boxes need to be manufactured and what should be their dimensions? In this paper, we propose a solution for the single-count shipment containing one product per box in two steps: (i) reduce it to a clustering problem in the $3$-dimensional space of length, width and height where each cluster corresponds to the group of products that will be shipped in a particular size variant, and (ii) present an efficient forward-backward decision tree based clustering method with low computational complexity in $N$ and $K$ to obtain these $K$ clusters and corresponding box dimensions. Our algorithm has multiple constituent parts, each specifically designed to achieve a high-quality clustering solution. As our method generates clusters in an incremental fashion without discarding the present solution, adding or deleting a size variant is as simple as stopping the backward pass early or executing it for one more iteration. We tested the efficacy of our approach by simulating actual single-count shipments that were transported during a month by Amazon using the proposed box dimensions. Even by just modifying the existing box dimensions and not adding a new size variant, we achieved a reduction of $4.4\%$ in the shipment volume, contributing to a decrease in the non-utilized air volume of $2.2\%$. The reduction in shipment volume and air volume improved significantly to $10.3\%$ and $6.1\%$ when we introduced $4$ additional boxes.
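The evaluation step described above, computing shipment volume for a candidate set of box sizes, can be sketched in Python as follows. This is not the paper's forward-backward decision-tree clustering; it only assumes that each product goes into the smallest-volume box it fits in, and that a product fits when its sorted dimensions are no larger than the box's sorted dimensions (axis-aligned placement). All function names are hypothetical.

```python
def volume(dims):
    """Volume of a cuboid given as (length, width, height)."""
    length, width, height = dims
    return length * width * height

def fits(product, box):
    """Axis-aligned fit test: compare dimensions after sorting both cuboids."""
    return all(p <= b for p, b in zip(sorted(product), sorted(box)))

def assign_boxes(products, boxes):
    """Assign each product the smallest-volume box that contains it (None if no box fits)."""
    ordered = sorted(boxes, key=volume)
    return [next((box for box in ordered if fits(product, box)), None)
            for product in products]

def total_shipment_volume(products, boxes):
    """Total box volume shipped when every product uses its assigned box."""
    assignment = assign_boxes(products, boxes)
    if any(box is None for box in assignment):
        raise ValueError("at least one product does not fit in any box")
    return sum(volume(box) for box in assignment)

if __name__ == "__main__":
    products = [(2, 3, 4), (1, 1, 1), (5, 5, 1)]
    boxes = [(4, 3, 2), (6, 6, 2), (1, 1, 1)]
    print(total_shipment_volume(products, boxes))
    # Adding a size variant that matches a product exactly can only reduce the total.
    print(total_shipment_volume(products, boxes + [(5, 5, 1)]))
```

Comparing the two totals mirrors, in miniature, how the effect of introducing an additional size variant on shipment volume might be measured.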
LG Jan 21, 2022
Individual Treatment Effect Estimation Through Controlled Neural Network Training in Two Stages
Naveen Nair, Karthik S. Gurumoorthy, Dinesh Mandalapu
We develop a Causal-Deep Neural Network (CDNN) model trained in two stages to infer causal impact estimates at an individual unit level. Using only the pre-treatment features in stage 1, in the absence of any treatment information, we learn an encoding for the covariates that best represents the outcome. In stage 2, we further seek to predict the unexplained outcome from stage 1 by introducing the treatment indicator variables alongside the encoded covariates. We prove that even without explicitly computing the treatment residual, our method still satisfies the desirable local Neyman orthogonality, making it robust to small perturbations in the nuisance parameters. Furthermore, by establishing connections with representation learning approaches, we create a framework from which multiple variants of our algorithm can be derived. We perform initial experiments on publicly available datasets to compare these variants and obtain guidance in selecting the best variant of our CDNN method. On evaluating CDNN against state-of-the-art approaches on three benchmarking datasets, we observe that CDNN is highly competitive and often yields the most accurate individual treatment effect estimates. We highlight the strong merits of CDNN in terms of its extensibility to multiple use cases.
LG Mar 22, 2021
Recovery of Joint Probability Distribution from one-way marginals: Low rank Tensors and Random Projections
Jian Vora, Karthik S. Gurumoorthy, Ajit Rajwade
Joint probability mass function (PMF) estimation is a fundamental machine learning problem. The number of free parameters scales exponentially with respect to the number of random variables. Hence, most work on nonparametric PMF estimation is based on some structural assumptions such as clique factorization adopted by probabilistic graphical models, imposition of low rank on the joint probability tensor and reconstruction from 3-way or 2-way marginals, etc. In the present work, we link random projections of data to the problem of PMF estimation using ideas from tomography. We integrate this idea with the idea of low-rank tensor decomposition to show that we can estimate the joint density from just one-way marginals in a transformed space. We provide a novel algorithm for recovering factors of the tensor from one-way marginals, test it across a variety of synthetic and real-world datasets, and also perform MAP inference on the estimated model for classification.
LG Mar 18, 2021
SPOT: A framework for selection of prototypes using optimal transport
Karthik S. Gurumoorthy, Pratik Jawanpuria, Bamdev Mishra
In this work, we develop an optimal transport (OT) based framework to select informative prototypical examples that best represent a given target dataset. Summarizing a given target dataset via representative examples is an important problem in several machine learning applications where human understanding of the learning models and underlying data distribution is essential for decision making. We model the prototype selection problem as learning a sparse (empirical) probability distribution having the minimum OT distance from the target distribution. The learned probability measure supported on the chosen prototypes directly corresponds to their importance in representing the target data. We show that our objective function enjoys the key property of submodularity and propose an efficient greedy method that is both computationally fast and possesses deterministic approximation guarantees. Empirical results on several real-world benchmarks illustrate the efficacy of our approach.
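The greedy strategy for a monotone submodular objective under a cardinality constraint can be sketched in Python. The objective used here, a weighted facility-location function $f(P) = \sum_j q_j \max_{i \in P} S_{ij}$ over a non-negative similarity matrix $S$, is an assumed stand-in chosen because it is a standard monotone submodular function; it should not be read as the exact OT-based objective of the paper. The names `greedy_prototypes` and `objective` are hypothetical.

```python
import numpy as np

def objective(similarity, target_weights, subset):
    """Facility-location value: each target is credited with its best selected source."""
    if len(subset) == 0:
        return 0.0
    return float(similarity[list(subset)].max(axis=0) @ target_weights)

def greedy_prototypes(similarity, target_weights, k):
    """Pick k source indices greedily by largest marginal gain.

    similarity: array (n_source, n_target) with non-negative entries.
    target_weights: array (n_target,), non-negative weights of the target points.
    """
    n_source, n_target = similarity.shape
    available = np.ones(n_source, dtype=bool)
    best_cover = np.zeros(n_target)  # best similarity achieved so far per target
    selected = []
    for _ in range(k):
        # Gain of adding candidate i: improvement of the per-target maxima, weighted.
        gains = (np.maximum(similarity, best_cover) - best_cover) @ target_weights
        gains[~available] = -np.inf
        choice = int(np.argmax(gains))
        selected.append(choice)
        available[choice] = False
        best_cover = np.maximum(best_cover, similarity[choice])
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = rng.random((8, 20))
    weights = np.full(20, 1 / 20)
    chosen = greedy_prototypes(sim, weights, k=3)
    print(chosen, objective(sim, weights, chosen))
```

For any monotone submodular objective with $f(\emptyset) = 0$, this greedy loop attains at least a $(1-1/e)$ fraction of the best value achievable with $k$ elements, which is the kind of deterministic guarantee the abstract refers to.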
LG Jun 5, 2020
Think out of the package: Recommending package types for e-commerce shipments
Karthik S. Gurumoorthy, Subhajit Sanyal, Vineet Chaoji
Multiple product attributes such as dimensions, weight, fragility, and liquid content determine the package type used by e-commerce companies to ship products. Sub-optimal package types lead to damaged shipments, incurring huge damage-related costs and adversely impacting the company's reputation for safe delivery. Items can be shipped in more protective packages to reduce damage costs; however, this increases the shipment costs due to expensive packaging and higher transportation costs. In this work, we propose a multi-stage approach that trades off shipment and damage costs for each product, and accurately assigns the optimal package type using a scalable, computationally efficient linear-time algorithm. A simple binary search algorithm is presented to find the hyper-parameter that balances the shipment and damage costs. Our approach, when applied to choosing package types for Amazon shipments, leads to significant cost savings of tens of millions of dollars in emerging marketplaces, by decreasing both the overall shipment cost and the number of in-transit damages. Our algorithm is live and deployed in the production system, where package types for more than 130,000 products have been modified based on the model's recommendation, realizing a reduction in damage rate of 24%.
LG Jul 21, 2018
Streaming Methods for Restricted Strongly Convex Functions with Applications to Prototype Selection
Karthik S. Gurumoorthy, Amit Dhurandhar
In this paper, we show that if the optimization function is restricted-strongly-convex (RSC) and restricted-smooth (RSM), a rich subclass of weakly submodular functions, then a streaming algorithm with a constant factor approximation guarantee is possible. More generally, our results are applicable to any monotone weakly submodular function with submodularity ratio bounded from above. This (positive) result, which provides a sufficient condition for having a constant factor streaming guarantee for weakly submodular functions, may be of special interest given the recent negative result (Elenberg et al., 2017) for the general class of weakly submodular functions. We apply our streaming algorithms to create a compact synopsis of large, complex datasets by selecting $m$ representative elements that optimize a suitable RSC and RSM objective function. The above results hold even with additional constraints such as learning, for each selected element, a non-negative weight indicative of its importance, which aids interpretability. We empirically evaluate our algorithms on two real datasets: MNIST, a handwritten digits dataset, and Letters, a UCI dataset containing the alphabet written in different fonts and styles. We observe that our algorithms are orders of magnitude faster than the state-of-the-art streaming algorithm for weakly submodular functions, with our main algorithm still providing equally good solutions in practice.
ML Jul 5, 2017
Efficient Data Representation by Selecting Prototypes with Importance Weights
Karthik S. Gurumoorthy, Amit Dhurandhar, Guillermo Cecchi et al.
Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper, we present algorithms with strong theoretical guarantees to mine these datasets and select prototypes, a.k.a. representatives, that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016): in addition to selecting prototypes, we also associate non-negative weights that are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e., outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys the key property of weak submodularity, we present a fast ProtoDash algorithm and also derive approximation guarantees for it. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST), and 40 publicly available health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on CDC datasets), which arguably are the hallmarks of an effective data mining method.
NC Feb 28, 2015
Sensitivity Analysis for additive STDP rule
Subhajit Sengupta, Karthik S. Gurumoorthy, Arunava Banerjee
Spike Timing Dependent Plasticity (STDP) is a Hebbian-like synaptic learning rule. STDP is supported by strong experimental evidence and depends on precise input and output spike timings. In this paper, we show that under a biologically plausible spiking regime, slight variability in the spike timing leads to drastically different evolution of the synaptic weights when their dynamics are governed by the additive STDP rule.
NA Mar 8, 2014
A fast eikonal equation solver using the Schrodinger wave equation
Karthik S. Gurumoorthy, Adrian M. Peter, Birmingham Hang Guan et al.
We use a Schrödinger wave equation formalism to solve the eikonal equation. In our framework, a solution to the eikonal equation is obtained in the limit as Planck's constant $\hbar$ (treated as a free parameter) tends to zero of the solution to the corresponding linear Schrödinger equation. The Schrödinger equation corresponding to the eikonal turns out to be a \emph{generalized, screened Poisson equation}. Despite being linear, it does not have a closed-form solution for arbitrary forcing functions. We present two different techniques to solve the screened Poisson equation. In the first approach we use a standard perturbation analysis approach to derive a new algorithm which is guaranteed to converge provided the forcing function is bounded and positive. The perturbation technique requires a sequence of discrete convolutions which can be performed in $O(N\log N)$ using the Fast Fourier Transform (FFT) where $N$ is the number of grid points. In the second method we discretize the linear Laplacian operator by the finite difference method leading to a sparse linear system of equations which can be solved using the plethora of sparse solvers. The eikonal solution is recovered from the exponent of the resultant scalar field. Our approach eliminates the need to explicitly construct viscosity solutions as customary with direct solutions to the eikonal. Since the linear equation is computed for a small but non-zero $\hbar$, the obtained solution is an approximation. Though our solution framework is applicable to the general class of eikonal problems, we detail specifics for the popular vision applications of shape-from-shading, vessel segmentation, and path planning.
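The link between a linear equation and the eikonal equation can be checked with a short calculation. The sketch below assumes the exponential substitution $ϕ = \exp(-S/\hbar)$, a forcing function $f$ in the eikonal equation $\|\nabla S\| = f$, and a homogeneous screened Poisson form away from the source points; boundary and source handling are omitted, and the exact form used in the paper may differ.

```latex
\begin{align*}
\phi &= \exp\!\left(-\frac{S}{\hbar}\right), \qquad
\nabla \phi = -\frac{1}{\hbar}\,\phi\,\nabla S, \\
\nabla^2 \phi &= \frac{1}{\hbar^2}\,\phi\,\|\nabla S\|^2 - \frac{1}{\hbar}\,\phi\,\nabla^2 S, \\
-\hbar^2 \nabla^2 \phi + f^2 \phi = 0
\;&\Longrightarrow\;
\|\nabla S\|^2 = f^2 + \hbar\,\nabla^2 S
\;\xrightarrow{\;\hbar \to 0\;}\; f^2 .
\end{align*}
```

The residual term $\hbar\,\nabla^2 S$ is what makes the recovered $S = -\hbar \log ϕ$ an approximation for any small but non-zero $\hbar$.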