Sanjay Seetharaman

LG
4papers
12citations
Novelty51%
AI Score41

4 Papers

CGMar 25
Line Cover and Related Problems

Matthias Bentert, Fedor v. Fomin, Petr A. Golovach et al.

We study extensions of the classic \emph{Line Cover} problem, which asks whether a set of $n$ points in the plane can be covered using $k$ lines. Line Cover is known to be NP-hard, and we focus on two natural generalizations. The first is \textbf{Line Clustering}, where the goal is to find $k$ lines minimizing the sum of squared distances from the input points to their nearest line. The second is \textbf{Hyperplane Cover}, which asks whether $n$ points in $\mathbb{R}^d$ can be covered by $k$ hyperplanes. We also study the more general \textbf{Projective Clustering} problem, which unifies both settings and has applications in machine learning, data analysis, and computational geometry. In this problem, one seeks $k$ affine subspaces of dimension $r$ that minimize the sum of squared distances from the given points in $\mathbb{R}^d$ to the nearest subspace. Our results reveal notable differences in the parameterized complexity of these problems. While Line Cover is fixed-parameter tractable when parameterized by $k$, we show that Line Clustering is W[1]-hard with respect to $k$ and does not admit an algorithm with running time $n^{o(k)}$ unless the Exponential Time Hypothesis fails. Hyperplane Cover has been known to be NP-hard since the 1980s, following work of Megiddo and Tamir, even for $d=2$, we show that it remains NP-hard even when $k=2$. Finally, we present an algorithm for Projective Clustering running in $n^{O(dk(r+1))}$ time. This bound matches our lower bound for Line Clustering and generalizes the classic algorithm for $k$-Means Clustering ($r=0$) by Inaba, Katoh, and Imai [SoCG 1994].

DSApr 6
Dominating Set with Quotas: Balancing Coverage and Constraints

Sobyasachi Chatterjee, Sushmita Gupta, Saket Saurabh et al.

We study a natural generalization of the classical \textsc{Dominating Set} problem, called \textsc{Dominating Set with Quotas} (DSQ). In this problem, we are given a graph \( G \), an integer \( k \), and for each vertex \( v \in V(G) \), a lower quota \( \mathrm{lo}_v \) and an upper quota \( \mathrm{up}_v \). The goal is to determine whether there exists a set \( S \subseteq V(G) \) of size at most \( k \) such that for every vertex \( v \in V(G) \), the number of vertices in its closed neighborhood that belong to \( S \), i.e., \( |N[v] \cap S| \), lies within the range \( [\mathrm{lo}_v, \mathrm{up}_v] \). This richer model captures a variety of practical settings where both under- and over-coverage must be avoided -- such as in fault-tolerant infrastructure, load-balanced facility placement, or constrained communication networks. While DS is already known to be computationally hard, we show that the added expressiveness of per-vertex quotas in DSQ introduces additional algorithmic challenges. In particular, we prove that DSQ becomes \W[1]-hard even on structurally sparse graphs -- such as those with degeneracy 2, or excluding \( K_{3,3} \) as a subgraph -- despite these classes admitting FPT algorithms for DS. On the positive side, we show that DSQ is fixed-parameter tractable when parameterized by solution size and treewidth, and more generally, on nowhere dense graph classes. Furthermore, we design a subexponential-time algorithm for DSQ on apex-minor-free graphs using the bidimensionality framework. These results collectively offer a refined view of the algorithmic landscape of DSQ, revealing a sharp contrast with the classical DS problem and identifying the key structural properties that govern tractability.

LGMay 31, 2021
Gradient-based Data Subversion Attack Against Binary Classifiers

Rosni K Vasu, Sanjay Seetharaman, Shubham Malaviya et al.

Machine learning based data-driven technologies have shown impressive performances in a variety of application domains. Most enterprises use data from multiple sources to provide quality applications. The reliability of the external data sources raises concerns for the security of the machine learning techniques adopted. An attacker can tamper the training or test datasets to subvert the predictions of models generated by these techniques. Data poisoning is one such attack wherein the attacker tries to degrade the performance of a classifier by manipulating the training data. In this work, we focus on label contamination attack in which an attacker poisons the labels of data to compromise the functionality of the system. We develop Gradient-based Data Subversion strategies to achieve model degradation under the assumption that the attacker has limited-knowledge of the victim model. We exploit the gradients of a differentiable convex loss function (residual errors) with respect to the predicted label as a warm-start and formulate different strategies to find a set of data instances to contaminate. Further, we analyze the transferability of attacks and the susceptibility of binary classifiers. Our experiments show that the proposed approach outperforms the baselines and is computationally efficient.

LGApr 24, 2021
Influence Based Defense Against Data Poisoning Attacks in Online Learning

Sanjay Seetharaman, Shubham Malaviya, Rosni KV et al.

Data poisoning is a type of adversarial attack on training data where an attacker manipulates a fraction of data to degrade the performance of machine learning model. Therefore, applications that rely on external data-sources for training data are at a significantly higher risk. There are several known defensive mechanisms that can help in mitigating the threat from such attacks. For example, data sanitization is a popular defensive mechanism wherein the learner rejects those data points that are sufficiently far from the set of training instances. Prior work on data poisoning defense primarily focused on offline setting, wherein all the data is assumed to be available for analysis. Defensive measures for online learning, where data points arrive sequentially, have not garnered similar interest. In this work, we propose a defense mechanism to minimize the degradation caused by the poisoned training data on a learner's model in an online setup. Our proposed method utilizes an influence function which is a classic technique in robust statistics. Further, we supplement it with the existing data sanitization methods for filtering out some of the poisoned data points. We study the effectiveness of our defense mechanism on multiple datasets and across multiple attack strategies against an online learner.