Petr A. Golovach

DS
12papers
108citations
Novelty55%
AI Score54

12 Papers

91.4DSApr 30
Finding irrelevant vertices in linear time on bounded-genus graphs

Petr A. Golovach, Stavros G. Kolliopoulos, Giannos Stamoulis et al.

The irrelevant vertex technique provides a powerful tool for the design of parameterized algorithms for a wide variety of problems on graphs. A common characteristic of these problems, permitting the application of this technique on surface-embedded graphs, is the fact that every graph of large enough treewidth contains a vertex that is irrelevant, in the sense that its removal yields an equivalent instance of the problem. The straightforward application of this technique yields algorithms with running time that is quadratic in the size of the input graph. This running time is due to the fact that it takes linear time to detect one irrelevant vertex and the total number of irrelevant vertices to be detected is linear as well. Using advanced techniques, sub-quadratic algorithms have been designed for particular problems, even in general graphs. However, designing a general framework for linear-time algorithms has been open, even for the bounded-genus case. In this paper we introduce a general framework that enables finding in linear time an entire set of irrelevant vertices whose removal yields a bounded-treewidth graph, provided that the input graph has bounded genus. Our technique consists of decomposing any surface-embedded graph into a tree-structured collection of bounded-treewidth subgraphs where detecting globally irrelevant vertices can be done locally and independently. Our method is applicable to a wide variety of known graph containment or graph modification problems where the irrelevant vertex technique applies. Examples include the (Induced) Minor Folio problem, the (Induced) Disjoint Paths problem, and the $\mathcal{F}$-Minor-Deletion problem.

87.6DSMar 19
Algorithms for Euclidean Distance Matrix Completion: Exploiting Proximity to Triviality

Fedor V. Fomin, Petr A. Golovach, M. S. Ramanujan et al.

In the d-Euclidean Distance Matrix Completion (d-EDMC) problem, one aims to determine whether a given partial matrix of pairwise distances can be extended to a full Euclidean distance matrix in d dimensions. This problem is a cornerstone of computational geometry with numerous applications. While classical work on this problem often focuses on exploiting connections to semidefinite programming typically leading to approximation algorithms, we focus on exact algorithms and propose a novel distance-from-triviality parameterization framework to obtain tractability results for d-EDMC. We identify key structural patterns in the input that capture entry density, including chordal substructures and coverability of specified entries by fully specified principal submatrices. We obtain: (1) The first fixed-parameter algorithm (FPT algorithm) for d-EDMC parameterized by d and the maximum number of unspecified entries per row/column. This is achieved through a novel compression algorithm that reduces a given instance to a submatrix on O(1) rows (for fixed values of the parameters). (2) The first FPT algorithm for d-EDMC parameterized by d and the minimum number of fully specified principal submatrices whose entries cover all specified entries of the given matrix. This result is also achieved through a compression algorithm. (3) A polynomial-time algorithm for d-EDMC when both d and the minimum fill-in of a natural graph representing the specified entries are fixed constants. This result is achieved by combining tools from distance geometry and algorithms from real algebraic geometry. Our work identifies interesting parallels between EDM completion and graph problems, with our algorithms exploiting techniques from both domains.

20.1CGMar 25
Line Cover and Related Problems

Matthias Bentert, Fedor v. Fomin, Petr A. Golovach et al.

We study extensions of the classic \emph{Line Cover} problem, which asks whether a set of $n$ points in the plane can be covered using $k$ lines. Line Cover is known to be NP-hard, and we focus on two natural generalizations. The first is \textbf{Line Clustering}, where the goal is to find $k$ lines minimizing the sum of squared distances from the input points to their nearest line. The second is \textbf{Hyperplane Cover}, which asks whether $n$ points in $\mathbb{R}^d$ can be covered by $k$ hyperplanes. We also study the more general \textbf{Projective Clustering} problem, which unifies both settings and has applications in machine learning, data analysis, and computational geometry. In this problem, one seeks $k$ affine subspaces of dimension $r$ that minimize the sum of squared distances from the given points in $\mathbb{R}^d$ to the nearest subspace. Our results reveal notable differences in the parameterized complexity of these problems. While Line Cover is fixed-parameter tractable when parameterized by $k$, we show that Line Clustering is W[1]-hard with respect to $k$ and does not admit an algorithm with running time $n^{o(k)}$ unless the Exponential Time Hypothesis fails. Hyperplane Cover has been known to be NP-hard since the 1980s, following work of Megiddo and Tamir, even for $d=2$, we show that it remains NP-hard even when $k=2$. Finally, we present an algorithm for Projective Clustering running in $n^{O(dk(r+1))}$ time. This bound matches our lower bound for Line Clustering and generalizes the classic algorithm for $k$-Means Clustering ($r=0$) by Inaba, Katoh, and Imai [SoCG 1994].

75.0DSApr 28
Clustering Permutations under the Ulam Metric: A Parameterized Complexity Study

Tian Bai, Fedor V. Fomin, Petr A. Golovach et al.

Rank aggregation seeks a representative permutation for a collection of rankings and plays a central role in areas such as social choice, information retrieval, and computational biology. Two fundamental aggregation tasks are the center and median problems, which minimize the maximum and the total distance to the input permutations, respectively. While these problems are well understood under Kendall's tau and related distances, their parameterized complexity under the Ulam metric, an edit-distance-based metric on permutations, has remained largely unexplored. In this work, we initiate a systematic study of the parameterized complexity of rank aggregation under the Ulam metric. We consider both the center and median problems, as well as their generalizations to the $k$-center and $k$-median clustering settings, parameterized by the number of centers $k$ and the distance budget $d$ (corresponding to the maximum distance for center variants and the total distance for median variants). Both problems are known to be NP-hard already for $k=1$. We show that the Ulam $k$-center problem remains NP-hard when $d=1$, but is fixed-parameter tractable when parameterized by $k + d$. Our algorithm is based on a novel local-search framework tailored to the non-local nature of Ulam distances. We complement this by proving that no polynomial kernel exists for the $k+d$ parameterization unless NP $\subseteq$ coNP/poly. For the Ulam $k$-median problem parameterized by the total distance $d$, we establish W[1]-hardness and provide an XP algorithm. We also provide a polynomial kernel for the parameter $k + d$, which in turn yields a fixed-parameter tractable algorithm.

64.6DSApr 27
Polynomial Kernels for Spanning Tree with Diversity Requirements

Petr A. Golovach, Diptapriyo Majumdar, Saket Saurabh

Given a connected undirected graph $G$, a spanning tree is a subgraph $T$ of $G$ such that $V(T) = V(G)$ and $T$ is a tree. A collection of $\ell$ spanning trees $T_1,\ldots,T_\ell$ is pairwise $k$-diverse if for every $i \neq j$, $|E(T_i) \triangle E(T_j)| \geq k$. Given a connected undirected graph $G$ and integers $p, q, k, \ell$, Leaf & Internal-Constrained Diverse Spanning Trees asks whether there are $\ell$ distinct spanning trees $T_1,\ldots,T_{\ell}$ of $G$ that are pairwise $k$-diverse such that each tree has at least $p$ leaves and at least $q$ internal vertices. Similarly, Leaf & Non-terminal-Constrained Diverse Spanning Trees takes a connected undirected graph $G$, $V_{NT}\subseteq V(G)$, and three integers $p, k, \ell$, and asks if $G$ has $\ell$ spanning trees that are pairwise $k$-diverse, and each has at least $p$ leaves and conains the vertices of $V_{NT}$ as internal. We consider these two problems from the kernelization perspective and provide polynomial kernels for Leaf & Internal-Constrained Diverse Spanning Trees and Leaf & Non-terminal-Constrained Diverse Spanning Trees, when parameterized by $p + q + k + \ell$ and $p + |V_{\rm NT}| + k + \ell$, respectively.

53.4DSApr 27
Identification to Subclasses of Chordal Graphs

Petr A. Golovach, Laure Morelle, Daniël Paulusma

An identification of two vertices $u$ and $v$ in a graph replaces them with a new vertex whose neighborhood is the union of the neighborhoods of $u$ and $v$. We study the {\sc ${\cal H}$-Identification} problem, which is to decide whether a given graph $G$ can be transformed (``identified'') to a graph in ${\cal H}$ by applying at most $k$ vertex identifications. We determine the classical and parameterized complexity of this problem for various subclasses ${\cal H}$ of chordal graphs, obtaining an almost complete picture for two parameters: $k$ and $n-k$. We also consider the {\sc Identification} problem, which is to test for two given graphs $G$ and $H$ if $G$ can be identified to $H$. We determine the parameterized complexity of this problem when $H$ is a graph from one of our testbed classes, taking the number of simplicial vertices of $H$ as the parameter.

68.2DSApr 1
A Framework for Parameterized Subexponential-Subcubic-Time Algorithms for Weighted Problems in Planar Graphs

Matthias Bentert, Fedor V. Fomin, Petr A. Golovach

Many problems are known to be solvable in subexponential parameterized time when the input graph is planar. The bidimensionality framework of Demaine, Fomin, Hajiaghay, and Thilikos [JACM'05] and the treewidth-pattern-covering approach by Fomin, Lokshtanov, Marx, Pilipczuk, Pilipczuk, and Saurabh [SICOMP'22] give robust tools for designing such algorithms. However, there are still many problems for which we do not know whether subexponential parameterized algorithms exist. The bidimensionality framework is not able to handle weights or directed graphs and the treewidth-pattern-covering approach only works for finding connected solutions. Building on a result by Nederlof [STOC'20], we provide a framework that is able to solve a variety of problems in planar graphs in subexponential parameterized time for which this was previously not known (where the polynomial part of the running time is usually $O(n^{2.49})$). Our framework can handle weights, does not require solutions to contain only few connected components, and applies to cases where the number of potential patterns of a solution is exponential in the parameter. We then use the framework to show that various weighted problems like Weighted Partial Vertex Cover, Maximum-Weight Induced Forest, Minimum-Weight Rooted Simple Minor, and Maximum-Weight Rooted Parallel Induced Minor allow for subexponential parameterized algorithms. This was previously not known for any of them. Moreover, we present a very easy-to-use fragment of our framework. This fragment allows for significantly simpler proofs in the case of Maximum-Weight Independent Set and Maximum $(k, n-k)$-Cut and is able to show a subexponential parameterized algorithm for weighted versions of Densest $k$-Subgraph. Even the unweighted version was not known before and is stated as an open problem in the existing literature.

DSDec 13, 2021
How to Find a Good Explanation for Clustering?

Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach et al.

$k$-means and $k$-median clustering are powerful unsupervised machine learning techniques. However, due to complicated dependences on all the features, it is challenging to interpret the resulting cluster assignments. Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering. In this model, a decision tree with $k$ leaves provides a straightforward characterization of the data set into clusters. We study two natural algorithmic questions about explainable clustering. (1) For a given clustering, how to find the "best explanation" by using a decision tree with $k$ leaves? (2) For a given set of points, how to find a decision tree with $k$ leaves minimizing the $k$-means/median objective of the resulting explainable clustering? To address the first question, we introduce a new model of explainable clustering. Our model, inspired by the notion of outliers in robust statistics, is the following. We are seeking a small number of points (outliers) whose removal makes the existing clustering well-explainable. For addressing the second question, we initiate the study of the model of Moshkovitz et al. from the perspective of multivariate complexity. Our rigorous algorithmic analysis sheds some light on the influence of parameters like the input size, dimension of the data, the number of outliers, the number of clusters, and the approximation ratio, on the computational complexity of explainable clustering.

OCDec 29, 2020
Present-Biased Optimization

Fedor V. Fomin, Pierre Fraigniaud, Petr A. Golovach

This paper explores the behavior of present-biased agents, that is, agents who erroneously anticipate the costs of future actions compared to their real costs. Specifically, the paper extends the original framework proposed by Akerlof (1991) for studying various aspects of human behavior related to time-inconsistent planning, including procrastination, and abandonment, as well as the elegant graph-theoretic model encapsulating this framework recently proposed by Kleinberg and Oren (2014). The benefit of this extension is twofold. First, it enables to perform fine grained analysis of the behavior of present-biased agents depending on the optimisation task they have to perform. In particular, we study covering tasks vs. hitting tasks, and show that the ratio between the cost of the solutions computed by present-biased agents and the cost of the optimal solutions may differ significantly depending on the problem constraints. Second, our extension enables to study not only underestimation of future costs, coupled with minimization problems, but also all combinations of minimization/maximization, and underestimation/overestimation. We study the four scenarios, and we establish upper bounds on the cost ratio for three of them (the cost ratio for the original scenario was known to be unbounded), providing a complete global picture of the behavior of present-biased agents, as far as optimisation tasks are concerned.

DSOct 19, 2020
EPTAS for $k$-means Clustering of Affine Subspaces

Eduard Eiben, Fedor V. Fomin, Petr A. Golovach et al.

We consider a generalization of the fundamental $k$-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in $\mathbb{R}^d$, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most $Δ$ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most $Δ$, called a $Δ$-point. Thus we seek a partition of $n$ input $Δ$-points into $k$ clusters minimizing the $k$-means objective. For $Δ=0$, when all coordinates of each point are specified, this is the usual $k$-means clustering. We give an algorithm that finds an $(1+ ε)$-approximate solution in time $f(k,ε, Δ) \cdot n^2 \cdot d$ for some function $f$ of $k,ε$, and $Δ$ only.

DSMay 10, 2019
Refined Complexity of PCA with Outliers

Fedor V. Fomin, Petr A. Golovach, Fahad Panolan et al.

Principal component analysis (PCA) is one of the most fundamental procedures in exploratory data analysis and is the basic step in applications ranging from quantitative finance and bioinformatics to image analysis and neuroscience. However, it is well-documented that the applicability of PCA in many real scenarios could be constrained by an "immune deficiency" to outliers such as corrupted observations. We consider the following algorithmic question about the PCA with outliers. For a set of $n$ points in $\mathbb{R}^{d}$, how to learn a subset of points, say 1% of the total number of points, such that the remaining part of the points is best fit into some unknown $r$-dimensional subspace? We provide a rigorous algorithmic analysis of the problem. We show that the problem is solvable in time $n^{O(d^2)}$. In particular, for constant dimension the problem is solvable in polynomial time. We complement the algorithmic result by the lower bound, showing that unless Exponential Time Hypothesis fails, in time $f(d)n^{o(d)}$, for any function $f$ of $d$, it is impossible not only to solve the problem exactly but even to approximate it within a constant factor.

DSJul 18, 2018
Approximation Schemes for Low-Rank Binary Matrix Approximation Problems

Fedor V. Fomin, Petr A. Golovach, Daniel Lokshtanov et al.

We provide a randomized linear time approximation scheme for a generic problem about clustering of binary vectors subject to additional constrains. The new constrained clustering problem encompasses a number of problems and by solving it, we obtain the first linear time-approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are \textsc{Low GF(2)-Rank Approximation}, \textsc{Low Boolean-Rank Approximation}, and various versions of \textsc{Binary Clustering}. For example, for \textsc{Low GF(2)-Rank Approximation} problem, where for an $m\times n$ binary matrix $A$ and integer $r>0$, we seek for a binary matrix $B$ of $GF_2$ rank at most $r$ such that $\ell_0$ norm of matrix $A-B$ is minimum, our algorithm, for any $ε>0$ in time $ f(r,ε)\cdot n\cdot m$, where $f$ is some computable function, outputs a $(1+ε)$-approximate solution with probability at least $(1-\frac{1}{e})$. Our approximation algorithms substantially improve the running times and approximation factors of previous works. We also give (deterministic) PTASes for these problems running in time $n^{f(r)\frac{1}{ε^2}\log \frac{1}ε}$, where $f$ is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is interesting in its own.