Fedor V. Fomin

DS
h-index34
12papers
160citations
Novelty59%
AI Score56

12 Papers

87.2DSMar 19
Algorithms for Euclidean Distance Matrix Completion: Exploiting Proximity to Triviality

Fedor V. Fomin, Petr A. Golovach, M. S. Ramanujan et al.

In the d-Euclidean Distance Matrix Completion (d-EDMC) problem, one aims to determine whether a given partial matrix of pairwise distances can be extended to a full Euclidean distance matrix in d dimensions. This problem is a cornerstone of computational geometry with numerous applications. While classical work on this problem often focuses on exploiting connections to semidefinite programming typically leading to approximation algorithms, we focus on exact algorithms and propose a novel distance-from-triviality parameterization framework to obtain tractability results for d-EDMC. We identify key structural patterns in the input that capture entry density, including chordal substructures and coverability of specified entries by fully specified principal submatrices. We obtain: (1) The first fixed-parameter algorithm (FPT algorithm) for d-EDMC parameterized by d and the maximum number of unspecified entries per row/column. This is achieved through a novel compression algorithm that reduces a given instance to a submatrix on O(1) rows (for fixed values of the parameters). (2) The first FPT algorithm for d-EDMC parameterized by d and the minimum number of fully specified principal submatrices whose entries cover all specified entries of the given matrix. This result is also achieved through a compression algorithm. (3) A polynomial-time algorithm for d-EDMC when both d and the minimum fill-in of a natural graph representing the specified entries are fixed constants. This result is achieved by combining tools from distance geometry and algorithms from real algebraic geometry. Our work identifies interesting parallels between EDM completion and graph problems, with our algorithms exploiting techniques from both domains.

20.1CGMar 25
Line Cover and Related Problems

Matthias Bentert, Fedor v. Fomin, Petr A. Golovach et al.

We study extensions of the classic \emph{Line Cover} problem, which asks whether a set of $n$ points in the plane can be covered using $k$ lines. Line Cover is known to be NP-hard, and we focus on two natural generalizations. The first is \textbf{Line Clustering}, where the goal is to find $k$ lines minimizing the sum of squared distances from the input points to their nearest line. The second is \textbf{Hyperplane Cover}, which asks whether $n$ points in $\mathbb{R}^d$ can be covered by $k$ hyperplanes. We also study the more general \textbf{Projective Clustering} problem, which unifies both settings and has applications in machine learning, data analysis, and computational geometry. In this problem, one seeks $k$ affine subspaces of dimension $r$ that minimize the sum of squared distances from the given points in $\mathbb{R}^d$ to the nearest subspace. Our results reveal notable differences in the parameterized complexity of these problems. While Line Cover is fixed-parameter tractable when parameterized by $k$, we show that Line Clustering is W[1]-hard with respect to $k$ and does not admit an algorithm with running time $n^{o(k)}$ unless the Exponential Time Hypothesis fails. Hyperplane Cover has been known to be NP-hard since the 1980s, following work of Megiddo and Tamir, even for $d=2$, we show that it remains NP-hard even when $k=2$. Finally, we present an algorithm for Projective Clustering running in $n^{O(dk(r+1))}$ time. This bound matches our lower bound for Line Clustering and generalizes the classic algorithm for $k$-Means Clustering ($r=0$) by Inaba, Katoh, and Imai [SoCG 1994].

DSNov 13, 2025
Discounted Cuts: A Stackelberg Approach to Network Disruption

Pål Grønås Drange, Fedor V. Fomin, Petr Golovach et al.

We study a Stackelberg variant of the classical Most Vital Links problem, modeled as a one-round adversarial game between an attacker and a defender. The attacker strategically removes up to $k$ edges from a flow network to maximally disrupt flow between a source $s$ and a sink $t$, after which the defender optimally reroutes the remaining flow. To capture this attacker--defender interaction, we introduce a new mathematical model of discounted cuts, in which the cost of a cut is evaluated by excluding its $k$ most expensive edges. This model generalizes the Most Vital Links problem and uncovers novel algorithmic and complexity-theoretic properties. We develop a unified algorithmic framework for analyzing various forms of discounted cut problems, including minimizing or maximizing the cost of a cut under discount mechanisms that exclude either the $k$ most expensive or the $k$ cheapest edges. While most variants are NP-complete on general graphs, our main result establishes polynomial-time solvability for all discounted cut problems in our framework when the input is restricted to bounded-genus graphs, a relevant class that includes many real-world networks such as transportation and infrastructure networks. With this work, we aim to open collaborative bridges between artificial intelligence, algorithmic game theory, and operations research.

74.5DSApr 28
Clustering Permutations under the Ulam Metric: A Parameterized Complexity Study

Tian Bai, Fedor V. Fomin, Petr A. Golovach et al.

Rank aggregation seeks a representative permutation for a collection of rankings and plays a central role in areas such as social choice, information retrieval, and computational biology. Two fundamental aggregation tasks are the center and median problems, which minimize the maximum and the total distance to the input permutations, respectively. While these problems are well understood under Kendall's tau and related distances, their parameterized complexity under the Ulam metric, an edit-distance-based metric on permutations, has remained largely unexplored. In this work, we initiate a systematic study of the parameterized complexity of rank aggregation under the Ulam metric. We consider both the center and median problems, as well as their generalizations to the $k$-center and $k$-median clustering settings, parameterized by the number of centers $k$ and the distance budget $d$ (corresponding to the maximum distance for center variants and the total distance for median variants). Both problems are known to be NP-hard already for $k=1$. We show that the Ulam $k$-center problem remains NP-hard when $d=1$, but is fixed-parameter tractable when parameterized by $k + d$. Our algorithm is based on a novel local-search framework tailored to the non-local nature of Ulam distances. We complement this by proving that no polynomial kernel exists for the $k+d$ parameterization unless NP $\subseteq$ coNP/poly. For the Ulam $k$-median problem parameterized by the total distance $d$, we establish W[1]-hardness and provide an XP algorithm. We also provide a polynomial kernel for the parameter $k + d$, which in turn yields a fixed-parameter tractable algorithm.

LGMar 13, 2024
Structural perspective on constraint-based learning of Markov networks

Tuukka Korhonen, Fedor V. Fomin, Pekka Parviainen

Markov networks are probabilistic graphical models that employ undirected graphs to depict conditional independence relationships among variables. Our focus lies in constraint-based structure learning, which entails learning the undirected graph from data through the execution of conditional independence tests. We establish theoretical limits concerning two critical aspects of constraint-based learning of Markov networks: the number of tests and the sizes of the conditioning sets. These bounds uncover an exciting interplay between the structural properties of the graph and the amount of tests required to learn a Markov network. The starting point of our work is that the graph parameter maximum pairwise connectivity, $κ$, that is, the maximum number of vertex-disjoint paths connecting a pair of vertices in the graph, is responsible for the sizes of independence tests required to learn the graph. On one hand, we show that at least one test with the size of the conditioning set at least $κ$ is always necessary. On the other hand, we prove that any graph can be learned by performing tests of size at most $κ$. This completely resolves the question of the minimum size of conditioning sets required to learn the graph. When it comes to the number of tests, our upper bound on the sizes of conditioning sets implies that every $n$-vertex graph can be learned by at most $n^κ$ tests with conditioning sets of sizes at most $κ$. We show that for any upper bound $q$ on the sizes of the conditioning sets, there exist graphs with $O(n q)$ vertices that require at least $n^{Ω(κ)}$ tests to learn. This lower bound holds even when the treewidth and the maximum degree of the graph are at most $κ+2$. On the positive side, we prove that every graph of bounded treewidth can be learned by a polynomial number of tests with conditioning sets of sizes at most $2κ$.

68.3DSApr 1
A Framework for Parameterized Subexponential-Subcubic-Time Algorithms for Weighted Problems in Planar Graphs

Matthias Bentert, Fedor V. Fomin, Petr A. Golovach

Many problems are known to be solvable in subexponential parameterized time when the input graph is planar. The bidimensionality framework of Demaine, Fomin, Hajiaghay, and Thilikos [JACM'05] and the treewidth-pattern-covering approach by Fomin, Lokshtanov, Marx, Pilipczuk, Pilipczuk, and Saurabh [SICOMP'22] give robust tools for designing such algorithms. However, there are still many problems for which we do not know whether subexponential parameterized algorithms exist. The bidimensionality framework is not able to handle weights or directed graphs and the treewidth-pattern-covering approach only works for finding connected solutions. Building on a result by Nederlof [STOC'20], we provide a framework that is able to solve a variety of problems in planar graphs in subexponential parameterized time for which this was previously not known (where the polynomial part of the running time is usually $O(n^{2.49})$). Our framework can handle weights, does not require solutions to contain only few connected components, and applies to cases where the number of potential patterns of a solution is exponential in the parameter. We then use the framework to show that various weighted problems like Weighted Partial Vertex Cover, Maximum-Weight Induced Forest, Minimum-Weight Rooted Simple Minor, and Maximum-Weight Rooted Parallel Induced Minor allow for subexponential parameterized algorithms. This was previously not known for any of them. Moreover, we present a very easy-to-use fragment of our framework. This fragment allows for significantly simpler proofs in the case of Maximum-Weight Independent Set and Maximum $(k, n-k)$-Cut and is able to show a subexponential parameterized algorithm for weighted versions of Densest $k$-Subgraph. Even the unweighted version was not known before and is stated as an open problem in the existing literature.

DSDec 13, 2021
How to Find a Good Explanation for Clustering?

Sayan Bandyapadhyay, Fedor V. Fomin, Petr A. Golovach et al.

$k$-means and $k$-median clustering are powerful unsupervised machine learning techniques. However, due to complicated dependences on all the features, it is challenging to interpret the resulting cluster assignments. Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering. In this model, a decision tree with $k$ leaves provides a straightforward characterization of the data set into clusters. We study two natural algorithmic questions about explainable clustering. (1) For a given clustering, how to find the "best explanation" by using a decision tree with $k$ leaves? (2) For a given set of points, how to find a decision tree with $k$ leaves minimizing the $k$-means/median objective of the resulting explainable clustering? To address the first question, we introduce a new model of explainable clustering. Our model, inspired by the notion of outliers in robust statistics, is the following. We are seeking a small number of points (outliers) whose removal makes the existing clustering well-explainable. For addressing the second question, we initiate the study of the model of Moshkovitz et al. from the perspective of multivariate complexity. Our rigorous algorithmic analysis sheds some light on the influence of parameters like the input size, dimension of the data, the number of outliers, the number of clusters, and the approximation ratio, on the computational complexity of explainable clustering.

OCDec 29, 2020
Present-Biased Optimization

Fedor V. Fomin, Pierre Fraigniaud, Petr A. Golovach

This paper explores the behavior of present-biased agents, that is, agents who erroneously anticipate the costs of future actions compared to their real costs. Specifically, the paper extends the original framework proposed by Akerlof (1991) for studying various aspects of human behavior related to time-inconsistent planning, including procrastination, and abandonment, as well as the elegant graph-theoretic model encapsulating this framework recently proposed by Kleinberg and Oren (2014). The benefit of this extension is twofold. First, it enables to perform fine grained analysis of the behavior of present-biased agents depending on the optimisation task they have to perform. In particular, we study covering tasks vs. hitting tasks, and show that the ratio between the cost of the solutions computed by present-biased agents and the cost of the optimal solutions may differ significantly depending on the problem constraints. Second, our extension enables to study not only underestimation of future costs, coupled with minimization problems, but also all combinations of minimization/maximization, and underestimation/overestimation. We study the four scenarios, and we establish upper bounds on the cost ratio for three of them (the cost ratio for the original scenario was known to be unbounded), providing a complete global picture of the behavior of present-biased agents, as far as optimisation tasks are concerned.

DSOct 19, 2020
EPTAS for $k$-means Clustering of Affine Subspaces

Eduard Eiben, Fedor V. Fomin, Petr A. Golovach et al.

We consider a generalization of the fundamental $k$-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in $\mathbb{R}^d$, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most $Δ$ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most $Δ$, called a $Δ$-point. Thus we seek a partition of $n$ input $Δ$-points into $k$ clusters minimizing the $k$-means objective. For $Δ=0$, when all coordinates of each point are specified, this is the usual $k$-means clustering. We give an algorithm that finds an $(1+ ε)$-approximate solution in time $f(k,ε, Δ) \cdot n^2 \cdot d$ for some function $f$ of $k,ε$, and $Δ$ only.

DSJul 20, 2020
On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications

Sayan Bandyapadhyay, Fedor V. Fomin, Kirill Simonov

Fair clustering is a constrained variant of clustering where the goal is to partition a set of colored points, such that the fraction of points of any color in every cluster is more or less equal to the fraction of points of this color in the dataset. This variant was recently introduced by Chierichetti et al. [NeurIPS, 2017] in a seminal work and became widely popular in the clustering literature. In this paper, we propose a new construction of coresets for fair clustering based on random sampling. The new construction allows us to obtain the first coreset for fair clustering in general metric spaces. For Euclidean spaces, we obtain the first coreset whose size does not depend exponentially on the dimension. Our coreset results solve open questions proposed by Schmidt et al. [WAOA, 2019] and Huang et al. [NeurIPS, 2019]. The new coreset construction helps to design several new approximation and streaming algorithms. In particular, we obtain the first true constant-approximation algorithm for metric fair clustering, whose running time is fixed-parameter tractable (FPT). In the Euclidean case, we derive the first $(1+ε)$-approximation algorithm for fair clustering whose time complexity is near-linear and does not depend exponentially on the dimension of the space. Besides, our coreset construction scheme is fairly general and gives rise to coresets for a wide range of constrained clustering problems. This leads to improved constant-approximations for these problems in general metrics and near-linear time $(1+ε)$-approximations in the Euclidean metric.

DSMay 10, 2019
Refined Complexity of PCA with Outliers

Fedor V. Fomin, Petr A. Golovach, Fahad Panolan et al.

Principal component analysis (PCA) is one of the most fundamental procedures in exploratory data analysis and is the basic step in applications ranging from quantitative finance and bioinformatics to image analysis and neuroscience. However, it is well-documented that the applicability of PCA in many real scenarios could be constrained by an "immune deficiency" to outliers such as corrupted observations. We consider the following algorithmic question about the PCA with outliers. For a set of $n$ points in $\mathbb{R}^{d}$, how to learn a subset of points, say 1% of the total number of points, such that the remaining part of the points is best fit into some unknown $r$-dimensional subspace? We provide a rigorous algorithmic analysis of the problem. We show that the problem is solvable in time $n^{O(d^2)}$. In particular, for constant dimension the problem is solvable in polynomial time. We complement the algorithmic result by the lower bound, showing that unless Exponential Time Hypothesis fails, in time $f(d)n^{o(d)}$, for any function $f$ of $d$, it is impossible not only to solve the problem exactly but even to approximate it within a constant factor.

DSJul 18, 2018
Approximation Schemes for Low-Rank Binary Matrix Approximation Problems

Fedor V. Fomin, Petr A. Golovach, Daniel Lokshtanov et al.

We provide a randomized linear time approximation scheme for a generic problem about clustering of binary vectors subject to additional constrains. The new constrained clustering problem encompasses a number of problems and by solving it, we obtain the first linear time-approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are \textsc{Low GF(2)-Rank Approximation}, \textsc{Low Boolean-Rank Approximation}, and various versions of \textsc{Binary Clustering}. For example, for \textsc{Low GF(2)-Rank Approximation} problem, where for an $m\times n$ binary matrix $A$ and integer $r>0$, we seek for a binary matrix $B$ of $GF_2$ rank at most $r$ such that $\ell_0$ norm of matrix $A-B$ is minimum, our algorithm, for any $ε>0$ in time $ f(r,ε)\cdot n\cdot m$, where $f$ is some computable function, outputs a $(1+ε)$-approximate solution with probability at least $(1-\frac{1}{e})$. Our approximation algorithms substantially improve the running times and approximation factors of previous works. We also give (deterministic) PTASes for these problems running in time $n^{f(r)\frac{1}{ε^2}\log \frac{1}ε}$, where $f$ is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is interesting in its own.