LGJul 25, 2022
Boolean and $\mathbb{F}_p$-Matrix Factorization: From Theory to PracticeFedor Fomin, Fahad Panolan, Anurag Patil et al.
Boolean Matrix Factorization (BMF) aims to find an approximation of a given binary matrix as the Boolean product of two low-rank binary matrices. Binary data is ubiquitous in many fields, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, computer graphics, among many others. Unfortunately, BMF is computationally hard and heuristic algorithms are used to compute Boolean factorizations. Very recently, the theoretical breakthrough was obtained independently by two research groups. Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) show that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite the theoretical importance, the high double-exponential dependence of the running times from the rank makes these algorithms unimplementable in practice. The primary research question motivating our work is whether the theoretical advances on BMF could lead to practical algorithms. The main conceptional contribution of our work is the following. While EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms could serve as the basis in designing better heuristics. We also use this strategy to develop new algorithms for related $\mathbb{F}_p$-Matrix Factorization. Here, given a matrix $A$ over a finite field GF($p$) where $p$ is a prime, and an integer $r$, our objective is to find a matrix $B$ over the same field with GF($p$)-rank at most $r$ minimizing some norm of $A-B$. Our empirical research on synthetic and real-world data demonstrates the advantage of the new algorithms over previous works on BMF and $\mathbb{F}_p$-Matrix Factorization.
DSApr 17
Backdoors for Quantified Boolean FormulasLeif Eriksson, Victor Lagerkvist, Sebastian Ordyniak et al.
The quantified Boolean formula problem (QBF) is a well-known PSpace-complete problem with rich expressive power, and is generally viewed as the SAT analogue for PSpace. Given that many problems today are solved in practice by reducing to SAT, and then using highly optimized SAT solvers, it is natural to ask whether problems in PSpace are amenable to this approach. While SAT solvers exploit hidden structural properties, such as backdoors to tractability, backdoor analysis for QBF is comparatively very limited. We present a comprehensive study of the (parameterized) complexity of QBF parameterized by backdoor size to the largest tractable syntactic classes: HORN, 2-SAT, and AFFINE. While SAT is in FPT under this parameterization, we prove that QBF remains PSpace-hard even on formulas with backdoors of constant size. Parameterizing additionally by the quantifier depth, we design FPT-algorithms for the classes 2-SAT and AFFINE, and show that 3-HORN is W[1]-hard. As our next contribution, we vastly extend the applicability of QBF backdoors not only for the syntactic classes defined above but also for tractable classes defined via structural restrictions, such as formulas with bounded incidence treewidth and quantifier depth. To this end, we introduce enhanced backdoors: these are separators S of size at most k in the primal graph such that S together with all variables contained in any purely universal component of the primal graph minus S is a backdoor. We design FPT-algorithms with respect to k for both evaluation and detection of enhanced backdoors to all tractable classes of QBF listed above and more.
DSApr 14
The Parameterized Complexity of Vertex-Coloring Edge-WeightingShubhada Aute, Fahad Panolan, Geevarghese Philip
Motivated by the landmark resolution of the 1-2-3 Conjecture, we initiate the study of the parameterized complexity of the Vertex-Coloring {0,1}-Edge-Weighting problem and its generalization, Vertex-Coloring Pre-edge-Weighting, under various structural parameters. The base problem, Vertex-Coloring {0,1}-Edge-Weighting, asks whether we can assign a weight from {0,1} to each edge of a graph. The goal is to ensure that for every pair of adjacent vertices, the sums of their incident edge weights are distinct. In the Vertex-Coloring Pre-edge-Weighting variant, we are given a graph where a subset of edges is already assigned fixed weights from {0,1}. The goal is to determine if this partial weighting can be extended to all remaining edges such that the final, complete assignment satisfies the proper vertex coloring property. While the existence of such weightings is well-understood for specific graph classes, their algorithmic complexity under structural parameterization has remained unexplored. We prove both hardness and tractability for the problem, across a hierarchy of structural parameters. We show that both the base problem and the Pre-edge-Weighting variant are W[1]-hard when parameterized by the size of a feedback vertex set of the input graph. On the positive side, we establish that the base problem and a restricted Pre-edge-Weighting variant where the pre-assigned weights are all 1, become FPT when parameterized by the size of a vertex cover of the input graph. Further, we show that both the base problem and the Pre-edge-Weighting variant have XP algorithms when parameterized by the treewidth of the input graph.
CCMay 12
Clausal Deletion Backdoors for QBF: a Parameterized Complexity ApproachLeif Eriksson, Victor Lagerkvist, Sebastian Ordyniak et al.
Determining the validity of a quantified Boolean formula (QBF) is a PSPACE-complete problem with rich expressive power. Despite interest in efficient solvers, there is, compared to problems in NP, a lack of positive theoretical results, and in the parameterized complexity setting one often has to restrict the quantifier prefix (e.g., bounding alternations) to obtain fixed parameter tractability (FPT). We propose a new parameter: the number of variables in clauses that has to be removed before reaching a tractable class (a clause covering (CC) backdoor). We are then interested in solving QBF in FPT time given a CC-backdoor of size $k$. We consider the three classical, tractable cases of QBF as base classes: Horn, 2-CNF, and linear equations. We establish W[1]-hardness for Horn but prove FPT for the others, and prove that in a precise, algebraic sense, we are only missing one important case for a full dichotomy. Our algorithms are non-trivial and depend on propagation, and Gaussian elimination, respectively, and are comparably unexplored for QBF.
CCMay 10, 2024
Solving Quantified Boolean Formulas with Few Existential VariablesLeif Eriksson, Victor Lagerkvist, George Osipov et al.
The quantified Boolean formula (QBF) problem is an important decision problem generally viewed as the archetype for PSPACE-completeness. Many problems of central interest in AI are in general not included in NP, e.g., planning, model checking, and non-monotonic reasoning, and for such problems QBF has successfully been used as a modelling tool. However, solvers for QBF are not as advanced as state of the art SAT solvers, which has prevented QBF from becoming a universal modelling language for PSPACE-complete problems. A theoretical explanation is that QBF (as well as many other PSPACE-complete problems) lacks natural parameters} guaranteeing fixed-parameter tractability (FPT). In this paper we tackle this problem and consider a simple but overlooked parameter: the number of existentially quantified variables. This natural parameter is virtually unexplored in the literature which one might find surprising given the general scarcity of FPT algorithms for QBF. Via this parameterization we then develop a novel FPT algorithm applicable to QBF instances in conjunctive normal form (CNF) of bounded clause length. We complement this by a W[1]-hardness result for QBF in CNF of unbounded clause length as well as sharper lower bounds for the bounded arity case under the (strong) exponential-time hypothesis.
DSOct 19, 2020
EPTAS for $k$-means Clustering of Affine SubspacesEduard Eiben, Fedor V. Fomin, Petr A. Golovach et al.
We consider a generalization of the fundamental $k$-means clustering for data with incomplete or corrupted entries. When data objects are represented by points in $\mathbb{R}^d$, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most $Δ$ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most $Δ$, called a $Δ$-point. Thus we seek a partition of $n$ input $Δ$-points into $k$ clusters minimizing the $k$-means objective. For $Δ=0$, when all coordinates of each point are specified, this is the usual $k$-means clustering. We give an algorithm that finds an $(1+ ε)$-approximate solution in time $f(k,ε, Δ) \cdot n^2 \cdot d$ for some function $f$ of $k,ε$, and $Δ$ only.
DSMay 10, 2019
Refined Complexity of PCA with OutliersFedor V. Fomin, Petr A. Golovach, Fahad Panolan et al.
Principal component analysis (PCA) is one of the most fundamental procedures in exploratory data analysis and is the basic step in applications ranging from quantitative finance and bioinformatics to image analysis and neuroscience. However, it is well-documented that the applicability of PCA in many real scenarios could be constrained by an "immune deficiency" to outliers such as corrupted observations. We consider the following algorithmic question about the PCA with outliers. For a set of $n$ points in $\mathbb{R}^{d}$, how to learn a subset of points, say 1% of the total number of points, such that the remaining part of the points is best fit into some unknown $r$-dimensional subspace? We provide a rigorous algorithmic analysis of the problem. We show that the problem is solvable in time $n^{O(d^2)}$. In particular, for constant dimension the problem is solvable in polynomial time. We complement the algorithmic result by the lower bound, showing that unless Exponential Time Hypothesis fails, in time $f(d)n^{o(d)}$, for any function $f$ of $d$, it is impossible not only to solve the problem exactly but even to approximate it within a constant factor.
DSJul 18, 2018
Approximation Schemes for Low-Rank Binary Matrix Approximation ProblemsFedor V. Fomin, Petr A. Golovach, Daniel Lokshtanov et al.
We provide a randomized linear time approximation scheme for a generic problem about clustering of binary vectors subject to additional constrains. The new constrained clustering problem encompasses a number of problems and by solving it, we obtain the first linear time-approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are \textsc{Low GF(2)-Rank Approximation}, \textsc{Low Boolean-Rank Approximation}, and various versions of \textsc{Binary Clustering}. For example, for \textsc{Low GF(2)-Rank Approximation} problem, where for an $m\times n$ binary matrix $A$ and integer $r>0$, we seek for a binary matrix $B$ of $GF_2$ rank at most $r$ such that $\ell_0$ norm of matrix $A-B$ is minimum, our algorithm, for any $ε>0$ in time $ f(r,ε)\cdot n\cdot m$, where $f$ is some computable function, outputs a $(1+ε)$-approximate solution with probability at least $(1-\frac{1}{e})$. Our approximation algorithms substantially improve the running times and approximation factors of previous works. We also give (deterministic) PTASes for these problems running in time $n^{f(r)\frac{1}{ε^2}\log \frac{1}ε}$, where $f$ is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is interesting in its own.