93.1DSMay 26
Fault-Tolerant ST-Diameter OraclesDavide Bilò, Keerti Choudhary, Sarel Cohen et al.
Given two vertex sets $S$ and $T$ in a graph, the $ST$-diameter is the maximum $s$-$t$-distance between vertices $s \in S$ and $t \in T$. We study the problem of estimating the $ST$-diameter of graphs that are subject to a small number of transient edge failures. An $f$-edge fault-tolerant $ST$-diameter oracle ($f$-FDO-$ST$) is a data structure that preprocesses a graph $G$, sets $S$, $T$, and a positive integer $f$. When queried with a set $F$ of at most $f$ failing edges, the oracle returns an estimate $\widehat{D}$ of the $ST$-diameter in $G-F$. The oracle is said to have stretch $σ\geq 1$ if $\operatorname{diam}(G{-}F,S,T) \leq \widehat{D} \leq σ\cdot \operatorname{diam}(G{-}F,S,T)$. We design new $f$-FDO-$ST$s by reducing their construction to that of all-pairs and single-source distance sensitivity oracles ($f$-DSOs). These are data structures that estimate the pairwise graph distances, or respectively the distances from a distinguished source, under up to $f$ failures. We obtain several new trade-offs between the size of the $ST$-diameter oracles, their stretch guarantees, query and preprocessing times by combining our black-box reductions with $f$-DSO results from the literature. We further provide a lower bound on the space requirement of approximate $ST$-diameter oracles. We prove that there exists a family of graphs for which any $f$-FDO-$ST$ with sensitivity $f \ge 2$ and stretch better than $5/3$ requires $Ω(n^{3/2})$ bits of space, regardless of the query time.
LGFeb 22, 2023
Fair Correlation Clustering in ForestsKatrin Casel, Tobias Friedrich, Martin Schirneck et al.
The study of algorithmic fairness received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalization by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair, if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented. We discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand if there is hope for better results in between these two extremes. To this end, we consider restricted graph classes which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view. While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable. The most surprising insight to us is the fact that the cause of the hardness of Fair Correlation Clustering is not the strictness of the fairness condition.
65.3DSApr 30
Simpler and Improved Replacement Path CoveringsDavide Bilò, Shiri Chechik, Keerti Choudhary et al.
An important tool in the design of fault-tolerant graph data structures are $(L,f)$-replacement path coverings (RPCs). An RPC is a family $\mathcal{G}$ of subgraphs of a given graph $G$ such that, for every set $F$ of at most $f$ edges, there is a subfamily $\mathcal{G}_F \,{\subseteq}\, \mathcal{G}$ with the following properties. (1) No subgraph in $\mathcal{G}_F$ contains an edge of $F$. (2) For each pair of vertices $s,t$ that have a shortest path in $G-F$ with at most $L$ edges, one such path also exists in some subgraph in $\mathcal{G}_F$. The covering value of the RPC is the total number $|\mathcal{G}|$ of subgraphs. The query time is the time needed to compute the subfamily $\mathcal{G}_F$ given the set $F$. Weimann and Yuster [TALG'13] devised a randomized RPC with covering value $\widetilde{O}(fL^f)$ and query time $\widetilde{O}(f^2 L^f)$. This was derandomized by Karthik and Parter [TALG'24], who also reduced the query time to $\widetilde{O}(f^2 L)$. Their approach uses some heavy algebraic machinery involving error-correcting codes and an increased covering value of $O((cfL \log n)^{f+1})$ for some constant $c > 1$. We instead devise a much simpler derandomization via conditional expectations that lowers the covering value back to $\widetilde{O}(fL^{f+o(1)})$ and decreases the query time to $\widetilde{O}(f^{5/2}L^{o(1)})$, assuming $f = o(\log L)$. We also investigate the optimal covering value of any $(L,f)$-replacement path covering (deterministic or randomized) for different parameter ranges. We provide a new randomized construction as well as improving a known lower bound, also by Karthik and Parter. For example, for $f = o(\log L)$, we give an RPC with $\widetilde{O}( (L/f)^f L^{o(1)})$ subgraphs and show that this is tight up to the $L^{o(1)}$ term.
DSMar 6
Transversal Rank, Conformality and EnumerationMartin Schirneck
The transversal rank of a hypergraph is the maximum size of its minimal hitting sets. Deciding, for an $n$-vertex, $m$-edge hypergraph and an integer $k$, whether the transversal rank is at least $k$ takes time $O(m^{k+1} n)$ with an algorithm that is known since the 70s. It essentially matches an $(m+n)^{Ω(k)}$ ETH-lower bound by Araújo, Bougeret, Campos, and Sau [Algorithmica 2023] and Dublois, Lampis, and Paschos [TCS 2022]. Many hypergraphs seen in practice have much more edges than vertices, $m \gg n$. This raises the question whether an improvement of the run time dependency on $m$ can be traded for an increase in the dependency on $n$. Our first result is an algorithm to recognize hypergraphs with transversal rank at least $k$ in time $O(Δ^{k-2} mn^{k-1})$, where $Δ\le m$ is the maximum degree. Our main technical contribution is a ``look-ahead'' method that allows us to find higher-order extensions, minimal hitting sets that augment a given set with at least two more vertices. We show that this method can also be used to enumerate all minimal hitting sets of a hypergraph with transversal rank $k^*$ with delay $O(Δ^{k^*-1} mn^2)$. We then explore the possibility of further reducing the running time for computing the transversal rank to $\textsf{poly}(m) \cdot n^{k+O(1)}$. This turns out to be equivalent to several breakthroughs in combinatorial algorithms and enumeration. Among other things, such an improvement is possible if and only if $k$-conformal hypergraphs can also be recognized in time $\textsf{poly}(m) \cdot n^{k+O(1)}$, and iff the maximal hypercliques/independent sets of a uniform hypergraph can be enumerated with incremental delay.
LGJul 15, 2020
timeXplain -- A Framework for Explaining the Predictions of Time Series ClassifiersFelix Mujkanovic, Vanja Doskoč, Martin Schirneck et al.
Modern time series classifiers display impressive predictive capabilities, yet their decision-making processes mostly remain black boxes to the user. At the same time, model-agnostic explainers, such as the recently proposed SHAP, promise to make the predictions of machine learning models interpretable, provided there are well-designed domain mappings. We bring both worlds together in our timeXplain framework, extending the reach of explainable artificial intelligence to time series classification and value prediction. We present novel domain mappings for the time domain, frequency domain, and time series statistics and analyze their explicative power as well as their limits. We employ a novel evaluation metric to experimentally compare timeXplain to several model-specific explanation approaches for state-of-the-art time series classifiers.