LO, AI · Jun 6, 2018

Constrained Counting and Sampling: Bridging the Gap between Theory and Practice

arXiv:1806.02239v1 · 23 citations
AI Analysis

This work narrows the gap between theoretical guarantees and practical performance in constrained counting and sampling, a substantial improvement over prior methods that matters for applications such as probabilistic reasoning and verification.

The thesis tackles constrained counting and sampling, two fundamental problems in computer science with applications such as network reliability and privacy. It introduces a novel hashing-based algorithmic framework that combines universal hashing with SAT/SMT tools. The resulting tools (ApproxMC2 and UniGen) handle formulas with up to a million variables, whereas prior tools were limited to a few hundred.

Constrained counting and sampling are two fundamental problems in Computer Science with numerous applications, including network reliability, privacy, probabilistic reasoning, and constrained-random verification. In constrained counting, the task is to compute the total weight, subject to a given weighting function, of the set of solutions of the given constraints. In constrained sampling, the task is to sample randomly, subject to a given weighting function, from the set of solutions to a set of given constraints. Consequently, constrained counting and sampling have been subject to intense theoretical and empirical investigations over the years. Prior work, however, offered either heuristic techniques with poor guarantees of accuracy or approaches with proven guarantees but poor performance in practice. In this thesis, we introduce a novel hashing-based algorithmic framework for constrained sampling and counting that combines the classical algorithmic technique of universal hashing with the dramatic progress made in combinatorial reasoning tools, in particular SAT and SMT, over the past two decades. The resulting frameworks for counting (ApproxMC2) and sampling (UniGen) can handle formulas with up to a million variables, a significant boost over prior state-of-the-art tools, which could handle only a few hundred variables. If the initial set of constraints is expressed in Disjunctive Normal Form (DNF), ApproxMC is the only known Fully Polynomial Randomized Approximation Scheme (FPRAS) that does not involve Monte Carlo steps. By exploiting the connection between definability of formulas and variance of the distribution of solutions in a cell defined by 3-universal hash functions, we introduce an algorithmic technique, MIS, that reduces the size of XOR constraints employed in the underlying universal hash functions by as much as two orders of magnitude.
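The hashing idea in the abstract can be illustrated with a toy sketch. It is not ApproxMC2 and carries none of its guarantees: brute-force enumeration stands in for the SAT/SMT solver, the threshold of 8 is an arbitrary choice, and all function names (random_xor, cell_size, estimate_count, median_estimate) are hypothetical. The sketch adds random XOR (parity) constraints until the surviving cell of solutions is small, then scales the cell size by the number of cells.

```python
import itertools
import random
import statistics


def random_xor(num_vars, rng):
    """Draw a random parity constraint: each variable joins with probability 1/2."""
    subset = [i for i in range(num_vars) if rng.random() < 0.5]
    return subset, rng.randrange(2)


def in_cell(assignment, xors):
    """True if the assignment satisfies every XOR constraint."""
    return all(sum(assignment[i] for i in subset) % 2 == parity
               for subset, parity in xors)


def cell_size(formula, num_vars, xors, limit):
    """Count solutions inside the cell, stopping as soon as limit is exceeded.

    Brute force is an assumption of this sketch; a real tool would issue
    bounded solver calls instead of enumerating all assignments.
    """
    count = 0
    for assignment in itertools.product((0, 1), repeat=num_vars):
        if formula(assignment) and in_cell(assignment, xors):
            count += 1
            if count > limit:
                break
    return count


def estimate_count(formula, num_vars, threshold=8, rng=None):
    """Add XORs until the cell holds at most threshold solutions, then scale."""
    if rng is None:
        rng = random.Random()
    xors = []
    while True:  # terminates with probability 1 as random XORs split the cell
        size = cell_size(formula, num_vars, xors, threshold)
        if size <= threshold:
            # Each XOR roughly halves the solution set, so there are 2^m cells.
            return size * 2 ** len(xors)
        xors.append(random_xor(num_vars, rng))


def median_estimate(formula, num_vars, runs=11, seed=0):
    """Take the median of independent runs to damp unlucky hash choices."""
    rng = random.Random(seed)
    return statistics.median(
        estimate_count(formula, num_vars, rng=rng) for _ in range(runs))


if __name__ == "__main__":
    # Example: (x0 OR x1) over 10 variables has 3 * 2**8 = 768 solutions.
    print(median_estimate(lambda a: a[0] or a[1], 10))
```

When the formula has no more solutions than the threshold, no XOR is added and the count is exact; otherwise the result is a randomized estimate whose quality depends on the threshold and the number of runs.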
