5 Papers

72.9 LG Jun 3
Prediction Under Imperfect Compression: A Theory of Approximate MDL

Qian Li, Xinyu Mao, Shang-Hua Teng et al.

Minimum Description Length (MDL) formalizes Occam's razor by minimizing the total description length $L(\mathrm{model})+L(\mathrm{data} \ | \ \mathrm{model})$. For sequential prediction, the MDL method repeatedly selects a model that minimizes this objective on the observed prefix and uses it to predict the next step. Classical MDL prediction theory shows that exact optimization of the MDL objective provides a strong compression guarantee that supports reliable prediction. In practice, however, machine learning can usually only optimize the objective approximately. To bridge this gap, this paper addresses the following fundamental question: Under what forms of approximation and regularization does approximate MDL still guarantee reliable sequential prediction? This work offers a principled characterization. We prove that, for any approximation with additive slack $C$ of the more general balanced MDL objective $λ\cdot L(\mathrm{model})+L(\mathrm{data} \ | \ \mathrm{model})$, the cumulative expected squared prediction error is finite for all $λ\ge1$. The case $λ>1$ is proved by an affinity-telescoping argument, while the boundary case $λ=1$ is proved by a likelihood-ratio stopping argument based on exact static MDL bounds. Our results establish that classical MDL regularization remains robust to any fixed additive optimization error. Furthermore, we show that this characterization of the approximate MDL framework is sharp: when $0<λ<1$, overfitting can incur infinite cumulative expected error in the universal class of estimable measures, and hence a strong form of model-complexity regularization is necessary. In addition, under multiplicative approximation, model selection may fail in every regularized regime $λ>0$; thus additive approximation is both sufficient and essential.
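
To make the balanced objective and the additive slack $C$ concrete, here is a minimal Python sketch on a toy problem. It is not the paper's construction: the model class (four Bernoulli models), their code lengths in bits, and the names `MODELS`, `data_bits`, `score`, and `approx_mdl_candidates` are all assumptions chosen for illustration. An approximate optimizer with slack $C$ may return any model in the candidate set.

```python
import math

# Hypothetical model class: Bernoulli(theta) models, each with an assumed
# description length L(model) in bits. These numbers are made up.
MODELS = {0.5: 1.0, 0.25: 3.0, 0.75: 3.0, 0.9: 5.0}


def data_bits(theta, prefix):
    """L(data | model): negative log2-likelihood of a binary prefix."""
    return -sum(math.log2(theta if x == 1 else 1.0 - theta) for x in prefix)


def score(theta, prefix, lam=1.0):
    """Balanced MDL objective: lam * L(model) + L(data | model)."""
    return lam * MODELS[theta] + data_bits(theta, prefix)


def approx_mdl_candidates(prefix, lam=1.0, slack=0.0):
    """Models whose score is within additive slack C of the minimum score."""
    scores = {theta: score(theta, prefix, lam) for theta in MODELS}
    best = min(scores.values())
    return sorted(theta for theta, s in scores.items() if s <= best + slack)


if __name__ == "__main__":
    prefix = [1] * 8  # observed prefix: eight ones
    for slack in (0.0, 0.5, 3.0):
        print(slack, approx_mdl_candidates(prefix, lam=1.0, slack=slack))
```

With `slack=0.0` the function returns the exact minimizers; larger slack widens the set of models an approximate optimizer is allowed to pick, and the next-step prediction is whatever the picked model assigns to the next symbol.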

98.7 DS Jun 3
Exponential Quantum Space Advantage for Approximating Max-$k$SAT in the Streaming Setting

Haoyu Wang, Guangxu Yang

In this paper, we give a one-pass quantum streaming algorithm for Max-$k$SAT that uses $\operatorname{polylog}(n)$ space and achieves a $0.7172$-approximation on instances with $n$ variables. In contrast, prior work by Chou, Golovnev, and Velusamy (FOCS 2020) implies that achieving an approximation ratio better than $\sqrt{2}/2 \approx 0.7071$ for Max-$k$SAT requires $Ω(\sqrt{n})$ space for any classical streaming algorithm. Our algorithm therefore yields an exponential quantum space advantage for Max-$k$SAT in the streaming setting. We further give a one-pass quantum streaming algorithm for Max-2OR that uses $\operatorname{polylog}(n)$ space and achieves a $0.7425$-approximation on instances with $n$ variables. Combined with known results, this gives a complete classification of quantum space advantages for all Boolean Max-2CSPs.

14.9 CC Apr 22
A Lifting Theorem for Hybrid Classical-Quantum Communication Complexity

Xudong Wu, Guangxu Yang, Penghui Yao

We investigate a model of hybrid classical-quantum communication complexity, in which two parties first exchange classical messages and subsequently communicate using quantum messages. We study the trade-off between the classical and quantum communication for composed functions of the form $f\circ G^n$, where $f:\{0,1\}^n\to\{\pm1\}$ and $G$ is an inner product function of $Θ(\log n)$ bits. To prove the trade-off, we establish a novel lifting theorem for hybrid communication complexity. This theorem unifies two previously separate lifting paradigms: the query-to-communication lifting framework for classical communication complexity and the approximate-degree-to-generalized-discrepancy lifting methods for quantum communication complexity. Our hybrid lifting theorem therefore offers a new framework for proving lower bounds in hybrid classical-quantum communication models. As a corollary, we show that any hybrid protocol communicating $c$ classical bits followed by $q$ qubits to compute $f\circ G^n$ must satisfy $c+q^2=Ω\big(\max\{\mathrm{deg}(f),\mathrm{bs}(f)\}\cdot\log n\big)$, where $\mathrm{deg}(f)$ is the degree of $f$ and $\mathrm{bs}(f)$ is the block sensitivity of $f$. For a read-once formula $f$, this yields an almost tight trade-off: the parties must exchange either $Θ\big(n\cdot\log n\big)$ classical bits or $\widetilde{Θ}\big(\sqrt n\cdot\log n\big)$ qubits, showing that classical pre-processing cannot significantly reduce the quantum communication required. To the best of our knowledge, this is the first non-trivial trade-off between classical and quantum communication in hybrid two-way communication complexity.
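
As a reading aid, the step from the stated corollary to the either/or lower bound can be sketched as follows. This is an informal derivation, not taken from the paper. It assumes $\mathrm{deg}(f)=n$ for a read-once formula over AND, OR, and NOT, which holds for the exact degree because such a formula reads each of its $n$ variables once. The constant $\kappa$ is a placeholder for the constant hidden in the $Ω$.

```latex
\begin{align*}
  c + q^2 &\ge \kappa \cdot \mathrm{deg}(f) \cdot \log n = \kappa\, n \log n
    && \text{(corollary with } \mathrm{deg}(f) = n\text{)} \\
  \max\{c,\; q^2\} &\ge \tfrac{\kappa}{2}\, n \log n
    && \text{(the larger of two terms is at least half their sum)} \\
  \text{hence}\quad c &= \Omega(n \log n)
    \quad\text{or}\quad q = \Omega\big(\sqrt{n \log n}\big).
\end{align*}
```

The bound $q=Ω(\sqrt{n\log n})$ agrees with the $\widetilde{Θ}\big(\sqrt n\cdot\log n\big)$ figure in the abstract up to polylogarithmic factors.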

73.1 QUANT-PH Mar 24
Exponential Separation of Quantum and Classical One-Way Numbers-on-Forehead Communication

Guangxu Yang, Jiapeng Zhang

The Numbers-on-Forehead (NOF) communication model is a central model in communication complexity. As a restricted variant, the one-way NOF model is of particular interest. Establishing strong one-way NOF lower bounds would imply circuit lower bounds, resolve well-known problems in additive combinatorics, and yield wide-ranging applications in areas such as cryptography and distributed computing. However, proving strong lower bounds in one-way NOF communication remains highly challenging, and many fundamental questions remain wide open. One of these questions, proposed by Gavinsky and Pudlák (CCC 2008), is to establish an explicit exponential separation between quantum and classical one-way NOF communication. In this paper, we resolve this open problem by establishing the first exponential separation between quantum and randomized communication complexity in the one-way NOF model. Specifically, we define a lifted variant of the Hidden Matching problem of Bar-Yossef, Jayram, and Kerenidis (STOC 2004) and show that it admits an $O(\log n)$-cost quantum protocol in the one-way NOF setting. By contrast, we prove that any $k$-party one-way randomized protocol for this problem requires communication $Ω(\frac{n^{1/3}}{2^{k/3}})$. Notably, our separation applies even to a generalization of $k$-player one-way communication, where the first player speaks once, and all other $k-1$ players can communicate freely.

LG Aug 20, 2021
PASTO: Strategic Parameter Optimization in Recommendation Systems -- Probabilistic is Better than Deterministic

Weicong Ding, Hanlin Tang, Jingshuo Feng et al.

Real-world recommendation systems often consist of two phases. In the first phase, multiple predictive models produce the probability of different immediate user actions. In the second phase, these predictions are aggregated according to a set of 'strategic parameters' to meet a diverse set of business goals, such as longer user engagement, higher revenue potential, or more community/network interactions. In addition to building accurate predictive models, it is also crucial to optimize this set of 'strategic parameters' so that primary goals are optimized while secondary guardrails are not hurt. In this setting with multiple and constrained goals, this paper finds that a probabilistic strategic parameter regime can achieve better value than the standard regime of finding a single deterministic parameter. The new probabilistic regime learns the best distribution over strategic parameter choices and samples one strategic parameter from that distribution each time a user visits the platform. To pursue the optimal probabilistic solution, we formulate the problem as a stochastic compositional optimization problem, in which the unbiased stochastic gradient is unavailable. Our approach is applied on a popular social network platform with hundreds of millions of daily users and achieves a +0.22% lift in user engagement in a recommendation task and a +1.7% lift in revenue in an advertising optimization scenario, compared with the best deterministic parameter strategy.
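
A toy Python sketch can illustrate why a distribution over strategic parameters may beat any single parameter once a guardrail constraint is present. It is not the paper's method: the abstract describes stochastic compositional optimization, whereas this sketch simply enumerates distributions on a coarse grid. The three parameters, their metric values, the guardrail threshold, and all function names are hypothetical.

```python
import random
from itertools import product

# Hypothetical strategic parameters with made-up (primary, guardrail) metrics.
PARAMS = {"a": (10, 20), "b": (4, 100), "c": (6, 50)}
GUARDRAIL_MIN = 60  # the (expected) guardrail metric must be at least this
STEPS = 10          # probabilities are restricted to multiples of 1/STEPS


def best_deterministic():
    """Best single parameter that satisfies the guardrail on its own."""
    feasible = {k: v[0] for k, v in PARAMS.items() if v[1] >= GUARDRAIL_MIN}
    name = max(feasible, key=feasible.get)
    return name, float(feasible[name])


def best_distribution():
    """Brute-force search over grid distributions meeting the guardrail in expectation."""
    names = list(PARAMS)
    best = None
    for counts in product(range(STEPS + 1), repeat=len(names)):
        if sum(counts) != STEPS:
            continue
        # Integer arithmetic: both sums are scaled by STEPS.
        primary = sum(c * PARAMS[n][0] for c, n in zip(counts, names))
        guardrail = sum(c * PARAMS[n][1] for c, n in zip(counts, names))
        if guardrail >= GUARDRAIL_MIN * STEPS and (best is None or primary > best[1]):
            best = (counts, primary)
    weights = {n: c / STEPS for n, c in zip(names, best[0])}
    return weights, best[1] / STEPS


def sample_parameter(weights, rng):
    """Draw one strategic parameter for a single user visit."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    print(best_deterministic())
    weights, value = best_distribution()
    print(weights, value)
    print(sample_parameter(weights, random.Random(0)))
```

In this toy instance only parameter `b` satisfies the guardrail by itself, while an even mixture of `a` and `b` meets the guardrail in expectation and attains a higher expected primary metric.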