Hoai-An Nguyen

DS
4papers
4citations
Novelty66%
AI Score48

4 Papers

90.9DSMay 31
On Sketching Trimmed Statistics

Honghao Lin, Hoai-An Nguyen, David P. Woodruff

We study sketching trimmed statistics of a frequency vector, including the $F_p$ moment of the top-$k$ coordinates and of the trimmed-$k$ vector. Despite their natural role in robust analytics, this is the first time these problems have been studied in any sublinear space setting. For $p \in [0,2]$, we obtain $poly(\log n/\varepsilon)$-space algorithms for both tasks when $k$ is moderately large, and for general $k$ we identify a sharp structural threshold that characterizes exactly when sublinear space is possible: in particular, it is actually determined by the ratio between $a_k^2$ and $\|x_{-k}\|_2^2/k$. We extend these results to $p > 2$ and present several applications including algorithms for thresholded $F_p$ estimation and generalized impact indices. Notably, we improve the space bounds of Govindan, Monemizadeh, and Muthukrishnan (PODS 2017) for computing the $h$-index.

LGJan 6, 2023
Provable Reset-free Reinforcement Learning by No-Regret Reduction

Hoai-An Nguyen, Ching-An Cheng

Reinforcement learning (RL) so far has limited real-world applications. One key challenge is that typical RL algorithms heavily rely on a reset mechanism to sample proper initial states; these reset mechanisms, in practice, are expensive to implement due to the need for human intervention or heavily engineered environments. To make learning more practical, we propose a generic no-regret reduction to systematically design reset-free RL algorithms. Our reduction turns the reset-free RL problem into a two-player game. We show that achieving sublinear regret in this two-player game would imply learning a policy that has both sublinear performance regret and sublinear total number of resets in the original RL problem. This means that the agent eventually learns to perform optimally and avoid resets. To demonstrate the effectiveness of this reduction, we design an instantiation for linear Markov decision processes, which is the first provably correct reset-free RL algorithm.

55.9DSApr 5
Unbiased Insights: Optimal Streaming Algorithms for $\ell_p$ Sampling, the Forget Model, and Beyond

Honghao Lin, Hoai-An Nguyen, William Swartworth et al.

We study $\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \in (0,2)$, we present a nearly space-optimal approximate $\ell_p$ sampler that uses $\widetilde{O}(\log n \log(1/δ))$ bits of space and for $p = 2$, we present a sampler with space complexity $\widetilde{O}(\log^2 n \log(1/δ))$. This space complexity is optimal for $p \in (0, 2)$ and improves upon prior work by a $\log n$ factor. We further extend our construction to a continuous $\ell_p$ sampler, which outputs a valid sample index at every point during the stream. Leveraging these samplers, we design nearly unbiased estimators for $F_p$ in data streams that include forget operations, which reset individual element frequencies and introduce significant non-linear challenges. As a result, we obtain near-optimal algorithms for estimating $F_p$ for all $p$ in this model, originally proposed by Pavan, Chakraborty, Vinodchandran, and Meel [PODS'24], resolving all three open problems they posed. Furthermore, we generalize this model to what we call the suffix-prefix deletion model, and extend our techniques to estimate entropy as a corollary of our moment estimation algorithms. Finally, we show how to handle arbitrary coordinate-wise functions during the stream, for any $g \in \mathbb{G}$, where $\mathbb{G}$ includes all (linear or non-linear) contraction functions.

76.8DSMay 10
Streaming Complexity Separations for Dense and Sparse Graphs

Yang P. Liu, Hoai-An Nguyen, Noah G. Singer et al.

We identify a sharp separation in the streaming space complexity of Maximum Cut when the algorithm must output an approximate cut (rather than only the approximate value). For dense graphs, we show that $O(n/\varepsilon^2)$ space is sufficient and that $Ω(n)$ space is necessary. In contrast, for graphs with $Θ(n/\varepsilon^2)$ edges, the situation is markedly different: we show that the problem requires $Ω(n \log(\varepsilon^2 n)/\varepsilon^2)$ space for any $\varepsilon=ω(1/\sqrt{n})$, which is tight for the full range of $\varepsilon$. We also give an $Ω(n \log n/\varepsilon^2)$-space lower bound against deterministic algorithms for outputting a $(1-\varepsilon)$ approximation to the value of the maximum cut. Using similar techniques we prove an analogous sharp separation in the streaming space complexity of Densest Subgraph and show that for every constant-arity CSP over a constant-size alphabet and the Similarity problem the space complexity in dense streams can be improved by shaving a logarithmic factor.