Yanxiao Liu

IT
h-index11
4papers
12citations
Novelty43%
AI Score46

4 Papers

76.5ITMay 11
Tighter Information-Theoretic Generalization Bounds via a Novel Class of Change of Measure Inequalities

Yanxiao Liu, Yijun Fan, Deniz Gündüz

Change of measure inequalities translate divergences between probability measures into explicit bounds on event probabilities, and play an important role in deriving probabilistic guarantees in learning theory, information theory, and statistics. We propose novel change of measure inequalities via a unified framework based on the data processing inequality, which is surprisingly elementary yet powerful enough to yield novel, tighter inequalities. We provide change of measure inequalities in terms of a broad family of information measures, including $f$-divergences (with Kullback-Leibler divergence and $χ^2$-divergence as special cases), Rényi divergence, and $α$-mutual information (with maximal leakage as a special case). We apply these results to generalization error analysis, PAC-Bayesian theory, differential privacy, and data memorization, obtaining stronger guarantees while recovering best-known results through simplified analyses.

22.4ITApr 24
Nonasymptotic Oblivious Relaying and Variable-Length Noisy Lossy Source Coding

Yanxiao Liu, Sepehr Heidari Advary, Cheuk Ting Li

The information bottleneck channel (or the oblivious relay channel) concerns a channel coding setting where the decoder does not directly observe the channel output. Rather, the channel output is relayed to the decoder by an oblivious relay (which does not know the codebook) via a rate-limited link. The capacity is known to be given by the information bottleneck. We study finite-blocklength achievability results of the channel, where the relay communicates to the decoder via fixed-length or variable-length codes. These two cases give rise to two different second-order versions of the information bottleneck. Our proofs utilize the nonasymptotic noisy lossy source coding results by Kostina and Verdú, the strong functional representation lemma, and the Poisson matching lemma. Moreover, we also give a novel nonasymptotic variable-length noisy lossy source coding result.

CVJan 28
Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification

Xin Jin, Jinming Liu, Yuntao Wei et al.

"Compression Tells Intelligence", is supported by research in artificial intelligence, particularly concerning (multimodal) large language models (LLMs/MLLMs), where compression efficiency often correlates with improved model performance and capabilities. For compression, classical visual coding based on traditional information theory has developed over decades, achieving great success with numerous international industrial standards widely applied in multimedia (e.g., image/video) systems. Except that, the recent emergingvisual token technology of generative multi-modal large models also shares a similar fundamental objective like visual coding: maximizing semantic information fidelity during the representation learning while minimizing computational cost. Therefore, this paper provides a comprehensive overview of two dominant technique families first -- Visual Coding and Vision Token Technology -- then we further unify them from the aspect of optimization, discussing the essence of compression efficiency and model performance trade-off behind. Next, based on the proposed unified formulation bridging visual coding andvisual token technology, we synthesize bidirectional insights of themselves and forecast the next-gen visual codec and token techniques. Last but not least, we experimentally show a large potential of the task-oriented token developments in the more practical tasks like multimodal LLMs (MLLMs), AI-generated content (AIGC), and embodied AI, as well as shedding light on the future possibility of standardizing a general token technology like the traditional codecs (e.g., H.264/265) with high efficiency for a wide range of intelligent tasks in a unified and effective manner.

50.2ITApr 17
On the Generalization Error of Differentially Private Algorithms via Typicality

Yanxiao Liu, Chun Hei Michael Shiu, Lele Wang et al.

We study the generalization error of stochastic learning algorithms from an information-theoretic perspective, with a particular emphasis on deriving sharper bounds for differentially private algorithms. It is well known that the generalization error of stochastic learning algorithms can be bounded in terms of mutual information and maximal leakage, yielding in-expectation and high-probability guarantees, respectively. In this work, we further upper bound mutual information and maximal leakage by explicit, easily computable formulas, using typicality-based arguments and exploiting the stability properties of private algorithms. In the first part of the paper, we strictly improve the mutual-information bounds by Rodríguez-Gálvez et al. (IEEE Trans. Inf. Theory, 2021). In the second part, we derive new upper bounds on the maximal leakage of learning algorithms. In both cases, the resulting bounds on information measures translate directly into generalization error guarantees.