Cheuk Ting Li

LG
h-index15
10papers
101citations
Novelty58%
AI Score57

10 Papers

61.7ITJun 4
Automated Proving of Shannon-Type Entropy Inequalities via Fine-Tuned Language Models and Guided Tree Search

Shing Yin Wong, Shaocheng Liu, Linqi Song et al.

Proving Shannon-type entropy inequalities is a fundamental task in information theory that often requires constructing non-trivial linear combinations of known constraints, which is a combinatorial search problem that scales poorly with the number of random variables. We investigate whether small-scale large language models (0.6B--1.7B parameters), fine-tuned on atomic proof steps and combined with guided beam search, can automate this process. On a held-out test set of 60 inequalities spanning n=10 to 15 variables, our 0.6B fine-tuned model achieves an 85\% proof success rate with tree search. GPT-5.5 solves 1.7\% samples under zero-shot prompting while Psitip solves 33.3\% samples. A systematic ablation study across training context length (4096 vs.\ 8192 tokens) and data distribution (n=9-skewed vs not skewed) reveals that a 4096-token not skewed training distribution yields the best performance, with extended context and skewed data providing no marginal benefit. We further identify two dominant failure modes -- format failures and step quality degradation -- and verify that the beam-scoring heuristic is essential via a controlled ablation (random scoring reduces success from 83\% to 23\%).

LGOct 31, 2023
Compression with Exact Error Distribution for Federated Learning

Mahmoud Hegazy, Rémi Leluc, Cheuk Ting Li et al.

Compression schemes have been extensively used in Federated Learning (FL) to reduce the communication cost of distributed learning. While most approaches rely on a bounded variance assumption of the noise produced by the compressor, this paper investigates the use of compression and aggregation schemes that produce a specific error distribution, e.g., Gaussian or Laplace, on the aggregated data. We present and analyze different aggregation schemes based on layered quantizers achieving exact error distribution. We provide different methods to leverage the proposed compression schemes to obtain compression-for-free in differential privacy applications. Our general compression methods can recover and improve standard FL schemes with Gaussian perturbations such as Langevin dynamics and randomized smoothing.

LGFeb 27, 2024Code
An Interpretable Evaluation of Entropy-based Novelty of Generative Models

Jingwei Zhang, Cheuk Ting Li, Farzan Farnia

The massive developments of generative model frameworks require principled methods for the evaluation of a model's novelty compared to a reference dataset. While the literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a reference model has not been adequately explored in the machine learning community. In this work, we focus on the novelty assessment for multi-modal distributions and attempt to address the following differential clustering task: Given samples of a generative model $P_\mathcal{G}$ and a reference model $P_\mathrm{ref}$, how can we discover the sample types expressed by $P_\mathcal{G}$ more frequently than in $P_\mathrm{ref}$? We introduce a spectral approach to the differential clustering task and propose the Kernel-based Entropic Novelty (KEN) score to quantify the mode-based novelty of $P_\mathcal{G}$ with respect to $P_\mathrm{ref}$. We analyze the KEN score for mixture distributions with well-separable components and develop a kernel-based method to compute the KEN score from empirical data. We support the KEN framework by presenting numerical results on synthetic and real image datasets, indicating the framework's effectiveness in detecting novel modes and comparing generative models. The paper's code is available at: www.github.com/buyeah1109/KEN

17.2ITApr 24
Nonasymptotic Oblivious Relaying and Variable-Length Noisy Lossy Source Coding

Yanxiao Liu, Sepehr Heidari Advary, Cheuk Ting Li

The information bottleneck channel (or the oblivious relay channel) concerns a channel coding setting where the decoder does not directly observe the channel output. Rather, the channel output is relayed to the decoder by an oblivious relay (which does not know the codebook) via a rate-limited link. The capacity is known to be given by the information bottleneck. We study finite-blocklength achievability results of the channel, where the relay communicates to the decoder via fixed-length or variable-length codes. These two cases give rise to two different second-order versions of the information bottleneck. Our proofs utilize the nonasymptotic noisy lossy source coding results by Kostina and Verdú, the strong functional representation lemma, and the Poisson matching lemma. Moreover, we also give a novel nonasymptotic variable-length noisy lossy source coding result.

LGDec 23, 2024Code
Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms

Parham Rezaei, Farzan Farnia, Cheuk Ting Li

The availability of multiple training algorithms and architectures for generative models requires a selection mechanism to form a single model over a group of well-trained generation models. The selection task is commonly addressed by identifying the model that maximizes an evaluation score based on the diversity and quality of the generated data. However, such a best-model identification approach overlooks the possibility that a mixture of available models can outperform each individual model. In this work, we numerically show that a mixture of generative models on benchmark image datasets can indeed achieve a better evaluation score (based on FID and KID scores), compared to the individual models. This observation motivates the development of efficient algorithms for selecting the optimal mixture of the models. To address this, we formulate a quadratic optimization problem to find an optimal mixture model achieving the maximum of kernel-based evaluation scores including kernel inception distance (KID) and Rényi kernel entropy (RKE). To identify the optimal mixture of the models using the fewest possible sample queries, we view the selection task as a multi-armed bandit (MAB) problem and propose the Mixture Upper Confidence Bound (Mixture-UCB) algorithm that provably converges to the optimal mixture of the involved models. More broadly, the proposed Mixture-UCB can be extended to optimize every convex quadratic function of the mixture weights in a general MAB setting. We prove a regret bound for the Mixture-UCB algorithm and perform several numerical experiments to show the success of Mixture-UCB in finding the optimal mixture of text and image generative models. The project code is available at https://github.com/Rezaei-Parham/Mixture-UCB.

LGMay 4, 2024Code
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach

Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.

A fine-grained comparison of generative models requires the identification of sample types generated differently by each of the involved models. While quantitative scores have been proposed in the literature to rank different generative models, score-based evaluation and ranking do not reveal the nuanced differences between the generative models in producing different sample types. In this work, we propose solving a differential clustering problem to detect sample types generated differently by two generative models. To solve the differential clustering problem, we develop a spectral method called Fourier-based Identification of Novel Clusters (FINC) to identify modes produced by a generative model with a higher frequency in comparison to a reference distribution. FINC provides a scalable algorithm based on random Fourier features to estimate the eigenspace of kernel covariance matrices of two generative models and utilize the principal eigendirections to detect the sample types present more dominantly in each model. We demonstrate the application of the FINC method to large-scale computer vision datasets and generative modeling frameworks. Our numerical results suggest the scalability of the developed Fourier-based method in highlighting the sample types produced with different frequencies by generative models. The project code is available at https://github.com/buyeah1109/FINC.

33.8ITApr 13
A Non-Probabilistic Game-Theoretic Information Theory Which Subsumes Probabilistic Channel Coding

Cheuk Ting Li

Probabilistic settings (e.g., vanishing-error channel coding) and non-probabilistic settings (e.g., zero-error channel coding and adversarial channels) were considered two related but different branches of information theory which do not subsume each other. We propose a unifying non-probabilistic information theory based on game theory and dynamic hedging which subsumes the conventional probabilistic channel coding theorem (vanishing error, with or without feedback) and lossless source coding theorem, as well as adversarial settings. Coding is modelled as a deterministic game between an encoder and an adversary, where the encoder may purchase insurance with a payoff that depends on the channel outputs. Our framework is based on a generalization of the works by Ville, Dawid, Shafer and Vovk on the game-theoretic formulation of probabilistic concepts, by relaxing the convex pricing cone to a nonconvex downward closed cone, which is precisely the relaxation needed to model information transmission. Pricing downward closed cone is a versatile tool for non-probabilistic coding results that can subsume their probabilistic counterparts, and provides a canonical form for probabilistic channels, adversarial channels and arbitrarily varying channels.

LGJan 21, 2025
Communication-Efficient and Privacy-Adaptable Mechanism for Federated Learning

Chih Wei Ling, Chun Hei Michael Shiu, Youqi Wu et al.

Training machine learning models on decentralized private data via federated learning (FL) poses two key challenges: communication efficiency and privacy protection. In this work, we address these challenges within the trusted aggregator model by introducing a novel approach called the Communication-Efficient and Privacy-Adaptable Mechanism (CEPAM), achieving both objectives simultaneously. In particular, CEPAM leverages the rejection-sampled universal quantizer (RSUQ), a construction of randomized vector quantizer whose resulting distortion is equivalent to a prescribed noise, such as Gaussian or Laplace noise, enabling joint differential privacy and compression. Our CEPAM provides the additional benefit of privacy adaptability, allowing clients and the server to customize privacy protection based on required accuracy and protection. We theoretically analyze the privacy guarantee of CEPAM and investigate the trade-offs among user privacy and accuracy of CEPAM through experimental evaluations. Moreover, we assess CEPAM's utility performance using MNIST dataset, demonstrating that CEPAM surpasses baseline models in terms of learning accuracy.

MLDec 21, 2014
Locally Weighted Learning for Naive Bayes Classifier

Kim-Hung Li, Cheuk Ting Li

As a consequence of the strong and usually violated conditional independence assumption (CIA) of naive Bayes (NB) classifier, the performance of NB becomes less and less favorable compared to sophisticated classifiers when the sample size increases. We learn from this phenomenon that when the size of the training data is large, we should either relax the assumption or apply NB to a "reduced" data set, say for example use NB as a local model. The latter approach trades the ignored information for the robustness to the model assumption. In this paper, we consider using NB as a model for locally weighted data. A special weighting function is designed so that if CIA holds for the unweighted data, it also holds for the weighted data. The new method is intuitive and capable of handling class imbalance. It is theoretically more sound than the locally weighted learners of naive Bayes that base classification only on the $k$ nearest neighbors. Empirical study shows that the new method with appropriate choice of parameter outperforms seven existing classifiers of similar nature.

ITDec 17, 2014
Maximal Correlation Secrecy

Cheuk Ting Li, Abbas El Gamal

This paper shows that the Hirschfeld-Gebelein-Rényi maximal correlation between the message and the ciphertext provides good secrecy guarantees for cryptosystems that use short keys. We first establish a bound on the eavesdropper's advantage in guessing functions of the message in terms of maximal correlation and the Rényi entropy of the message. This result implies that maximal correlation is stronger than the notion of entropic security introduced by Russell and Wang. We then show that a small maximal correlation $ρ$ can be achieved via a randomly generated cipher with key length $\approx2\log(1/ρ)$, independent of the message length, and by a stream cipher with key length $2\log(1/ρ)+\log n+2$ for a message of length $n$. We establish a converse showing that these ciphers are close to optimal. This is in contrast to entropic security for which there is a gap between the lower and upper bounds. Finally, we show that a small maximal correlation implies secrecy with respect to several mutual information based criteria but is not necessarily implied by them. Hence, maximal correlation is a stronger and more practically relevant measure of secrecy than mutual information.