LGAug 14, 2023
A Unifying Generator Loss Function for Generative Adversarial NetworksJustin Veiner, Fady Alajaji, Bahman Gharesifard
A unifying $α$-parametrized generator loss function is introduced for a dual-objective generative adversarial network (GAN), which uses a canonical (or classical) discriminator loss function such as the one in the original GAN (VanillaGAN) system. The generator loss function is based on a symmetric class probability estimation type function, $\mathcal{L}_α$, and the resulting GAN system is termed $\mathcal{L}_α$-GAN. Under an optimal discriminator, it is shown that the generator's optimization problem consists of minimizing a Jensen-$f_α$-divergence, a natural generalization of the Jensen-Shannon divergence, where $f_α$ is a convex function expressed in terms of the loss function $\mathcal{L}_α$. It is also demonstrated that this $\mathcal{L}_α$-GAN problem recovers as special cases a number of GAN problems in the literature, including VanillaGAN, Least Squares GAN (LSGAN), Least $k$th order GAN (L$k$GAN) and the recently introduced $(α_D,α_G)$-GAN with $α_D=1$. Finally, experimental results are conducted on three datasets, MNIST, CIFAR-10, and Stacked MNIST to illustrate the performance of various examples of the $\mathcal{L}_α$-GAN system.
LGMar 9, 2022
Renyi Fair Information Bottleneck for Image ClassificationAdam Gronowski, William Paul, Fady Alajaji et al.
We develop a novel method for ensuring fairness in machine learning which we term as the Renyi Fair Information Bottleneck (RFIB). We consider two different fairness constraints - demographic parity and equalized odds - for learning fair representations and derive a loss function via a variational approach that uses Renyi's divergence with its tunable parameter $α$ and that takes into account the triple constraints of utility, fairness, and compactness of representation. We then evaluate the performance of our method for image classification using the EyePACS medical imaging dataset, showing it outperforms competing state of the art techniques with performance measured using a variety of compound utility/fairness metrics, including accuracy gap and Rawls' minimal accuracy.
LGJun 20, 2022
Classification Utility, Fairness, and Compactness via Tunable Information Bottleneck and Rényi MeasuresAdam Gronowski, William Paul, Fady Alajaji et al.
Designing machine learning algorithms that are accurate yet fair, not discriminating based on any sensitive attribute, is of paramount importance for society to accept AI for critical applications. In this article, we propose a novel fair representation learning method termed the Rényi Fair Information Bottleneck Method (RFIB) which incorporates constraints for utility, fairness, and compactness (compression) of representation, and apply it to image and tabular data classification. A key attribute of our approach is that we consider - in contrast to most prior work - both demographic parity and equalized odds as fairness constraints, allowing for a more nuanced satisfaction of both criteria. Leveraging a variational approach, we show that our objectives yield a loss function involving classical Information Bottleneck (IB) measures and establish an upper bound in terms of two Rényi measures of order $α$ on the mutual information IB term measuring compactness between the input and its encoded embedding. We study the influence of the $α$ parameter as well as two other tunable IB parameters on achieving utility/fairness trade-off goals, and show that the $α$ parameter gives an additional degree of freedom that can be used to control the compactness of the representation. Experimenting on three different image datasets (EyePACS, CelebA, and FairFace) and two tabular datasets (Adult and COMPAS), using both binary and categorical sensitive attributes, we show that on various utility, fairness, and compound utility/fairness metrics RFIB outperforms current state-of-the-art approaches.
32.1ITApr 1
Communication Complexity of Exact Sampling under Rényi InformationSpencer Hill, Fady Alajaji, Tamás Linder
We study the problem of exact sampling under an exponential communication cost, specifically Campbell's average codeword length $L(t)$ of order $t$, and Rényi's entropy. We provide a lower bound on the Campbell cost of exact sampling that grows approximately as $D_{1/α}(P||Q)$, the Rényi divergence of order $1/α$, with $α= \frac{1}{1+t}$. Using the Poisson functional representation of Li and El Gamal, we prove an upper bound on $L(t)$ whose leading Rényi divergence term has order within $ε$ of that of the lower bound. Our results reduce to the bounds of Harsha et al. as $α\to 1$. We also provide numerical examples comparing the bounds in the cases of normal and Laplacian distributions, demonstrating that the upper and lower bounds are typically within 5-10 bits of each other. Our results characterize exactly the optimal asymptotic Campbell cost $L(t)$ per sample as the number of independent and identically distributed (i.i.d.) samples grows to infinity. We show that under the exponential cost, any causal sampler performs strictly worse asymptotically than noncausal samplers. This contrasts with the case of expected message length, where both causal and noncausal samplers have the same optimal asymptotic cost.
ITJun 28, 2022
On the Rényi Cross-EntropyFerenc Cole Thierrin, Fady Alajaji, Tamás Linder
The Rényi cross-entropy measure between two distributions, a generalization of the Shannon cross-entropy, was recently used as a loss function for the improved design of deep learning generative adversarial networks. In this work, we examine the properties of this measure and derive closed-form expressions for it when one of the distributions is fixed and when both distributions belong to the exponential family. We also analytically determine a formula for the cross-entropy rate for stationary Gaussian processes and for finite-alphabet Markov sources.
CVFeb 15, 2023
Evaluating Trade-offs in Computer Vision Between Attribute Privacy, Fairness and UtilityWilliam Paul, Philip Mathew, Fady Alajaji et al.
This paper investigates to what degree and magnitude tradeoffs exist between utility, fairness and attribute privacy in computer vision. Regarding privacy, we look at this important problem specifically in the context of attribute inference attacks, a less addressed form of privacy. To create a variety of models with different preferences, we use adversarial methods to intervene on attributes relating to fairness and privacy. We see that that certain tradeoffs exist between fairness and utility, privacy and utility, and between privacy and fairness. The results also show that those tradeoffs and interactions are more complex and nonlinear between the three goals than intuition would suggest.
82.5ITMar 19
Pólya Thresholds GraphsJinghan Yu, Fady Alajaji, Bahman Gharesifard
We introduce the Pólya threshold graph model and derive its stochastic and algebraic properties. This random threshold graph is generated sequentially via a two-color Pólya urn process. Starting from an empty graph, each time step involves a draw from the urn that produces an indicator variable, determining whether a newly added node is universal (connected to all existing nodes and itself) or isolated (connected to no existing nodes). This construction yields a random threshold graph with an adjacency matrix that admits an explicit representation in terms of the draw sequence. Using the structure of the Pólya draw process, we derive the exact degree distribution for any arbitrary node, including its mean and variance. Furthermore, we evaluate a distance-based decay centrality score and provide an explicit expression for its expectation. On the algebraic side, we explicitly characterize the Laplacian matrix of the random threshold graph, obtaining a closed-form description of its spectrum and corresponding eigenbasis. Finally, as an application of these structural results, we analyze discrete-time consensus dynamics on Pólya threshold graphs.
78.8ITApr 25
Rejection Sampling is Optimal for Relative Entropy CodingSpencer Hill, Fady Alajaji, Tamás Linder et al.
In relative entropy coding, a sender aims to design a stochastic code such that, on input $X \sim P_X$, the receiver can generate a sample $Y \sim P_{Y \mid X}$. It is a standard result that (1) this requires at least $I(X; Y)$ bits, (2) the lower bound is achievable within a logarithmic gap, and (3) this gap cannot be reduced in general. The necessity of the gap suggests that the mutual information is not the correct information measure to quantify the rate of relative entropy coding. A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of $I_F(X \to Y)$, a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most $I_F(X \to Y) + \log e$ bits. This demonstrates that rejection sampling is optimal for relative entropy coding. Our result implies that the classical mutual information lower bound is achievable within $\log(I(X; Y) + 1) + 2.45$ bits in general and within $1.45$ bits for singular channels, which are both the tightest bounds of their kind to date. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.
8.0ROApr 23
SLAM as a Stochastic Control Problem with Partial Information: Optimal Solutions and Rigorous ApproximationsIlir Gusija, Fady Alajaji, Serdar Yüksel
Simultaneous localization and mapping (SLAM) is a foundational state estimation problem in robotics in which a robot accurately constructs a map of its environment while also localizing itself within this construction. We study the active SLAM problem through the lens of optimal stochastic control, thereby recasting it as a decision-making problem under partial information. After reviewing several commonly studied models, we present a general stochastic control formulation of active SLAM together with a rigorous treatment of motion, sensing, and map representation. We introduce a new exploration stage cost that encodes the geometry of the state when evaluating information-gathering actions. This formulation, constructed as a nonstandard partially observable Markov decision process (POMDP), is then analyzed to derive rigorously justified approximate solutions that are near-optimal. To enable this analysis, the associated regularity conditions are studied under general assumptions that apply to a wide range of robotics applications. For a particular case, we conduct an extensive numerical study in which standard learning algorithms are used to learn near-optimal policies.
ITMay 30, 2025
Bounds on the Excess Minimum Risk via Generalized Information Divergence MeasuresAnanya Omanwar, Fady Alajaji, Tamás Linder
Given finite-dimensional random vectors $Y$, $X$, and $Z$ that form a Markov chain in that order (i.e., $Y \to X \to Z$), we derive upper bounds on the excess minimum risk using generalized information divergence measures. Here, $Y$ is a target vector to be estimated from an observed feature vector $X$ or its stochastically degraded version $Z$. The excess minimum risk is defined as the difference between the minimum expected loss in estimating $Y$ from $X$ and from $Z$. We present a family of bounds that generalize the mutual information based bound of Györfi et al. (2023), using the Rényi and $α$-Jensen-Shannon divergences, as well as Sibson's mutual information. Our bounds are similar to those developed by Modak et al. (2021) and Aminian et al. (2024) for the generalization error of learning algorithms. However, unlike these works, our bounds do not require the sub-Gaussian parameter to be constant and therefore apply to a broader class of joint distributions over $Y$, $X$, and $Z$. We also provide numerical examples under both constant and non-constant sub-Gaussianity assumptions, illustrating that our generalized divergence based bounds can be tighter than the one based on mutual information for certain regimes of the parameter $α$.
LGDec 11, 2020
TARA: Training and Representation Alteration for AI Fairness and Domain GeneralizationWilliam Paul, Armin Hadzic, Neil Joshi et al.
We propose a novel method for enforcing AI fairness with respect to protected or sensitive factors. This method uses a dual strategy performing training and representation alteration (TARA) for the mitigation of prominent causes of AI bias by including: a) the use of representation learning alteration via adversarial independence to suppress the bias-inducing dependence of the data representation from protected factors; and b) training set alteration via intelligent augmentation to address bias-causing data imbalance, by using generative models that allow the fine control of sensitive factors related to underrepresented populations via domain adaptation and latent space manipulation. When testing our methods on image analytics, experiments demonstrate that TARA significantly or fully debiases baseline models while outperforming competing debiasing methods that have the same amount of information, e.g., with (% overall accuracy, % accuracy gap) = (78.8, 0.5) vs. the baseline method's score of (71.8, 10.5) for EyePACS, and (73.7, 11.8) vs. (69.1, 21.7) for CelebA. Furthermore, recognizing certain limitations in current metrics used for assessing debiasing performance, we propose novel conjunctive debiasing metrics. Our experiments also demonstrate the ability of these novel metrics in assessing the Pareto efficiency of the proposed methods.
LGJun 3, 2020
Least $k$th-Order and Rényi Generative Adversarial NetworksHimesh Bhatia, William Paul, Fady Alajaji et al.
We investigate the use of parametrized families of information-theoretic measures to generalize the loss functions of generative adversarial networks (GANs) with the objective of improving performance. A new generator loss function, called least $k$th-order GAN (L$k$GAN), is first introduced, generalizing the least squares GANs (LSGANs) by using a $k$th order absolute error distortion measure with $k \geq 1$ (which recovers the LSGAN loss function when $k=2$). It is shown that minimizing this generalized loss function under an (unconstrained) optimal discriminator is equivalent to minimizing the $k$th-order Pearson-Vajda divergence. Another novel GAN generator loss function is next proposed in terms of Rényi cross-entropy functionals with order $α>0$, $α\neq 1$. It is demonstrated that this Rényi-centric generalized loss function, which provably reduces to the original GAN loss function as $α\to1$, preserves the equilibrium point satisfied by the original GAN based on the Jensen-Rényi divergence, a natural extension of the Jensen-Shannon divergence. Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA datasets, under both DCGAN and StyleGAN architectures, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters $k$ and $α$, respectively. More specifically, experiments show improvements with regard to the quality of the generated images as measured by the Fréchet Inception Distance (FID) score and training stability. While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, e.g., the issues of fairness or privacy in artificial intelligence.
CVFeb 25, 2020
Unsupervised Discovery, Control, and Disentanglement of Semantic Attributes with Applications to Anomaly DetectionWilliam Paul, I-Jeng Wang, Fady Alajaji et al.
Our work focuses on unsupervised and generative methods that address the following goals: (a) learning unsupervised generative representations that discover latent factors controlling image semantic attributes, (b) studying how this ability to control attributes formally relates to the issue of latent factor disentanglement, clarifying related but dissimilar concepts that had been confounded in the past, and (c) developing anomaly detection methods that leverage representations learned in (a). For (a), we propose a network architecture that exploits the combination of multiscale generative models with mutual information (MI) maximization. For (b), we derive an analytical result (Lemma 1) that brings clarity to two related but distinct concepts: the ability of generative networks to control semantic attributes of images they generate, resulting from MI maximization, and the ability to disentangle latent space representations, obtained via total correlation minimization. More specifically, we demonstrate that maximizing semantic attribute control encourages disentanglement of latent factors. Using Lemma 1 and adopting MI in our loss function, we then show empirically that, for image generation tasks, the proposed approach exhibits superior performance as measured in the quality and disentanglement trade space, when compared to other state of the art methods, with quality assessed via the Frechet Inception Distance (FID), and disentanglement via mutual information gap. For (c), we design several systems for anomaly detection exploiting representations learned in (a), and demonstrate their performance benefits when compared to state-of-the-art generative and discriminative algorithms. The above contributions in representation learning have potential applications in addressing other important problems in computer vision, such as bias and privacy in AI.
ITJul 8, 2017
Estimation Efficiency Under Privacy ConstraintsShahab Asoodeh, Mario Diaz, Fady Alajaji et al.
We investigate the problem of estimating a random variable $Y\in \mathcal{Y}$ under a privacy constraint dictated by another random variable $X\in \mathcal{X}$, where estimation efficiency and privacy are assessed in terms of two different loss functions. In the discrete case, we use the Hamming loss function and express the corresponding utility-privacy tradeoff in terms of the privacy-constrained guessing probability $h(P_{XY}, ε)$, the maximum probability $\mathsf{P}_\mathsf{c}(Y|Z)$ of correctly guessing $Y$ given an auxiliary random variable $Z\in \mathcal{Z}$, where the maximization is taken over all $P_{Z|Y}$ ensuring that $\mathsf{P}_\mathsf{c}(X|Z)\leq ε$ for a given privacy threshold $ε\geq 0$. We prove that $h(P_{XY}, \cdot)$ is concave and piecewise linear, which allows us to derive its expression in closed form for any $ε$ when $X$ and $Y$ are binary. In the non-binary case, we derive $h(P_{XY}, ε)$ in the high utility regime (i.e., for sufficiently large values of $ε$) under the assumption that $Z$ takes values in $\mathcal{Y}$. We also analyze the privacy-constrained guessing probability for two binary vector scenarios. When $X$ and $Y$ are continuous random variables, we use the squared-error loss function and express the corresponding utility-privacy tradeoff in terms of $\mathsf{sENSR}(P_{XY}, ε)$, which is the smallest normalized minimum mean squared-error (mmse) incurred in estimating $Y$ from its Gaussian perturbation $Z$, such that the mmse of $f(X)$ given $Z$ is within $ε$ of the variance of $f(X)$ for any non-constant real-valued function $f$. We derive tight upper and lower bounds for $\mathsf{sENSR}$ when $Y$ is Gaussian. We also obtain a tight lower bound for $\mathsf{sENSR}(P_{XY}, ε)$ for general absolutely continuous random variables when $ε$ is sufficiently small.
ITApr 12, 2017
Privacy-Aware Guessing EfficiencyShahab Asoodeh, Mario Diaz, Fady Alajaji et al.
We investigate the problem of guessing a discrete random variable $Y$ under a privacy constraint dictated by another correlated discrete random variable $X$, where both guessing efficiency and privacy are assessed in terms of the probability of correct guessing. We define $h(P_{XY}, ε)$ as the maximum probability of correctly guessing $Y$ given an auxiliary random variable $Z$, where the maximization is taken over all $P_{Z|Y}$ ensuring that the probability of correctly guessing $X$ given $Z$ does not exceed $ε$. We show that the map $ε\mapsto h(P_{XY}, ε)$ is strictly increasing, concave, and piecewise linear, which allows us to derive a closed form expression for $h(P_{XY}, ε)$ when $X$ and $Y$ are connected via a binary-input binary-output channel. For $(X^n, Y^n)$ being pairs of independent and identically distributed binary random vectors, we similarly define $\underline{h}_n(P_{X^nY^n}, ε)$ under the assumption that $Z^n$ is also a binary vector. Then we obtain a closed form expression for $\underline{h}_n(P_{X^nY^n}, ε)$ for sufficiently large, but nontrivial values of $ε$.
ITAug 13, 2016
Almost Perfect Privacy for Additive Gaussian Privacy FiltersShahab Asoodeh, Fady Alajaji, Tamas Linder
We study the maximal mutual information about a random variable $Y$ (representing non-private information) displayed through an additive Gaussian channel when guaranteeing that only $ε$ bits of information is leaked about a random variable $X$ (representing private information) that is correlated with $Y$. Denoting this quantity by $g_ε(X,Y)$, we show that for perfect privacy, i.e., $ε=0$, one has $g_0(X,Y)=0$ for any pair of absolutely continuous random variables $(X,Y)$ and then derive a second-order approximation for $g_ε(X,Y)$ for small $ε$. This approximation is shown to be related to the strong data processing inequality for mutual information under suitable conditions on the joint distribution $P_{XY}$. Next, motivated by an operational interpretation of data privacy, we formulate the privacy-utility tradeoff in the same setup using estimation-theoretic quantities and obtain explicit bounds for this tradeoff when $ε$ is sufficiently small using the approximation formula derived for $g_ε(X,Y)$.
ITNov 7, 2015
Information Extraction Under Privacy ConstraintsShahab Asoodeh, Mario Diaz, Fady Alajaji et al.
A privacy-constrained information extraction problem is considered where for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes $Y$ and wants to convey to a potentially public user as much information about $Y$ as possible without compromising the amount of information revealed about $X$. To this end, the so-called {\em rate-privacy function} is introduced to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from $Y$ under a privacy constraint between $X$ and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed and information-theoretic and estimation-theoretic interpretations of it are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function by studying the problem where the extracted information is a uniform quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.