LGJul 11, 2022Code
Bottlenecks CLUB: Unifying Information-Theoretic Trade-offs Among Complexity, Leakage, and UtilityBehrooz Razeghi, Flavio P. Calmon, Deniz Gunduz et al.
Bottleneck problems are an important class of optimization problems that have recently gained increasing attention in the domain of machine learning and information theory. They are widely used in generative models, fair machine learning algorithms, design of privacy-assuring mechanisms, and appear as information-theoretic performance bounds in various multi-user communication problems. In this work, we propose a general family of optimization problems, termed as complexity-leakage-utility bottleneck (CLUB) model, which (i) provides a unified theoretical framework that generalizes most of the state-of-the-art literature for the information-theoretic privacy models, (ii) establishes a new interpretation of the popular generative and discriminative models, (iii) constructs new insights to the generative compression models, and (iv) can be used in the fair generative models. We first formulate the CLUB model as a complexity-constrained privacy-utility optimization problem. We then connect it with the closely related bottleneck problems, namely information bottleneck (IB), privacy funnel (PF), deterministic IB (DIB), conditional entropy bottleneck (CEB), and conditional PF (CPF). We show that the CLUB model generalizes all these problems as well as most other information-theoretic privacy models. Then, we construct the deep variational CLUB (DVCLUB) models by employing neural networks to parameterize variational approximations of the associated information quantities. Building upon these information quantities, we present unified objectives of the supervised and unsupervised DVCLUB models. Leveraging the DVCLUB model in an unsupervised setup, we then connect it with state-of-the-art generative models, such as variational auto-encoders (VAEs), generative adversarial networks (GANs), as well as the Wasserstein GAN (WGAN), Wasserstein auto-encoder (WAE), and adversarial auto-encoder (AAE) models through the optimal transport (OT) problem. We then show that the DVCLUB model can also be used in fair representation learning problems, where the goal is to mitigate the undesired bias during the training phase of a machine learning model. We conduct extensive quantitative experiments on colored-MNIST and CelebA datasets, with a public implementation available, to evaluate and analyze the CLUB model.
CVJul 10, 2024
Synthetic to Authentic: Transferring Realism to 3D Face Renderings for Boosting Face RecognitionParsa Rahimi, Behrooz Razeghi, Sebastien Marcel
In this paper, we investigate the potential of image-to-image translation (I2I) techniques for transferring realism to 3D-rendered facial images in the context of Face Recognition (FR) systems. The primary motivation for using 3D-rendered facial images lies in their ability to circumvent the challenges associated with collecting large real face datasets for training FR systems. These images are generated entirely by 3D rendering engines, facilitating the generation of synthetic identities. However, it has been observed that FR systems trained on such synthetic datasets underperform when compared to those trained on real datasets, on various FR benchmarks. In this work, we demonstrate that by transferring the realism to 3D-rendered images (i.e., making the 3D-rendered images look more real), we can boost the performance of FR systems trained on these more photorealistic images. This improvement is evident when these systems are evaluated against FR benchmarks utilizing real-world data, thereby paving new pathways for employing synthetic data in real-world applications.
ITApr 12
On the Capacity of Distinguishable Synthetic Identity Generation under Face VerificationBehrooz Razeghi
We study how many synthetic identities can be generated so that a face verifier declares same-identity pairs as matches and different-identity pairs as non-matches at a fixed threshold $τ$. We formalize this question for a generative face-recognition pipeline consisting of a generator followed by a normalized recognition map with outputs on the unit hypersphere. We define the capacity of distinguishable identity generation as the largest number of latent identities whose induced embedding distributions satisfy prescribed same-identity and different-identity verification constraints. In the deterministic view-invariant regime, we show that this capacity is characterized by a spherical-code problem over the realizable set of embeddings, and reduces to the classical spherical-code quantity under a full angular expressivity assumption. For stochastic identity generation, we introduce a centered model and derive a sufficient admissibility condition in which the required separation between identity centers is $\arccos(τ)+2ρ$, where $ρ$ is a within-identity concentration radius. Under full angular expressivity, this yields spherical-code-based achievable lower bounds and a positive asymptotic lower bound on the exponential growth rate with embedding dimension. We also introduce a prior-constrained random-code capacity, in which latent identities are sampled independently from a given prior, and derive high-probability lower bounds in terms of pairwise separation-failure probabilities of the induced identity centers. Under a stronger full-cap-support model, we obtain a converse and an exact spherical-code characterization.
AIApr 12
Principles Do Not Apply Themselves: A Hermeneutic Perspective on AI AlignmentBehrooz Razeghi
AI alignment is often framed as the task of ensuring that an AI system follows a set of stated principles or human preferences, but general principles rarely determine their own application in concrete cases. When principles conflict, when they are too broad to settle a situation, or when the relevant facts are unclear, an additional act of judgment is required. This paper analyzes that step through the lens of hermeneutics and argues that alignment therefore includes an interpretive component: it involves context-sensitive judgments about how principles should be read, applied, and prioritized in practice. We connect this claim to recent empirical findings showing that a substantial portion of preference-labeling data falls into cases of principle conflict or indifference, where the principle set does not uniquely determine a decision. We then draw an operational consequence: because such judgments are expressed in behavior, many alignment-relevant choices appear only in the distribution of responses a model generates at deployment time. To formalize this point, we distinguish deployment-induced and corpus-induced evaluation and show that off-policy audits can fail to capture alignment-relevant failures when the two response distributions differ. We argue that principle-specified alignment includes a context-dependent interpretive component.
CVFeb 4, 2020Code
Privacy-Preserving Image Sharing via Sparsifying Layers on Convolutional GroupsSohrab Ferdowsi, Behrooz Razeghi, Taras Holotyak et al.
We propose a practical framework to address the problem of privacy-aware image sharing in large-scale setups. We argue that, while compactness is always desired at scale, this need is more severe when trying to furthermore protect the privacy-sensitive content. We therefore encode images, such that, from one hand, representations are stored in the public domain without paying the huge cost of privacy protection, but ambiguated and hence leaking no discernible content from the images, unless a combinatorially-expensive guessing mechanism is available for the attacker. From the other hand, authorized users are provided with very compact keys that can easily be kept secure. This can be used to disambiguate and reconstruct faithfully the corresponding access-granted images. We achieve this with a convolutional autoencoder of our design, where feature maps are passed independently through sparsifying transformations, providing multiple compact codes, each responsible for reconstructing different attributes of the image. The framework is tested on a large-scale database of images with public implementation available.
LGApr 3, 2024
Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face RecognitionBehrooz Razeghi, Parsa Rahimi, Sébastien Marcel
In this study, we apply the information-theoretic Privacy Funnel (PF) model to the domain of face recognition, developing a novel method for privacy-preserving representation learning within an end-to-end training framework. Our approach addresses the trade-off between obfuscation and utility in data protection, quantified through logarithmic loss, also known as self-information loss. This research provides a foundational exploration into the integration of information-theoretic privacy principles with representation learning, focusing specifically on the face recognition systems. We particularly highlight the adaptability of our framework with recent advancements in face recognition networks, such as AdaFace and ArcFace. In addition, we introduce the Generative Privacy Funnel ($\mathsf{GenPF}$) model, a paradigm that extends beyond the traditional scope of the PF model, referred to as the Discriminative Privacy Funnel ($\mathsf{DisPF}$). This $\mathsf{GenPF}$ model brings new perspectives on data generation methods with estimation-theoretic and information-theoretic privacy guarantees. Complementing these developments, we also present the deep variational PF (DVPF) model. This model proposes a tractable variational bound for measuring information leakage, enhancing the understanding of privacy preservation challenges in deep representation learning. The DVPF model, associated with both $\mathsf{DisPF}$ and $\mathsf{GenPF}$ models, sheds light on connections with various generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion models. Complementing our theoretical contributions, we release a reproducible PyTorch package, facilitating further exploration and application of these privacy-preserving methodologies in face recognition systems.
CVJan 26, 2024
Deep Variational Privacy Funnel: General Modeling with Applications in Face RecognitionBehrooz Razeghi, Parsa Rahimi, Sébastien Marcel
In this study, we harness the information-theoretic Privacy Funnel (PF) model to develop a method for privacy-preserving representation learning using an end-to-end training framework. We rigorously address the trade-off between obfuscation and utility. Both are quantified through the logarithmic loss, a measure also recognized as self-information loss. This exploration deepens the interplay between information-theoretic privacy and representation learning, offering substantive insights into data protection mechanisms for both discriminative and generative models. Importantly, we apply our model to state-of-the-art face recognition systems. The model demonstrates adaptability across diverse inputs, from raw facial images to both derived or refined embeddings, and is competent in tasks such as classification, reconstruction, and generation.
LGJun 5, 2021
Variational Leakage: The Role of Information Complexity in Privacy LeakageAmir Ahooye Atashin, Behrooz Razeghi, Deniz Gündüz et al.
We study the role of information complexity in privacy leakage about an attribute of an adversary's interest, which is not known a priori to the system designer. Considering the supervised representation learning setup and using neural networks to parameterize the variational bounds of information quantities, we study the impact of the following factors on the amount of information leakage: information complexity regularizer weight, latent space dimension, the cardinalities of the known utility and unknown sensitive attribute sets, the correlation between utility and sensitive attributes, and a potential bias in a sensitive attribute of adversary's interest. We conduct extensive experiments on Colored-MNIST and CelebA datasets to evaluate the effect of information complexity on the amount of intrinsic leakage.
IRFeb 8, 2021
Privacy-Preserving Near Neighbor Search via Sparse Coding with AmbiguationBehrooz Razeghi, Sohrab Ferdowsi, Dimche Kostadinov et al.
In this paper, we propose a framework for privacy-preserving approximate near neighbor search via stochastic sparsifying encoding. The core of the framework relies on sparse coding with ambiguation (SCA) mechanism that introduces the notion of inherent shared secrecy based on the support intersection of sparse codes. This approach is `fairness-aware', in the sense that any point in the neighborhood has an equiprobable chance to be chosen. Our approach can be applied to raw data, latent representation of autoencoders, and aggregated local descriptors. The proposed method is tested on both synthetic i.i.d data and real large-scale image databases.
ITSep 9, 2020
On Perfect Obfuscation: Local Information Geometry AnalysisBehrooz Razeghi, Flavio. P. Calmon, Deniz Gunduz et al.
We consider the problem of privacy-preserving data release for a specific utility task under perfect obfuscation constraint. We establish the necessary and sufficient condition to extract features of the original data that carry as much information about a utility attribute as possible, while not revealing any information about the sensitive attribute. This problem formulation generalizes both the information bottleneck and privacy funnel problems. We adopt a local information geometry analysis that provides useful insight into information coupling and trajectory construction of spherical perturbation of probability mass functions. This analysis allows us to construct the modal decomposition of the joint distributions, divergence transfer matrices, and mutual information. By decomposing the mutual information into orthogonal modes, we obtain the locally sufficient statistics for inferences about the utility attribute, while satisfying perfect obfuscation constraint. Furthermore, we develop the notion of perfect obfuscation based on $χ^2$-divergence and Kullback-Leibler divergence in the Euclidean information geometry.
ITJul 15, 2019
Single-Component Privacy Guarantees in Helper Data Systems and Sparse Coding with AmbiguationBehrooz Razeghi, Taras Stanko, Boris Škorić et al.
We investigate the privacy of two approaches to (biometric) template protection: Helper Data Systems and Sparse Ternary Coding with Ambiguization. In particular, we focus on a privacy property that is often overlooked, namely how much leakage exists about one specific binary property of one component of the feature vector. This property is e.g. the sign or an indicator that a threshold is exceeded. We provide evidence that both approaches are able to protect such sensitive binary variables, and discuss how system parameters need to be set.
LGMay 8, 2019
Reconstruction of Privacy-Sensitive Data from Protected TemplatesShideh Rezaeifar, Behrooz Razeghi, Olga Taran et al.
In this paper, we address the problem of data reconstruction from privacy-protected templates, based on recent concept of sparse ternary coding with ambiguization (STCA). The STCA is a generalization of randomization techniques which includes random projections, lossy quantization, and addition of ambiguization noise to satisfy the privacy-utility trade-off requirements. The theoretical privacy-preserving properties of STCA have been validated on synthetic data. However, the applicability of STCA to real data and potential threats linked to reconstruction based on recent deep reconstruction algorithms are still open problems. Our results demonstrate that STCA still achieves the claimed theoretical performance when facing deep reconstruction attacks for the synthetic i.i.d. data, while for real images special measures are required to guarantee proper protection of the templates.
LGJan 30, 2019
Clustering with Jointly Learned Nonlinear Transforms Over Discriminating Min-Max Similarity/Dissimilarity AssignmentDimche Kostadinov, Behrooz Razeghi, Taras Holotyak et al.
This paper presents a novel clustering concept that is based on jointly learned nonlinear transforms (NTs) with priors on the information loss and the discrimination. We introduce a clustering principle that is based on evaluation of a parametric min-max measure for the discriminative prior. The decomposition of the prior measure allows to break down the assignment into two steps. In the first step, we apply NTs to a data point in order to produce candidate NT representations. In the second step, we preform the actual assignment by evaluating the parametric measure over the candidate NT representations. Numerical experiments on image clustering task validate the potential of the proposed approach. The evaluation shows advantages in comparison to the state-of-the-art clustering methods.
CRDec 10, 2018
Aggregation and Embedding for Group Membership VerificationMarzieh Gheisari, Teddy Furon, Laurent Amsaleg et al.
This paper proposes a group membership verification protocol preventing the curious but honest server from reconstructing the enrolled signatures and inferring the identity of querying clients. The protocol quantizes the signatures into discrete embeddings, making reconstruction difficult. It also aggregates multiple embeddings into representative values, impeding identification. Theoretical and experimental results show the trade-off between the security and the error rates.
ITJun 7, 2018
Privacy-Preserving Identification via Layered Sparse Code Design: Distributed Servers and Multiple Access AuthorizationBehrooz Razeghi, Slava Voloshynovskiy, Sohrab Ferdowsi et al.
We propose a new computationally efficient privacy-preserving identification framework based on layered sparse coding. The key idea of the proposed framework is a sparsifying transform learning with ambiguization, which consists of a trained linear map, a component-wise nonlinearity and a privacy amplification. We introduce a practical identification framework, which consists of two phases: public and private identification. The public untrusted server provides the fast search service based on the sparse privacy protected codebook stored at its side. The private trusted server or the local client application performs the refined accurate similarity search using the results of the public search and the layered sparse codebooks stored at its side. The private search is performed in the decoded domain and also the accuracy of private search is chosen based on the authorization level of the client. The efficiency of the proposed method is in computational complexity of encoding, decoding, "encryption" (ambiguization) and "decryption" (purification) as well as storage complexity of the codebooks.
LGMay 20, 2018
Network Learning with Local PropagationDimche Kostadinov, Behrooz Razeghi, Sohrab Ferdowsi et al.
This paper presents a locally decoupled network parameter learning with local propagation. Three elements are taken into account: (i) sets of nonlinear transforms that describe the representations at all nodes, (ii) a local objective at each node related to the corresponding local representation goal, and (iii) a local propagation model that relates the nonlinear error vectors at each node with the goal error vectors from the directly connected nodes. The modeling concepts (i), (ii) and (iii) offer several advantages, including (a) a unified learning principle for any network that is represented as a graph, (b) understanding and interpretation of the local and the global learning dynamics, (c) decoupled and parallel parameter learning, (d) a possibility for learning in infinitely long, multi-path and multi-goal networks. Numerical experiments validate the potential of the learning principle. The preliminary results show advantages in comparison to the state-of-the-art methods, w.r.t. the learning time and the network size while having comparable recognition accuracy.
CRSep 29, 2017
Privacy Preserving Identification Using Sparse Approximation with AmbiguizationBehrooz Razeghi, Slava Voloshynovskiy, Dimche Kostadinov et al.
In this paper, we consider a privacy preserving encoding framework for identification applications covering biometrics, physical object security and the Internet of Things (IoT). The proposed framework is based on a sparsifying transform, which consists of a trained linear map, an element-wise nonlinearity, and privacy amplification. The sparsifying transform and privacy amplification are not symmetric for the data owner and data user. We demonstrate that the proposed approach is closely related to sparse ternary codes (STC), a recent information-theoretic concept proposed for fast approximate nearest neighbor (ANN) search in high dimensional feature spaces that being machine learning in nature also offers significant benefits in comparison to sparse approximation and binary embedding approaches. We demonstrate that the privacy of the database outsourced to a server as well as the privacy of the data user are preserved at a low computational cost, storage and communication burdens.
NIAug 12, 2014
BSROne: Binary Search with Routing of O(1); A Scalable Circular Design for Distributed NetworksAlireza Naghizadeh, Tahereh Yourdkhani, Behrooz Razeghi et al.
Peer-to-Peer (P2P) networks as distributed solutions are used in a variety of applications. Based on the type of routing for queries among their nodes, they are classified into three groups: structured, unstructured and small-world P2P networks. Each of these categories has its own applications and benefits. Structured networks by using Distributed Hash Tables (DHT) can forward request search queries more efficiently. These networks usually organize a specific topology and make a geometrical shape. A circular topology is a prevalent design which was first introduced by Chord. In this paper, we propose BSROne, a circular structured P2P design which attempts to consider several shortcomings in the current networks. In our proposed method, we want to achieve O(1) routing time without requiring all of the nodes to know about each other. By removing the real connections between nodes and tying all of them with super-nodes, we could reduce the number of overheads that are essential to maintain the connectivity between nodes in such networks. Furthermore, we gave the network an ability to scale up by introducing one layer above super-nodes. We achieved this by emulating the design of binary search algorithm for supreme-nodes. In this paper, at first we introduce a design where fixed super-nodes with unlimited resources are given to the distributed network. In the next step, we explain how it can manage to work as a P2P application. We finally discuss the possibility of removing the scalability issue in a P2P environment for our design.