LGJul 4, 2023
Text + Sketch: Image Compression at Ultra Low RatesEric Lei, Yiğit Berkay Uslu, Hamed Hassani et al.
Recent advances in text-to-image generative models provide the ability to generate high-quality images from short text descriptions. These foundation models, when pre-trained on billion-scale datasets, are effective for various downstream tasks with little or no further training. A natural question to ask is how such models may be adapted for image compression. We investigate several techniques in which the pre-trained models can be directly used to implement compression schemes targeting novel low rate regimes. We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions that preserve both semantics and spatial structure of the original. We demonstrate that at very low bit-rates, our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.
ITApr 4, 2022
Neural Estimation of the Rate-Distortion Function With Applications to Operational Source CodingEric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
A fundamental question in designing lossy data compression schemes is how well one can do in comparison with the rate-distortion function, which describes the known theoretical limits of lossy compression. Motivated by the empirical success of deep neural network (DNN) compressors on large, real-world data, we investigate methods to estimate the rate-distortion function on such data, which would allow comparison of DNN compressors with optimality. While one could use the empirical distribution of the data and apply the Blahut-Arimoto algorithm, this approach presents several computational challenges and inaccuracies when the datasets are large and high-dimensional, such as the case of modern image datasets. Instead, we re-formulate the rate-distortion objective, and solve the resulting functional optimization problem using neural networks. We apply the resulting rate-distortion estimator, called NERD, on popular image datasets, and provide evidence that NERD can accurately estimate the rate-distortion function. Using our estimate, we show that the rate-distortion achievable by DNN compressors are within several bits of the rate-distortion function for real-world datasets. Additionally, NERD provides access to the rate-distortion achieving channel, as well as samples from its output marginal. Therefore, using recent results in reverse channel coding, we describe how NERD can be used to construct an operational one-shot lossy compression scheme with guarantees on the achievable rate and distortion. Experimental results demonstrate competitive performance with DNN compressors.
ITJul 1, 2023
On a Relation Between the Rate-Distortion Function and Optimal TransportEric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
We discuss a relationship between rate-distortion and optimal transport (OT) theory, even though they seem to be unrelated at first glance. In particular, we show that a function defined via an extremal entropic OT distance is equivalent to the rate-distortion function. We numerically verify this result as well as previous results that connect the Monge and Kantorovich problems to optimal scalar quantization. Thus, we unify solving scalar quantization and rate-distortion functions in an alternative fashion by using their respective optimal transport solvers.
CVAug 29, 2023
WrappingNet: Mesh Autoencoder via Deep Sphere DeformationEric Lei, Muhammad Asad Lodhi, Jiahao Pang et al.
There have been recent efforts to learn more meaningful representations via fixed length codewords from mesh data, since a mesh serves as a complete model of underlying 3D shape compared to a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific templates, e.g., human face/body templates. It restricts the learned latent codes to only be meaningful for objects in a specific category, so the learned latent spaces are unable to be used across different types of objects. In this work, we present WrappingNet, the first mesh autoencoder enabling general mesh unsupervised learning over heterogeneous objects. It introduces a novel base graph in the bottleneck dedicated to representing mesh connectivity, which is shown to facilitate learning a shared latent space representing object shape. The superiority of WrappingNet mesh learning is further demonstrated via improved reconstruction quality and competitive classification compared to point cloud learning, as well as latent interpolation between meshes of different categories.
LGOct 15, 2023
Score-Based Methods for Discrete Optimization in Deep LearningEric Lei, Arman Adibi, Hamed Hassani
Discrete optimization problems often arise in deep learning tasks, despite the fact that neural networks typically operate on continuous data. One class of these problems involve objective functions which depend on neural networks, but optimization variables which are discrete. Although the discrete optimization literature provides efficient algorithms, they are still impractical in these settings due to the high cost of an objective function evaluation, which involves a neural network forward-pass. In particular, they require $O(n)$ complexity per iteration, but real data such as point clouds have values of $n$ in thousands or more. In this paper, we investigate a score-based approximation framework to solve such problems. This framework uses a score function as a proxy for the marginal gain of the objective, leveraging embeddings of the discrete variables and speed of auto-differentiation frameworks to compute backward-passes in parallel. We experimentally demonstrate, in adversarial set classification tasks, that our method achieves a superior trade-off in terms of speed and solution quality compared to heuristic methods.
ITMar 12, 2024
Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform CodingEric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
Neural compression has brought tremendous progress in designing lossy compressors with good rate-distortion (RD) performance at low complexity. Thus far, neural compression design involves transforming the source to a latent vector, which is then rounded to integers and entropy coded. While this approach has been shown to be optimal on a few specific sources, we show that it can be highly sub-optimal on synthetic sources whose intrinsic dimensionality is greater than one. With integer rounding in the latent space, the quantization regions induced by neural transformations, remain square-like and fail to match those of optimal vector quantization. We demonstrate that this phenomenon is due to the choice of scalar quantization in the latent space, and not the transform design. By employing lattice quantization instead, we propose Lattice Transform Coding (LTC) and show that it approximately recovers optimal vector quantization at reasonable complexity. On real-world sources, LTC improves upon standard neural compressors. LTC also provides a framework that can integrate structurally (near) optimal information-theoretic designs into lossy compression; examples include block coding, which yields coding gain over optimal one-shot coding and approaches the asymptotically-achievable rate-distortion function, as well as nested lattice quantization for low complexity fixed-rate coding.
ITMar 21, 2025
Optimal Neural Compressors for the Rate-Distortion-Perception TradeoffEric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
Recent efforts in neural compression have focused on the rate-distortion-perception (RDP) tradeoff, where the perception constraint ensures the source and reconstruction distributions are close in terms of a statistical divergence. Theoretical work on RDP describes properties of RDP-optimal compressors without providing constructive and low complexity solutions. While classical rate distortion theory shows that optimal compressors should efficiently pack space, RDP theory additionally shows that infinite randomness shared between the encoder and decoder may be necessary for RDP optimality. In this paper, we propose neural compressors that are low complexity and benefit from high packing efficiency through lattice coding and shared randomness through shared dithering over the lattice cells. For two important settings, namely infinite shared and zero shared randomness, we analyze the RDP tradeoff achieved by our proposed neural compressors and show optimality in both cases. Experimentally, we investigate the roles that these two components of our design, lattice coding and randomness, play in the performance of neural compressors on synthetic and real-world data. We observe that performance improves with more shared randomness and better lattice packing.
LGMay 25, 2023
Federated Neural Compression Under Heterogeneous DataEric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
We discuss a federated learned compression problem, where the goal is to learn a compressor from real-world data which is scattered across clients and may be statistically heterogeneous, yet share a common underlying representation. We propose a distributed source model that encompasses both characteristics, and naturally suggests a compressor architecture that uses analysis and synthesis transforms shared by clients. Inspired by personalized federated learning methods, we employ an entropy model that is personalized to each client. This allows for a global latent space to be learned across clients, and personalized entropy models that adapt to the clients' latent distributions. We show empirically that this strategy outperforms solely local methods, which indicates that learned compression also benefits from a shared global representation in statistically heterogeneous federated settings.
LGDec 14, 2021
Robust Graph Neural Networks via Probabilistic Lipschitz ConstraintsRaghu Arghal, Eric Lei, Shirin Saeedi Bidokhti
Graph neural networks (GNNs) have recently been demonstrated to perform well on a variety of network-based tasks such as decentralized control and resource allocation, and provide computationally efficient methods for these tasks which have traditionally been challenging in that regard. However, like many neural-network based systems, GNNs are susceptible to shifts and perturbations on their inputs, which can include both node attributes and graph structure. In order to make them more useful for real-world applications, it is important to ensure their robustness post-deployment. Motivated by controlling the Lipschitz constant of GNN filters with respect to the node attributes, we propose to constrain the frequency response of the GNN's filter banks. We extend this formulation to the dynamic graph setting using a continuous frequency response constraint, and solve a relaxed variant of the problem via the scenario approach. This allows for the use of the same computationally efficient algorithm on sampled constraints, which provides PAC-style guarantees on the stability of the GNN using results in scenario optimization. We also highlight an important connection between this setup and GNN stability to graph perturbations, and provide experimental results which demonstrate the efficacy and broadness of our approach.
LGOct 13, 2021
Out-of-Distribution Robustness in Deep Learning CompressionEric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
In recent years, deep neural network (DNN) compression systems have proved to be highly effective for designing source codes for many natural sources. However, like many other machine learning systems, these compressors suffer from vulnerabilities to distribution shifts as well as out-of-distribution (OOD) data, which reduces their real-world applications. In this paper, we initiate the study of OOD robust compression. Considering robustness to two types of ambiguity sets (Wasserstein balls and group shifts), we propose algorithmic and architectural frameworks built on two principled methods: one that trains DNN compressors using distributionally-robust optimization (DRO), and the other which uses a structured latent code. Our results demonstrate that both methods enforce robustness compared to a standard DNN compressor, and that using a structured code can be superior to the DRO compressor. We observe tradeoffs between robustness and distortion and corroborate these findings theoretically for a specific class of sources.
LGSep 29, 2019
Siamese Neural Networks for Wireless Positioning and Channel ChartingEric Lei, Oscar Castañeda, Olav Tirkkonen et al.
Neural networks have been proposed recently for positioning and channel charting of user equipments (UEs) in wireless systems. Both of these approaches process channel state information (CSI) that is acquired at a multi-antenna base-station in order to learn a function that maps CSI to location information. CSI-based positioning using deep neural networks requires a dataset that contains both CSI and associated location information. Channel charting (CC) only requires CSI information to extract relative position information. Since CC builds on dimensionality reduction, it can be implemented using autoencoders. In this paper, we propose a unified architecture based on Siamese networks that can be used for supervised UE positioning and unsupervised channel charting. In addition, our framework enables semisupervised positioning, where only a small set of location information is available during training. We use simulations to demonstrate that Siamese networks achieve similar or better performance than existing positioning and CC approaches with a single, unified neural network architecture.
LGApr 28, 2018
Novel Prediction Techniques Based on Clusterwise Linear RegressionIgor Gitman, Jieshi Chen, Eric Lei et al.
In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of target variable are unknown. In this paper we propose two novel approaches on how to solve this problem. The first approach, predictive CLR builds a separate classification model to predict test CLR labels. The second approach, constrained CLR utilizes a set of user-specified constraints that enforce certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while enjoying only $\approx 20$ times increased computational time over linear regression.
MLSep 17, 2017
Characterization of Hemodynamic Signal by Learning Multi-View RelationshipsEric Lei, Kyle Miller, Michael R. Pinsky et al.
Multi-view data are increasingly prevalent in practice. It is often relevant to analyze the relationships between pairs of views by multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, data may easily exhibit nonlinear relations, which CCA cannot reveal. We aim to investigate the usefulness of nonlinear multi-view relations to characterize multi-view data in an explainable manner. To address this challenge, we propose a method to characterize globally nonlinear multi-view relationships as a mixture of linear relationships. A clustering method, it identifies partitions of observations that exhibit the same relationships and learns those relationships simultaneously. It defines cluster variables by multi-view rather than spatial relationships, unlike almost all other clustering methods. Furthermore, we introduce a supervised classification method that builds on our clustering method by employing multi-view relationships as discriminative factors. The value of these methods resides in their capability to find useful structure in the data that single-view or current multi-view methods may struggle to find. We demonstrate the potential utility of the proposed approach using an application in clinical informatics to detect and characterize slow bleeding in patients whose central venous pressure (CVP) is monitored at the bedside. Presently, CVP is considered an insensitive measure of a subject's intravascular volume status or its change. However, we reason that features of CVP during inspiration and expiration should be informative in early identification of emerging changes of patient status. We empirically show how the proposed method can help discover and analyze multiple-to-multiple correlations, which could be nonlinear or vary throughout the population, by finding explainable structure of operational interest to practitioners.