Yair Weiss

h-index52

25papers

28,812citations

Novelty54%

AI Score49

Ranked #25,096 of 194,257 authors (top 13%)#9,035 in CV (top 15%)

25 Papers

16.3CVMar 22, 2022Code

Generating natural images with direct Patch Distributions Matching

Ariel Elnekave, Yair Weiss

Many traditional computer vision algorithms generate realistic images by requiring that each patch in the generated image be similar to a patch in a training image and vice versa. Recently, this classical approach has been replaced by adversarial training with a patch discriminator. The adversarial approach avoids the computational burden of finding nearest neighbors of patches but often requires very long training times and may fail to match the distribution of patches. In this paper we leverage the recently developed Sliced Wasserstein Distance and develop an algorithm that explicitly and efficiently minimizes the distance between patch distributions in two images. Our method is conceptually simple, requires no training and can be implemented in a few lines of codes. On a number of image generation tasks we show that our results are often superior to single-image-GANs, require no training, and can generate high quality images in a few seconds. Our implementation is available at https://github.com/ariel415el/GPDM

1.4CVJun 6, 2022Code

What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective

Rhea Chowers, Yair Weiss

It has previously been reported that the representation that is learned in the first layer of deep Convolutional Neural Networks (CNNs) is highly consistent across initializations and architectures. In this work, we quantify this consistency by considering the first layer as a filter bank and measuring its energy distribution. We find that the energy distribution is very different from that of the initial weights and is remarkably consistent across random initializations, datasets, architectures and even when the CNNs are trained with random labels. In order to explain this consistency, we derive an analytical formula for the energy profile of linear CNNs and show that this profile is mostly dictated by the second order statistics of image patches in the training set and it will approach a whitening transformation when the number of iterations goes to infinity. Finally, we show that this formula for linear CNNs also gives an excellent fit for the energy profiles learned by commonly used nonlinear CNNs such as ResNet and VGG, and that the first layer of these CNNs indeed perform approximate whitening of their inputs.

8.0CVMar 30

Is the Modality Gap a Bug or a Feature? A Robustness Perspective

Rhea Chowers, Oshri Naparstek, Udi Barzelay et al.

Many modern multi-modal models (e.g. CLIP) seek an embedding space in which the two modalities are aligned. Somewhat surprisingly, almost all existing models show a strong modality gap: the distribution of images is well-separated from the distribution of texts in the shared embedding space. Despite a series of recent papers on this topic, it is still not clear why this gap exists nor whether closing the gap in post-processing will lead to better performance on downstream tasks. In this paper we show that under certain conditions, minimizing the contrastive loss yields a representation in which the two modalities are separated by a global gap vector that is orthogonal to their embeddings. We also show that under these conditions the modality gap is monotonically related to robustness: decreasing the gap does not change the clean accuracy of the models but makes it less likely that a model will change its output when the embeddings are perturbed. Our experiments show that for many real-world VLMs we can significantly increase robustness by a simple post-processing step that moves one modality towards the mean of the other modality, without any loss of clean accuracy.

8.7CVJul 8

A Theory of Contrastive Learning with Natural Images

Antonio Torralba, Yair Weiss

Why does contrastive learning with simple images and augmentations yield useful representations for downstream tasks? We address this question by analytically computing the optimal representation in terms of a contrastive loss for a range of basic augmentations and any image dataset with stationary statistics. We show that for certain augmentations the optimum can be attained by a CNN whose first layer filters are sinusoids, followed by a pointwise nonlinearity, global average pooling, and a final linear layer that performs partial whitening. We also show that the optimal weights in such CNNs for more complicated augmentations are still sinusoids. The frequencies of the sinusoids and their weights can be computed using a simple waterfilling algorithm given the dataset's expected power spectrum. Experiments with different image datasets and augmentations show that such CNNs trained with SGD empirically learn sinusoids in their first layer and to perform partial whitening

2.6LGApr 9, 2024Code

On adversarial training and the 1 Nearest Neighbor classifier

Amir Hagai, Yair Weiss

The ability to fool deep learning classifiers with tiny perturbations of the input has lead to the development of adversarial training in which the loss with respect to adversarial examples is minimized in addition to the training examples. While adversarial training improves the robustness of the learned classifiers, the procedure is computationally expensive, sensitive to hyperparameters and may still leave the classifier vulnerable to other types of small perturbations. In this paper we compare the performance of adversarial training to that of the simple 1 Nearest Neighbor (1NN) classifier. We prove that under reasonable assumptions, the 1NN classifier will be robust to {\em any} small image perturbation of the training images. In experiments with 135 different binary image classification problems taken from CIFAR10, MNIST and Fashion-MNIST we find that 1NN outperforms TRADES (a powerful adversarial training algorithm) in terms of average adversarial accuracy. In additional experiments with 69 robust models taken from the current adversarial robustness leaderboard, we find that 1NN outperforms almost all of them in terms of robustness to perturbations that are only slightly different from those used during training. Taken together, our results suggest that modern adversarial training methods still fall short of the robustness of the simple 1NN classifier. our code can be found at \url{https://github.com/amirhagai/On-Adversarial-Training-And-The-1-Nearest-Neighbor-Classifier} \keywords{Adversarial training}

7.2CVJun 27

What Color is the Sky (for a non-human) ?

Yair Weiss, Ofer Springer

The light of the daytime sky contains a mixture of many colors yet is perceived as blue by human observers. This is largely due to the particular response functions of the human cones. Under these response functions skylight and blue light are metamers: they yield the exact same excitation of the cones. In this paper we ask: is it possible to define the ``color'' of the sky for other visual systems? We present a simple computational method to determine monochromatic metamers to a given input light for arbitrary visual systems. Using published values on spectral sensitivity functions of various species, we use our method to determine the dominant wavelength of monochromatic metamers to skylight. For a wide range of species (bichromats, trichromats and tetrachormats) we find monochromatic metamers to skylight but the dominant wavelength of the metamer can vary drastically between species and be very different from the color perceived by humans.

4.1LGMar 13, 2025Code

Characterizing Nonlinear Dynamics via Smooth Prototype Equivalences

Roy Friedman, Noa Moriel, Matthew Ricci et al.

Characterizing dynamical systems given limited measurements is a common challenge throughout the physical and biological sciences. However, this task is challenging, especially due to transient variability in systems with equivalent long-term dynamics. We address this by introducing smooth prototype equivalences (SPE), a framework that fits a diffeomorphism using normalizing flows to distinct prototypes - simplified dynamical systems that define equivalence classes of behavior. SPE enables classification by comparing the deformation loss of the observed sparse, high-dimensional measurements to the prototype dynamics. Furthermore, our approach enables estimation of the invariant sets of the observed dynamics through the learned mapping from prototype space to data space. Our method outperforms existing techniques in the classification of oscillatory systems and can efficiently identify invariant structures like limit cycles and fixed points in an equation-free manner, even when only a small, noisy subset of the phase space is observed. Finally, we show how our method can be used for the detection of biological processes like the cell cycle trajectory from high-dimensional single-cell gene expression data.

5.2CVApr 10, 2024

Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations

Ofir Shifman, Yair Weiss

Deep neural networks that achieve remarkable performance in image classification have previously been shown to be easily fooled by tiny transformations such as a one pixel translation of the input image. In order to address this problem, two approaches have been proposed in recent years. The first approach suggests using huge datasets together with data augmentation in the hope that a highly varied training set will teach the network to learn to be invariant. The second approach suggests using architectural modifications based on sampling theory to deal explicitly with image translations. In this paper, we show that these approaches still fall short in robustly handling 'natural' image translations that simulate a subtle change in camera orientation. Our findings reveal that a mere one-pixel translation can result in a significant change in the predicted image representation for approximately 40% of the test images in state-of-the-art models (e.g. open-CLIP trained on LAION-2B or DINO-v2) , while models that are explicitly constructed to be robust to cyclic translations can still be fooled with 1 pixel realistic (non-cyclic) translations 11% of the time. We present Robust Inference by Crop Selection: a simple method that can be proven to achieve any desired level of consistency, although with a modest tradeoff with the model's accuracy. Importantly, we demonstrate how employing this method reduces the ability to fool state-of-the-art models with a 1 pixel translation to less than 5% while suffering from only a 1% drop in classification accuracy. Additionally, we show that our method can be easy adjusted to deal with circular shifts as well. In such case we achieve 100% robustness to integer shifts with state-of-the-art accuracy, and with no need for any further training.

2.6LGFeb 21, 2024

Intriguing Properties of Modern GANs

Roy Friedman, Yair Weiss

Modern GANs achieve remarkable performance in terms of generating realistic and diverse samples. This has led many to believe that ``GANs capture the training data manifold''. In this work we show that this interpretation is wrong. We empirically show that the manifold learned by modern GANs does not fit the training distribution: specifically the manifold does not pass through the training examples and passes closer to out-of-distribution images than to in-distribution images. We also investigate the distribution over images implied by the prior over the latent codes and study whether modern GANs learn a density that approximates the training distribution. Surprisingly, we find that the learned density is very far from the data distribution and that GANs tend to assign higher density to out-of-distribution images. Finally, we demonstrate that the set of images used to train modern GANs are often not part of the typical set described by the GANs' distribution.

6.5CVApr 20, 2021Code

Posterior Sampling for Image Restoration using Explicit Patch Priors

Roy Friedman, Yair Weiss

Almost all existing methods for image restoration are based on optimizing the mean squared error (MSE), even though it is known that the best estimate in terms of MSE may yield a highly atypical image due to the fact that there are many plausible restorations for a given noisy image. In this paper, we show how to combine explicit priors on patches of natural images in order to sample from the posterior probability of a full image given a degraded image. We prove that our algorithm generates correct samples from the distribution $p(x|y) \propto \exp(-E(x|y))$ where $E(x|y)$ is the cost function minimized in previous patch-based approaches that compute a single restoration. Unlike previous approaches that computed a single restoration using MAP or MMSE, our method makes explicit the uncertainty in the restored images and guarantees that all patches in the restored images will be typical given the patch prior. Unlike previous approaches that used implicit priors on fixed-size images, our approach can be used with images of any size. Our experimental results show that posterior sampling using patch priors yields images of high perceptual quality and high PSNR on a range of challenging image restoration problems.

7.2CVJul 24, 2020Code

The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation

Eitan Richardson, Yair Weiss

Unsupervised image-to-image translation is an inherently ill-posed problem. Recent methods based on deep encoder-decoder architectures have shown impressive results, but we show that they only succeed due to a strong locality bias, and they fail to learn very simple nonlocal transformations (e.g. mapping upside down faces to upright faces). When the locality bias is removed, the methods are too powerful and may fail to learn simple local transformations. In this paper we introduce linear encoder-decoder architectures for unsupervised image to image translation. We show that learning is much easier and faster with these architectures and yet the results are surprisingly effective. In particular, we show a number of local problems for which the results of the linear methods are comparable to those of state-of-the-art architectures but with a fraction of the training time, and a number of nonlocal problems for which the state-of-the-art fails while linear methods succeed.

10.6LGFeb 20, 2020

A Bayes-Optimal View on Adversarial Examples

Eitan Richardson, Yair Weiss

Since the discovery of adversarial examples - the ability to fool modern CNN classifiers with tiny perturbations of the input, there has been much discussion whether they are a "bug" that is specific to current neural architectures and training methods or an inevitable "feature" of high dimensional geometry. In this paper, we argue for examining adversarial examples from the perspective of Bayes-Optimal classification. We construct realistic image datasets for which the Bayes-Optimal classifier can be efficiently computed and derive analytic conditions on the distributions under which these classifiers are provably robust against any adversarial attack even in high dimensions. Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier, indicating that adversarial examples are often an avoidable "bug". We further show that RBF SVMs trained on the same data consistently learn a robust classifier. The same trend is observed in experiments with real images in different datasets.

24.9CVMay 31, 2018Code

On GANs and GMMs

Eitan Richardson, Yair Weiss

A longstanding problem in machine learning is to find unsupervised methods that can learn the statistical structure of high dimensional signals. In recent years, GANs have gained much attention as a possible solution to the problem, and in particular have shown the ability to generate remarkably realistic high resolution sampled images. At the same time, many authors have pointed out that GANs may fail to model the full distribution ("mode collapse") and that using the learned models for anything other than generating samples may be very difficult. In this paper, we examine the utility of GANs in learning statistical models of images by comparing them to perhaps the simplest statistical model, the Gaussian Mixture Model. First, we present a simple method to evaluate generative models based on relative proportions of samples that fall into predetermined bins. Unlike previous automatic methods for evaluating models, our method does not rely on an additional neural network nor does it require approximating intractable computations. Second, we compare the performance of GANs to GMMs trained on the same datasets. While GMMs have previously been shown to be successful in modeling small patches of images, we show how to train them on full sized images despite the high dimensionality. Our results show that GMMs can generate realistic samples (although less sharp than those of GANs) but also capture the full distribution, which GANs fail to do. Furthermore, GMMs allow efficient inference and explicit representation of the underlying statistical structure. Finally, we discuss how GMMs can be used to generate sharp images.

37.7CVMay 30, 2018Code

Why do deep convolutional networks generalize so poorly to small image transformations?

Aharon Azulay, Yair Weiss

Convolutional Neural Networks (CNNs) are commonly assumed to be invariant to small image transformations: either because of the convolutional architecture or because they were trained using data augmentation. Recently, several authors have shown that this is not the case: small translations or rescalings of the input image can drastically change the network's prediction. In this paper, we quantify this phenomena and ask why neither the convolutional architecture nor data augmentation are sufficient to achieve the desired invariance. Specifically, we show that the convolutional architecture does not give invariance since architectures ignore the classical sampling theorem, and data augmentation does not give invariance because the CNNs learn to be invariant to transformations only for images that are very similar to typical images from the training set. We discuss two possible solutions to this problem: (1) antialiasing the intermediate representations and (2) increasing data augmentation and show that they provide only a partial solution at best. Taken together, our results indicate that the problem of insuring invariance to small image transformations in neural networks while preserving high accuracy remains unsolved.

1.7CVFeb 20, 2017Code

Reflection Separation Using Guided Annotation

Ofer Springer, Yair Weiss

Photographs taken through a glass surface often contain an approximately linear superposition of reflected and transmitted layers. Decomposing an image into these layers is generally an ill-posed task and the use of an additional image prior and user provided cues is presently necessary in order to obtain good results. Current annotation approaches rely on a strong sparsity assumption. For images with significant texture this assumption does not typically hold, thus rendering the annotation process unviable. In this paper we show that using a Gaussian Mixture Model patch prior, the correct local decomposition can almost always be found as one of 100 likely modes of the posterior. Thus, the user need only choose one of these modes in a sparse set of patches and the decomposition may then be completed automatically. We demonstrate the performance of our method using synthesized and real reflection images.

2.7LGAug 18, 2016

A Tight Convex Upper Bound on the Likelihood of a Finite Mixture

Elad Mezuman, Yair Weiss

The likelihood function of a finite mixture model is a non-convex function with multiple local maxima and commonly used iterative algorithms such as EM will converge to different solutions depending on initial conditions. In this paper we ask: is it possible to assess how far we are from the global maximum of the likelihood? Since the likelihood of a finite mixture model can grow unboundedly by centering a Gaussian on a single datapoint and shrinking the covariance, we constrain the problem by assuming that the parameters of the individual models are members of a large discrete set (e.g. estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million possible means and variances). For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected on the discrete set) or any other mixture estimation algorithm. For any dataset our method allows us to find a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood

1.1CVApr 11, 2016

Statistics of RGBD Images

Dan Rosenbaum, Yair Weiss

Cameras that can measure the depth of each pixel in addition to its color have become easily available and are used in many consumer products worldwide. Often the depth channel is captured at lower quality compared to the RGB channels and different algorithms have been proposed to improve the quality of the D channel given the RGB channels. Typically these approaches work by assuming that edges in RGB are correlated with edges in D. In this paper we approach this problem from the standpoint of natural image statistics. We obtain examples of high quality RGBD images from a computer graphics generated movie (MPI-Sintel) and we use these examples to compare different probabilistic generative models of RGBD image patches. We then use the generative models together with a degradation model and obtain a Bayes Least Squares (BLS) estimator of the D channel given the RGB channels. Our results show that learned generative models outperform the state-of-the-art in improving the quality of depth channels given the color channels in natural images even when training is performed on artificially generated images.

1.1CVApr 11, 2016

Beyond Brightness Constancy: Learning Noise Models for Optical Flow

Dan Rosenbaum, Yair Weiss

Optical flow is typically estimated by minimizing a "data cost" and an optional regularizer. While there has been much work on different regularizers many modern algorithms still use a data cost that is not very different from the ones used over 30 years ago: a robust version of brightness constancy or gradient constancy. In this paper we leverage the recent availability of ground-truth optical flow databases in order to learn a data cost. Specifically we take a generative approach in which the data cost models the distribution of noise after warping an image according to the flow and we measure the "goodness" of a data cost by how well it matches the true distribution of flow warp error. Consistent with current practice, we find that robust versions of gradient constancy are better models than simple brightness constancy but a learned GMM that models the density of patches of warp error gives a much better fit than any existing assumption of constancy. This significant advantage of the GMM is due to an explicit modeling of the spatial structure of warp errors, a feature which is missing from almost all existing data costs in optical flow. Finally, we show how a good density model of warp error patches can be used for optical flow estimation on whole images. We replace the data cost by the expected patch log-likelihood (EPLL), and show how this cost can be optimized iteratively using an additional step of denoising the warp error image. The results of our experiments are promising and show that patch models with higher likelihood lead to better optical flow estimation.

10.9AISep 26, 2013

Tighter Linear Program Relaxations for High Order Graphical Models

Elad Mezuman, Daniel Tarlow, Amir Globerson et al.

Graphical models with High Order Potentials (HOPs) have received considerable interest in recent years. While there are a variety of approaches to inference in these models, nearly all of them amount to solving a linear program (LP) relaxation with unary consistency constraints between the HOP and the individual variables. In many cases, the resulting relaxations are loose, and in these cases the results of inference can be poor. It is thus desirable to look for more accurate ways of performing inference in these models. In this work, we study the LP relaxations that result from enforcing additional consistency constraints between the HOP and the rest of the model. We address theoretical questions about the strength of the resulting relaxations compared to the relaxations that arise in standard approaches, and we develop practical and efficient message passing algorithms for optimizing the LPs. Empirically, we show that the LPs with additional consistency constraints lead to more accurate inference on some challenging problems that include a combination of low order and high order terms.

47.6AIJan 23, 2013

Loopy Belief Propagation for Approximate Inference: An Empirical Study

Kevin Murphy, Yair Weiss, Michael I. Jordan

Recently, researchers have demonstrated that loopy belief propagation - the use of Pearls polytree algorithm IN a Bayesian network WITH loops OF error- correcting codes.The most dramatic instance OF this IS the near Shannon - limit performance OF Turbo Codes codes whose decoding algorithm IS equivalent TO loopy belief propagation IN a chain - structured Bayesian network. IN this paper we ask : IS there something special about the error - correcting code context, OR does loopy propagation WORK AS an approximate inference schemeIN a more general setting? We compare the marginals computed using loopy propagation TO the exact ones IN four Bayesian network architectures, including two real - world networks : ALARM AND QMR.We find that the loopy beliefs often converge AND WHEN they do, they give a good approximation TO the correct marginals.However,ON the QMR network, the loopy beliefs oscillated AND had no obvious relationship TO the correct posteriors. We present SOME initial investigations INTO the cause OF these oscillations, AND show that SOME simple methods OF preventing them lead TO the wrong results.

19.2AIJan 10, 2013

The Factored Frontier Algorithm for Approximate Inference in DBNs

Kevin Murphy, Yair Weiss

The Factored Frontier (FF) algorithm is a simple approximate inferencealgorithm for Dynamic Bayesian Networks (DBNs). It is very similar tothe fully factorized version of the Boyen-Koller (BK) algorithm, butinstead of doing an exact update at every step followed bymarginalisation (projection), it always works with factoreddistributions. Hence it can be applied to models for which the exactupdate step is intractable. We show that FF is equivalent to (oneiteration of) loopy belief propagation (LBP) on the original DBN, andthat BK is equivalent (to one iteration of) LBP on a DBN where wecluster some of the nodes. We then show empirically that byiterating, LBP can improve on the accuracy of both FF and BK. Wecompare these algorithms on two real-world DBNs: the first is a modelof a water treatment plant, and the second is a coupled HMM, used tomodel freeway traffic.

22.0AIJun 20, 2012

MAP Estimation, Linear Programming and Belief Propagation with Convex Free Energies

Yair Weiss, Chen Yanover, Talya Meltzer

Finding the most probable assignment (MAP) in a general graphical model is known to be NP hard but good approximations have been attained with max-product belief propagation (BP) and its variants. In particular, it is known that using BP on a single-cycle graph or tree reweighted BP on an arbitrary graph will give the MAP solution if the beliefs have no ties. In this paper we extend the setting under which BP can be used to provably extract the MAP. We define Convex BP as BP algorithms based on a convex free energy approximation and show that this class includes ordinary BP with single-cycle, tree reweighted BP and many other BP variants. We show that when there are no ties, fixed-points of convex max-product BP will provably give the MAP solution. We also show that convex sum-product BP at sufficiently small temperatures can be used to solve linear programs that arise from relaxing the MAP problem. Finally, we derive a novel condition that allows us to derive the MAP solution even if some of the convex BP beliefs have ties. In experiments, we show that our theorems allow us to find the MAP in many real-world instances of graphical models where exact inference using junction-tree is impossible.

24.2DSJun 13, 2012

Tightening LP Relaxations for MAP using Message Passing

David Sontag, Talya Meltzer, Amir Globerson et al.

Linear Programming (LP) relaxations have become powerful tools for finding the most probable (MAP) configuration in graphical models. These relaxations can be solved efficiently using message-passing algorithms such as belief propagation and, when the relaxation is tight, provably find the MAP configuration. The standard LP relaxation is not tight enough in many real-world problems, however, and this has lead to the use of higher order cluster-based LP relaxations. The computational cost increases exponentially with the size of the clusters and limits the number and type of clusters we can use. We propose to solve the cluster selection problem monotonically in the dual LP, iteratively selecting clusters with guaranteed improvement, and quickly re-solving with the added clusters by reusing the existing solution. Our dual message-passing algorithm finds the MAP configuration in protein sidechain placement, protein design, and stereo problems, in cases where the standard LP relaxation fails.

11.7IRJun 13, 2012

Latent Topic Models for Hypertext

Amit Gruber, Michal Rosen-Zvi, Yair Weiss

Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collection such as the Internet, there has also been great interest in extending these approaches to hypertext [6, 9]. These approaches typically model links in an analogous fashion to how they model words - the document-link co-occurrence matrix is modeled in the same way that the document-word co-occurrence matrix is modeled in standard topic models. In this paper we present a probabilistic generative model for hypertext document collections that explicitly models the generation of links. Specifically, links from a word w to a document d depend directly on how frequent the topic of w is in d, in addition to the in-degree of d. We show how to perform EM learning on this model efficiently. By not modeling links as analogous to words, we end up using far fewer free parameters and obtain better link prediction results.

18.6AIMay 9, 2012

Convergent message passing algorithms - a unifying view

Talya Meltzer, Amir Globerson, Yair Weiss

Message-passing algorithms have emerged as powerful techniques for approximate inference in graphical models. When these algorithms converge, they can be shown to find local (or sometimes even global) optima of variational formulations to the inference problem. But many of the most popular algorithms are not guaranteed to converge. This has lead to recent interest in convergent message-passing algorithms. In this paper, we present a unified view of convergent message-passing algorithms. We present a simple derivation of an abstract algorithm, tree-consistency bound optimization (TCBO) that is provably convergent in both its sum and max product forms. We then show that many of the existing convergent algorithms are instances of our TCBO algorithm, and obtain novel convergent algorithms "for free" by exchanging maximizations and summations in existing algorithms. In particular, we show that Wainwright's non-convergent sum-product algorithm for tree based variational bounds, is actually convergent with the right update order for the case where trees are monotonic chains.