LG Mar 19, 2023
A model is worth tens of thousands of examples
Thomas Dagès, Laurent D. Cohen, Alfred M. Bruckstein
Traditional signal processing methods relying on mathematical data generation models have been cast aside in favour of deep neural networks, which require vast amounts of data. Since the theoretical sample complexity is nearly impossible to evaluate, these amounts of examples are usually estimated with crude rules of thumb. However, these rules only suggest when the networks should work, but do not relate to the traditional methods. In particular, an interesting question is: how much data is required for neural networks to be on par with or, if possible, outperform the traditional model-based methods? In this work, we empirically investigate this question in two simple examples, where the data is generated according to precisely defined mathematical models, and where well-understood optimal or state-of-the-art mathematical data-agnostic solutions are known. The first problem is deconvolving one-dimensional Gaussian signals and the second is estimating a circle's radius and location in random grayscale images of disks. By training various networks, either naive custom-designed ones or well-established ones, with various amounts of training data, we find that networks require tens of thousands of examples to compete with the traditional methods, whether the networks are trained from scratch or even with transfer learning or finetuning.
LG Mar 12, 2023
From Compass and Ruler to Convolution and Nonlinearity: On the Surprising Difficulty of Understanding a Simple CNN Solving a Simple Geometric Estimation Task
Thomas Dagès, Michael Lindenbaum, Alfred M. Bruckstein
Neural networks are omnipresent, but remain poorly understood. Their increasing complexity and use in critical systems raise the important challenge of full interpretability. We propose to address a simple well-posed learning problem: estimating the radius of a centred pulse in a one-dimensional signal or of a centred disk in two-dimensional images using a simple convolutional neural network. Surprisingly, understanding what trained networks have learned is difficult and, to some extent, counter-intuitive. However, an in-depth theoretical analysis in the one-dimensional case allows us to comprehend constraints due to the chosen architecture, the role of each filter and of the nonlinear activation function, and every single value taken by the weights of the model. Two fundamental concepts of neural networks arise: the importance of invariance and of the shape of the nonlinear activation functions.
CV Mar 23, 2025
Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding
Thomas Dagès, Simon Weber, Ya-Wei Eileen Lin et al.
Dimensionality reduction is a fundamental task that aims to simplify complex data by reducing its feature dimensionality while preserving essential patterns, with core applications in data analysis and visualisation. To preserve the underlying data structure, multi-dimensional scaling (MDS) methods focus on preserving pairwise dissimilarities, such as distances. They optimise the embedding to have pairwise distances as close as possible to the data dissimilarities. However, the current standard is limited to embedding data in Riemannian manifolds. Motivated by the lack of asymmetry in the Riemannian metric of the embedding space, this paper extends the MDS problem to a natural asymmetric generalisation of Riemannian manifolds called Finsler manifolds. Inspired by Euclidean space, we define a canonical Finsler space for embedding asymmetric data. Due to its simplicity with respect to geodesics, data representation in this space is both intuitive and simple to analyse. We demonstrate that our generalisation benefits from the same theoretical convergence guarantees. We reveal the effectiveness of our Finsler embedding across various types of non-symmetric data, highlighting its value in applications such as data visualisation, dimensionality reduction, directed graph embedding, and link prediction.
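To make the idea of embedding with an asymmetric distance concrete, here is a minimal sketch. It assumes a Randers-type distance on flat space, d(x, y) = ||y - x|| + ω·(y - x) with ||ω|| < 1, which is one simple asymmetric generalisation of the Euclidean distance; whether this matches the canonical Finsler space of the paper is an assumption. The function names `randers_distances` and `stress` are hypothetical.

```python
import numpy as np

def randers_distances(points, omega):
    """Asymmetric pairwise distances d[i, j] = ||x_j - x_i|| + omega . (x_j - x_i).

    Assumed illustrative metric (Randers type); requires ||omega|| < 1 so that
    all distances stay non-negative.
    """
    x = np.asarray(points, dtype=float)
    omega = np.asarray(omega, dtype=float)
    if np.linalg.norm(omega) >= 1.0:
        raise ValueError("omega must have norm < 1")
    disp = x[None, :, :] - x[:, None, :]  # disp[i, j] = x_j - x_i
    return np.linalg.norm(disp, axis=-1) + disp @ omega

def stress(embedding, dissimilarities, omega):
    """MDS-style objective: squared mismatch between embedded distances and data."""
    d = randers_distances(embedding, omega)
    return float(np.sum((d - dissimilarities) ** 2))

# Three points on a line; travelling "with" omega is longer than travelling against it.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
omega = np.array([0.5, 0.0])
D = randers_distances(pts, omega)
```

An embedding would then be found by minimising `stress` over the point coordinates, for example with a gradient-based optimiser; that step is omitted here.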
LG Sep 2, 2025
VariAntNet: Learning Decentralized Control of Multi-Agent Systems
Yigal Koifman, Erez Koifman, Eran Iceland et al.
A simple multi-agent system can be effectively utilized in disaster response applications, such as firefighting. Such a swarm is required to operate in complex environments with limited local sensing and no reliable inter-agent communication or centralized control. These simple robotic agents, also known as Ant Robots, are defined as anonymous agents that possess limited sensing capabilities, lack a shared coordinate system, and do not communicate explicitly with one another. A key challenge for simple swarms lies in maintaining cohesion and avoiding fragmentation despite limited-range sensing. Recent advances in machine learning offer effective solutions to some of the classical decentralized control challenges. We propose VariAntNet, a deep learning-based decentralized control model designed to facilitate agent swarming and collaborative task execution. VariAntNet extracts geometric features from unordered, variable-sized local observations. It incorporates a neural network architecture trained with a novel, differentiable, multi-objective, mathematically justified loss function that promotes swarm cohesiveness by utilizing the properties of the visibility graph Laplacian matrix. VariAntNet is demonstrated on the fundamental multi-agent gathering task, where agents with bearing-only and limited-range sensing must gather at some location. VariAntNet significantly outperforms an existing analytical solution, achieving more than double the convergence rate while maintaining high swarm connectivity across varying swarm sizes. While the analytical solution guarantees cohesion, it is often too slow in practice. In time-critical scenarios, such as emergency response operations where lives are at risk, slower analytical methods are impractical, which justifies accepting the loss of some agents within the swarm. This paper presents and analyzes this trade-off in detail.
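The link between the visibility graph Laplacian and swarm cohesion can be illustrated with a standard fact from spectral graph theory: the second-smallest Laplacian eigenvalue (the algebraic connectivity) is positive exactly when the graph is connected. The sketch below is not the VariAntNet loss; it only computes this quantity, assuming a disk-shaped sensing region of radius `sensing_range`. Function names are hypothetical.

```python
import numpy as np

def visibility_laplacian(positions, sensing_range):
    """Graph Laplacian L = D - A, where agents are linked if within sensing range."""
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adj = (dist <= sensing_range).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return np.diag(adj.sum(axis=1)) - adj

def algebraic_connectivity(laplacian):
    """Second-smallest eigenvalue; zero (up to round-off) iff the graph is disconnected."""
    return np.linalg.eigvalsh(laplacian)[1]  # eigvalsh returns ascending eigenvalues

cohesive = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]    # a chain: each agent sees a neighbour
fragmented = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]  # the third agent has drifted away

lam_cohesive = algebraic_connectivity(visibility_laplacian(cohesive, 1.5))
lam_fragmented = algebraic_connectivity(visibility_laplacian(fragmented, 1.5))
```

A differentiable loss would need a smooth surrogate for the hard range threshold used here; that choice is left open in this sketch.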
CV Jun 8, 2024
Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions
Thomas Dagès, Michael Lindenbaum, Alfred M. Bruckstein
Standard convolutions are prevalent in image processing and deep learning, but their fixed kernels limit adaptability. Several deformation strategies of the reference kernel grid have been proposed. Yet, they lack a unified theoretical framework. By returning to a metric perspective for images, now seen as two-dimensional manifolds equipped with notions of local and geodesic distances, either symmetric (Riemannian) or not (Finsler), we provide a unifying principle: the kernel positions are samples of unit balls of implicit metrics. With this new perspective, we also propose metric convolutions, a novel approach that samples unit balls from explicit signal-dependent metrics, providing interpretable operators with geometric regularisation. This framework, compatible with gradient-based optimisation, can directly replace existing convolutions applied to either input images or deep features of neural networks. Metric convolutions typically require fewer parameters and provide better generalisation. Our approach shows competitive performance in standard denoising and classification tasks.
RO Nov 1, 2020
Broadcast Guidance of Agents in Deviated Linear Cyclic Pursuit
Ilana Segall, Alfred M. Bruckstein
In this report we show the emergent behavior of a group of agents, ordered from 1 to n, performing deviated, linear, cyclic pursuit, in the presence of a broadcast guidance control. Each agent senses the relative position of its target, i.e. agent i senses the relative position of agent i+1. The broadcast control, a velocity signal, is detected by a random set of agents in the group. We assume the agents to be modeled as single integrators. We show that the emergent behavior of the group is determined by the deviation angle and by the set of agents detecting the guidance control.
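The dynamics described above can be sketched as a short simulation. The model below is an assumption consistent with the abstract rather than the report's exact formulation: each single-integrator agent moves along its relative position to the next agent, rotated by the deviation angle `theta`, and agents flagged in `detects` add the broadcast velocity `u`. Forward Euler integration and all names are illustrative choices.

```python
import numpy as np

def simulate_cyclic_pursuit(p0, theta, u, detects, dt=0.01, steps=5000):
    """Euler simulation of dp_i/dt = R(theta) (p_{i+1} - p_i) + b_i * u.

    p0      : (n, 2) initial positions; agent n pursues agent 1 (cyclic).
    theta   : deviation angle in radians.
    u       : broadcast velocity, shape (2,).
    detects : length-n 0/1 flags b_i, marking which agents detect the broadcast.
    """
    p = np.array(p0, dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    u = np.asarray(u, dtype=float)
    b = np.asarray(detects, dtype=float)[:, None]
    for _ in range(steps):
        to_target = np.roll(p, -1, axis=0) - p  # row i holds p_{i+1} - p_i
        p = p + dt * (to_target @ rot.T + b * u)
    return p

rng = np.random.default_rng(0)
start = rng.uniform(-1.0, 1.0, size=(5, 2))

# No broadcast: with a small deviation angle the agents gather at their centroid.
gathered = simulate_cyclic_pursuit(start, theta=0.2, u=[0.0, 0.0], detects=[0] * 5)

# Every agent detects the broadcast: the centroid drifts with velocity u.
guided = simulate_cyclic_pursuit(start, theta=0.2, u=[1.0, 0.0], detects=[1] * 5)
```

Changing `theta` or the subset of detecting agents in `detects` is the natural way to explore how these two quantities shape the group behaviour.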
CV Jul 29, 2019
Seeing Things in Random-Dot Videos
Thomas Dagès, Michael Lindenbaum, Alfred M. Bruckstein
Humans possess an intricate and powerful visual system in order to perceive and understand the surrounding world. Human perception can effortlessly detect and correctly group features in visual data and can even interpret random-dot videos induced by imaging natural dynamic scenes with highly noisy sensors such as ultrasound imaging. Remarkably, this happens even if perception completely fails when the same information is presented frame by frame rather than in a video sequence. We study this property of surprising dynamic perception with the first goal of proposing a new detection and spatio-temporal grouping algorithm for such signals when, per frame, the information on objects is both random and sparse and embedded in random noise. The algorithm is based on the succession of temporal integration and spatial statistical tests of unlikeliness, the a contrario framework. The algorithm not only manages to handle such signals, but the striking similarity between its performance and the perception of human observers, as witnessed by a series of psychophysical experiments on image and video data, leads us to see in it a simple computational Gestalt model of human perception with only two parameters: the time integration and the visual angle for candidate shapes to be detected.
MM Jan 30, 2019
Benefiting from Duplicates of Compressed Data: Shift-Based Holographic Compression of Images
Yehuda Dar, Alfred M. Bruckstein
Storage systems often rely on multiple copies of the same compressed data, enabling recovery in case of binary data errors, of course, at the expense of a higher storage cost. In this paper we show that a wiser method of duplication entails great potential benefits for data types tolerating approximate representations, like images and videos. We propose a method to produce a set of distinct compressed representations for a given signal, such that any subset of them allows reconstruction of the signal at a quality depending only on the number of compressed representations utilized. Essentially, we implement the holographic representation idea, where all the representations are equally important in refining the reconstruction. Here we propose to exploit the shift sensitivity of common compression processes and generate holographic representations via compression of various shifts of the signal. Two implementations for the idea, based on standard compression methods, are presented: the first is a simple, optimization-free design. The second approach originates in a challenging rate-distortion optimization, mitigated by the alternating direction method of multipliers (ADMM), leading to a process of repeatedly applying standard compression techniques. Evaluation of the approach, in conjunction with the JPEG2000 image compression standard, shows the effectiveness of the optimization in providing compressed holographic representations that, by means of an elementary reconstruction process, enable impressive gains of several dBs in PSNR over exact duplications.
MM Feb 12, 2018
Compression for Multiple Reconstructions
Yehuda Dar, Michael Elad, Alfred M. Bruckstein
In this work we propose a method for optimizing the lossy compression for a network of diverse reconstruction systems. We focus on adapting a standard image compression method to a set of candidate displays, presenting the decompressed signals to viewers. Each display is modeled as a linear operator applied after decompression, and its probability to serve a network user. We formulate a complicated operational rate-distortion optimization trading-off the network's expected mean-squared reconstruction error and the compression bit-cost. Using the alternating direction method of multipliers (ADMM) we develop an iterative procedure where the network structure is separated from the compression method, enabling the reliance on standard compression techniques. We present experimental results showing our method to be the best approach for adjusting high bit-rate image compression (using the state-of-the-art HEVC standard) to a set of displays modeled as blur degradations.
MM Nov 21, 2017
Optimized Pre-Compensating Compression
Yehuda Dar, Michael Elad, Alfred M. Bruckstein
In imaging systems, following acquisition, an image/video is transmitted or stored and eventually presented to human observers using different and often imperfect display devices. While the resulting quality of the output image may be severely affected by the display, this degradation is usually ignored in the preceding compression. In this paper we model the sub-optimality of the display device as a known degradation operator applied on the decompressed image/video. We assume the use of a standard compression path, and augment it with a suitable pre-processing procedure, providing a compressed signal intended to compensate for the degradation without any post-filtering. Our approach originates from an intricate rate-distortion problem, optimizing the modifications to the input image/video for reaching the best end-to-end performance. We address this seemingly computationally intractable problem using the alternating direction method of multipliers (ADMM) approach, leading to a procedure in which a standard compression technique is iteratively applied. We demonstrate the proposed method for adjusting HEVC image/video compression to compensate for post-decompression visual effects due to a common type of display. In particular, we use our method to reduce the motion blur perceived while viewing video on LCD devices. The experiments establish our method as a leading approach for preprocessing high bit-rate compression to counterbalance a post-decompression degradation.
DM Oct 23, 2017
Probabilistic Pursuits on Graphs
Michael Amir, Alfred M. Bruckstein
We consider discrete dynamical systems of "ant-like" agents engaged in a sequence of pursuits on a graph environment. The agents emerge one by one at equal time intervals from a source vertex $s$ and pursue each other by greedily attempting to close the distance to their immediate predecessor, the agent that emerged just before them from $s$, until they arrive at the destination point $t$. Such pursuits have been investigated before in the continuous setting and in discrete time when the underlying environment is a regular grid. In both these settings the agents' walks provably converge to a shortest path from $s$ to $t$. Furthermore, assuming a certain natural probability distribution over the move choices of the agents on the grid (in case there are multiple shortest paths between an agent and its predecessor), the walks converge to the uniform distribution over all shortest paths from $s$ to $t$. We study the evolution of agent walks over a general finite graph environment $G$. Our model is a natural generalization of the pursuit rule proposed for the case of the grid. The main results are as follows. We show that "convergence" to the shortest paths in the sense of previous work extends to all pseudo-modular graphs (i.e. graphs in which every three pairwise intersecting disks have a nonempty intersection), and also to environments obtained by taking graph products, generalizing previous results in two different ways. We show that convergence to the shortest paths is also obtained by chordal graphs, and discuss some further positive and negative results for planar graphs. In the most general case, convergence to the shortest paths is not guaranteed, and the agents may get stuck on sets of recurrent, non-optimal walks from $s$ to $t$. However, we show that the limiting distributions of the agents' walks will always be uniform distributions over some set of walks of equal length.
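A chain of pursuits of this kind can be sketched in a few lines. The move rule used here is an assumption made for illustration: at every time step an agent moves to a uniformly chosen neighbour that is one step closer to its predecessor's current position, and waits if it has caught up. The paper's exact rule and probability distribution may differ, and all function names are hypothetical.

```python
import random
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src to every vertex of a connected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def chase_step(adj, pos, target, rng):
    """Move to a random neighbour one hop closer to target (assumed rule)."""
    if pos == target:
        return pos  # caught up: wait in place
    d = bfs_dist(adj, target)
    return rng.choice([w for w in adj[pos] if d[w] == d[pos] - 1])

def simulate_pursuit(adj, first_walk, n_agents, delay, seed=0):
    """Agent k+1 leaves s `delay` steps after agent k and chases it until reaching t."""
    rng = random.Random(seed)
    walks = [list(first_walk)]
    for _ in range(1, n_agents):
        pred = walks[-1]
        walk = [pred[0]]  # emerge from the source vertex s
        step = 0
        while walk[-1] != pred[-1]:
            # The predecessor is `delay` steps ahead, or already resting at t.
            k = min(step + delay, len(pred) - 1)
            walk.append(chase_step(adj, walk[-1], pred[k], rng))
            step += 1
        walks.append(walk)
    return walks

def grid_graph(n):
    """Adjacency dict of an n-by-n grid with 4-neighbour connectivity."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return {(r, c): [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < n and 0 <= c + dc < n] for r, c in cells}

# The first agent snakes through a 3x3 grid; followers chase one another.
snake = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2)]
walks = simulate_pursuit(grid_graph(3), snake, n_agents=6, delay=2, seed=1)
lengths = [len(w) - 1 for w in walks]
```

Under this rule a follower never takes more time steps than its predecessor, so `lengths` is non-increasing; whether it reaches the shortest-path length depends on the graph, which is the question the abstract addresses.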
CV Nov 28, 2015
Real-Time Depth Refinement for Specular Objects
Roy Or-El, Rom Hershkovitz, Aaron Wetzler et al.
The introduction of consumer RGB-D scanners set off a major boost in 3D computer vision research. Yet, existing depth scanners are not precise enough to recover fine details of a scanned object. While modern shading-based depth refinement methods have been proven to work well with Lambertian objects, they break down in the presence of specularities. We present a novel shape-from-shading framework that addresses this issue and enhances both diffuse and specular objects' depth profiles. We take advantage of the built-in monochromatic IR projector and IR images of the RGB-D scanners and present a lighting model that accounts for the specular regions in the input image. Using this model, we reconstruct the depth map in real time. Both quantitative tests and visual evaluations show that the proposed method produces state-of-the-art depth reconstruction results.
CV Oct 30, 2015
Postprocessing of Compressed Images via Sequential Denoising
Yehuda Dar, Alfred M. Bruckstein, Michael Elad et al.
In this work we propose a novel postprocessing technique for compression-artifact reduction. Our approach is based on posing this task as an inverse problem, with a regularization that leverages existing state-of-the-art image denoising algorithms. We rely on the recently proposed Plug-and-Play Prior framework, suggesting the solution of general inverse problems via the Alternating Direction Method of Multipliers (ADMM), leading to a sequence of Gaussian denoising steps. A key feature in our scheme is a linearization of the compression-decompression process, so as to get a formulation that can be optimized. In addition, we supply a thorough analysis of this linear approximation for several basic compression procedures. The proposed method is suitable for diverse compression techniques that rely on transform coding. Specifically, we demonstrate impressive gains in image quality for several leading compression methods: JPEG, JPEG2000, and HEVC.
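The way ADMM turns a regularized inverse problem into repeated denoising steps can be shown on a toy case. The sketch below is not the paper's algorithm: it assumes the simplest forward model (identity, so the data term is ½||x - y||²) and uses soft-thresholding as a stand-in for the plugged-in denoiser, in which case the iteration solves an l1-regularized problem whose exact answer is known. The `denoiser` argument marks the slot where an image denoiser would go.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1; plays the role of the denoiser here."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def admm_denoise(y, lam, rho=1.0, iters=100, denoiser=soft_threshold):
    """ADMM for min_x 0.5 * ||x - y||^2 + lam * R(x), split as x = v.

    x-step: closed-form quadratic update (data fidelity).
    v-step: call to the denoiser (the plug-and-play slot).
    u-step: update of the scaled dual variable.
    """
    y = np.asarray(y, dtype=float)
    v = np.zeros_like(y)
    u = np.zeros_like(y)
    for _ in range(iters):
        x = (y + rho * (v - u)) / (1.0 + rho)
        v = denoiser(x + u, lam / rho)
        u = u + x - v
    return v

y = np.array([3.0, 0.5, -2.0])
estimate = admm_denoise(y, lam=1.0)
```

With the default soft-threshold denoiser the iterates converge to `soft_threshold(y, lam)`, which gives a simple correctness check before swapping in a more elaborate denoiser or forward model.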
CV Jun 16, 2015
Depth Perception in Autostereograms: 1/f-Noise is Best
Yael Yankelevsky, Ishai Shvartz, Tamar Avraham et al.
An autostereogram is a single image that encodes depth information that pops out when looking at it. The trick is achieved by replicating a vertical strip that sets a basic two-dimensional pattern with disparity shifts that encode a three-dimensional scene. It is of interest to explore the dependency between the ease of perceiving depth in autostereograms and the choice of the basic pattern used for generating them. In this work we confirm a theory proposed by Bruckstein et al. to explain the process of autostereographic depth perception, providing a measure for the ease of "locking into" the depth profile, based on the spectral properties of the basic pattern used. We report the results of three sets of psychophysical experiments using autostereograms generated from two-dimensional random noise patterns having power spectra of the form $1/f^β$. The experiments were designed to test the ability of human subjects to identify smooth, low resolution surfaces, as well as detail, in the form of higher resolution objects in the depth profile, and to determine limits in identifying small objects as a function of their size. In accordance with the theory, we discover a significant advantage of the $1/f$ noise pattern (pink noise) for fast depth lock-in and fine detail detection, showing that such patterns are optimal choices for autostereogram design. Validating the theoretical model predictions strengthens its underlying assumptions, and contributes to a better understanding of the visual system's binocular disparity mechanisms.
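Random patterns with a prescribed power-law spectrum can be generated by filtering white noise in the Fourier domain. The sketch below is one common way to do this and is not taken from the paper: the amplitude gain f^(-β/2) yields a power spectrum proportional to 1/f^β, the DC component is zeroed, and the result is normalised to unit standard deviation. The function name and these normalisation choices are assumptions.

```python
import numpy as np

def power_law_noise(shape, beta, seed=0):
    """2D random pattern whose power spectrum falls off as 1 / f**beta."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(shape)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    f = np.hypot(fy, fx)  # radial frequency of each FFT bin
    gain = np.zeros_like(f)
    gain[f > 0] = f[f > 0] ** (-beta / 2.0)  # amplitude gain; DC stays zero
    pattern = np.fft.ifft2(np.fft.fft2(white) * gain).real
    return pattern / pattern.std()

white_pattern = power_law_noise((128, 128), beta=0.0)  # flat spectrum
pink_pattern = power_law_noise((128, 128), beta=1.0)   # 1/f power spectrum
```

Tiling such a pattern as the repeated vertical strip of an autostereogram, with disparity shifts taken from a depth map, is a separate step not shown here.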
CV May 20, 2014
Sparsity Based Methods for Overparameterized Variational Problems
Raja Giryes, Michael Elad, Alfred M. Bruckstein
Two complementary approaches have been extensively used in signal and image processing leading to novel results, the sparse representation methodology and the variational strategy. Recently, a new sparsity based model has been proposed, the cosparse analysis framework, which may potentially help in bridging sparse approximation based methods to the traditional total-variation minimization. Based on this, we introduce a sparsity based framework for solving overparameterized variational problems. The latter has been used to improve the estimation of optical flow and also for general denoising of signals and images. However, the recovery of the space varying parameters involved was not adequately addressed by traditional variational methods. We first demonstrate the efficiency of the new framework for one dimensional signals in recovering a piecewise linear and polynomial function. Then, we illustrate how the new technique can be used for denoising and segmentation of images.
MM Apr 15, 2014
Improving Low Bit-Rate Video Coding using Spatio-Temporal Down-Scaling
Yehuda Dar, Alfred M. Bruckstein
Good quality video coding for low bit-rate applications is important for transmission over narrow-bandwidth channels and for storage with limited memory capacity. In this work, we develop a previous analysis for image compression at low bit-rates to adapt it to video signals. Improving compression using down-scaling in the spatial and temporal dimensions is examined. We show, both theoretically and experimentally, that at low bit-rates, we benefit from applying spatio-temporal scaling. The proposed method includes down-scaling before the compression and a corresponding up-scaling afterwards, while the codec itself is left unmodified. We propose analytic models for low bit-rate compression and spatio-temporal scaling operations. Specifically, we use theoretic models of motion-compensated prediction of available and absent frames as in coding and frame-rate up-conversion (FRUC) applications, respectively. The proposed models are designed for multi-resolution analysis. In addition, we formulate a bit-allocation procedure and propose a method for estimating good down-scaling factors of a given video based on its second-order statistics and the given bit-budget. We validate our model with experimental results of H.264 compression.
MM Apr 12, 2014
Motion-Compensated Coding and Frame-Rate Up-Conversion: Models and Analysis
Yehuda Dar, Alfred M. Bruckstein
Block-based motion estimation (ME) and compensation (MC) techniques are widely used in modern video processing algorithms and compression systems. The great variety of video applications and devices results in numerous compression specifications. Specifically, there is a diversity of frame-rates and bit-rates. In this paper, we study the effect of frame-rate and compression bit-rate on block-based ME and MC as commonly utilized in inter-frame coding and frame-rate up-conversion (FRUC). This joint examination yields a comprehensive foundation for comparing MC procedures in coding and FRUC. First, the video signal is modeled as a noisy translational motion of an image. Then, we theoretically model the motion-compensated prediction of available and absent frames as in coding and FRUC applications, respectively. The theoretic MC-prediction error is further analyzed and its autocorrelation function is calculated for coding and FRUC applications. We show a linear relation between the variance of the MC-prediction error and the temporal distance. While the affecting distance in MC-coding is between the predicted and reference frames, MC-FRUC is affected by the distance between the available frames used for the interpolation. Moreover, the dependency on temporal distance implies an inverse effect of the frame-rate. FRUC performance analysis considers the prediction error variance, since it equals the mean-squared error of the interpolation. However, MC-coding analysis requires the entire autocorrelation function of the error; hence, analytic simplicity is beneficial. Therefore, we propose two constructions of a separable autocorrelation function for the prediction error in MC-coding. We conclude by comparing our estimations with experimental results.