Martin Vetterli

h-index: 97

19 papers

598 citations

Novelty: 51%

AI Score: 41

Ranked #65,757 of 194,257 authors (top 34%); #22,490 in CV (top 38%)

19 Papers

5.7 · CV · Nov 23, 2022 · Code

Privacy-Enhancing Optical Embeddings for Lensless Classification

Eric Bezzam, Martin Vetterli, Matthieu Simeoni

Lensless imaging can provide visual privacy due to the highly multiplexed characteristic of its measurements. However, this alone is a weak form of security, as various adversarial attacks can be designed to invert the one-to-many scene mapping of such cameras. In this work, we enhance the privacy provided by lensless imaging by (1) downsampling at the sensor and (2) using a programmable mask with variable patterns as our optical encoder. We build a prototype from a low-cost LCD and Raspberry Pi components, for a total cost of around 100 USD. This very low price point allows our system to be deployed and leveraged in a broad range of applications. In our experiments, we first demonstrate the viability and reconfigurability of our system by applying it to various classification tasks: MNIST, CelebA (face attributes), and CIFAR10. By jointly optimizing the mask pattern and a digital classifier in an end-to-end fashion, low-dimensional, privacy-enhancing embeddings are learned directly at the sensor. Secondly, we show how the proposed system, through variable mask patterns, can thwart adversaries that attempt to invert the system (1) via plaintext attacks or (2) in the event of camera parameter leaks. We demonstrate our system's defense against both risks, with 55% and 26% drops in image quality metrics for attacks based on model-based convex optimization and generative neural networks, respectively. We open-source a wave propagation and camera simulator needed for end-to-end optimization, the training software, and a library for interfacing with the camera.

1.2 · SY · Apr 20, 2012

Fast and Robust Parametric Estimation of Jointly Sparse Channels

Y. Barbotin, M. Vetterli

We consider the joint estimation of multipath channels obtained with a set of receiving antennas and uniformly probed in the frequency domain. This scenario fits most of the modern outdoor communication protocols for mobile access or digital broadcasting, among others. Such channels satisfy a Sparse Common Support (SCS) property, which was used in a previous paper to propose a Finite Rate of Innovation (FRI) based sampling and estimation algorithm. In this contribution we improve the robustness and computational complexity of this algorithm. The method is based on projection in Krylov subspaces to improve complexity and on a new criterion called the Partial Effective Rank (PER) to estimate the level of sparsity and gain robustness. If P antennas measure a K-multipath channel with N uniformly sampled measurements per channel, the algorithm has O(KPN log N) complexity and an O(KPN) memory footprint, instead of O(PN^3) and O(PN^2) for the direct implementation, making it suitable for K << N. The sparsity is estimated online based on the PER, so the algorithm has a sense of introspection: it can relinquish sparsity if it is lacking. Estimation performance is tested on field measurements with synthetic AWGN, and the proposed algorithm outperforms non-sparse reconstruction in the medium to low SNR range (< 0 dB), increasing the rate of successful symbol decodings by 1/10th on average, and 1/3rd in the best case. The experiments also show that the algorithm does not perform worse than a non-sparse estimation algorithm in non-sparse operating conditions, since it may fall back to it if the PER criterion does not detect a sufficient level of sparsity. The algorithm is also tested against a method assuming a "discrete" sparsity model as in Compressed Sensing (CS). The conducted test indicates a trade-off between speed and accuracy.

9.5 · IV · Jun 3, 2022 · Code

LenslessPiCam: A Hardware and Software Platform for Lensless Computational Imaging with a Raspberry Pi

Eric Bezzam, Sepand Kashani, Martin Vetterli et al.

Lensless imaging seeks to replace/remove the lens in a conventional imaging system. The earliest cameras were in fact lensless, relying on long exposure times to form images on the other end of a small aperture in a darkened room/container (camera obscura). The introduction of a lens allowed for more light throughput and therefore shorter exposure times, while retaining sharp focus. The incorporation of digital sensors readily enabled the use of computational imaging techniques to post-process and enhance raw images (e.g. via deblurring, inpainting, denoising, sharpening). Recently, imaging scientists have started leveraging computational imaging as an integral part of lensless imaging systems, allowing them to form viewable images from the highly multiplexed raw measurements of lensless cameras (see [5] and references therein for a comprehensive treatment of lensless imaging). This represents a real paradigm shift in camera system design as there is more flexibility to cater the hardware to the application at hand (e.g. lightweight or flat designs). This increased flexibility comes however at the price of a more demanding post-processing of the raw digital recordings and a tighter integration of sensing and computation, often difficult to achieve in practice due to inefficient interactions between the various communities of scientists involved. With LenslessPiCam, we provide an easily accessible hardware and software framework to enable researchers, hobbyists, and students to implement and explore practical and computational aspects of lensless imaging. We also provide detailed guides and exercises so that LenslessPiCam can be used as an educational resource, and point to results from our graduate-level signal processing course.

2.6 · CV · Jun 3, 2022

Learning rich optical embeddings for privacy-preserving lensless image classification

Eric Bezzam, Martin Vetterli, Matthieu Simeoni

By replacing the lens with a thin optical element, lensless imaging enables new applications and solutions beyond those supported by traditional camera design and post-processing, e.g. compact and lightweight form factors and visual privacy. The latter arises from the highly multiplexed measurements of lensless cameras, which require knowledge of the imaging system to recover a recognizable image. In this work, we exploit this unique multiplexing property: casting the optics as an encoder that produces learned embeddings directly at the camera sensor. We do so in the context of image classification, where we jointly optimize the encoder's parameters and those of an image classifier in an end-to-end fashion. Our experiments show that jointly learning the lensless optical encoder and the digital processing allows for lower resolution embeddings at the sensor, and hence better privacy as it is much harder to recover meaningful images from these measurements. Additional experiments show that such an optimization allows for lensless measurements that are more robust to typical real-world image transformations. While this work focuses on classification, the proposed programmable lensless camera and end-to-end optimization can be applied to other computational imaging tasks.

2.7 · IV · Jun 9, 2022

How Asynchronous Events Encode Video

Karen Adam, Adam Scholefield, Martin Vetterli

As event-based sensing gains in popularity, theoretical understanding is needed to harness this technology's potential. Instead of recording video by capturing frames, event-based cameras have sensors that emit events when their inputs change, thus encoding information in the timing of events. This creates new challenges in establishing reconstruction guarantees and algorithms, but also provides advantages over frame-based video. We use time encoding machines to model event-based sensors: TEMs also encode their inputs by emitting events characterized by their timing and reconstruction from time encodings is well understood. We consider the case of time encoding bandlimited video and demonstrate a dependence between spatial sensor density and overall spatial and temporal resolution. Such a dependence does not occur in frame-based video, where temporal resolution depends solely on the frame rate of the video and spatial resolution depends solely on the pixel grid. However, this dependence arises naturally in event-based video and allows oversampling in space to provide better time resolution. As such, event-based vision encourages using more sensors that emit fewer events over time.
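
As a rough illustration of how a time encoding machine turns a signal into event timings, the sketch below simulates an integrate-and-fire encoder, one common TEM model. The abstract does not say which TEM variant the paper uses, so the model itself, the function name `integrate_and_fire`, and the `bias`, `threshold`, and `dt` parameters are all assumptions made for illustration.

```python
def integrate_and_fire(signal, duration, dt=1e-3, bias=1.0, threshold=0.5):
    """Return event times of a simple integrate-and-fire time encoder.

    The encoder integrates signal(t) + bias and emits an event each time
    the running integral reaches `threshold`, then subtracts the threshold.
    Hypothetical illustration; it assumes signal(t) + bias > 0.
    """
    events = []
    integral = 0.0
    steps = int(round(duration / dt))
    for k in range(steps):
        t = k * dt
        integral += (signal(t) + bias) * dt  # rectangle-rule integration
        if integral >= threshold:
            events.append(t + dt)            # event at the end of this step
            integral -= threshold            # reset by subtraction
    return events


# A larger input fills the integrator faster, so events arrive more often:
# the amplitude is encoded purely in the event timing.
slow = integrate_and_fire(lambda t: 0.0, duration=1.8)
fast = integrate_and_fire(lambda t: 1.0, duration=1.8)
print(len(slow), len(fast))
```

In this toy model a single sensor trades amplitude for event rate; the abstract's claim about spatial density and temporal resolution concerns many such sensors observing a bandlimited video, which this sketch does not attempt to reproduce.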

3.6 · IV · Sep 25, 2024 · Code

Let There Be Light: Robust Lensless Imaging Under External Illumination With Deep Learning

Eric Bezzam, Stefan Peters, Martin Vetterli

Lensless cameras relax the design constraints of traditional cameras by shifting image formation from analog optics to digital post-processing. While new camera designs and applications can be enabled, lensless imaging is very sensitive to unwanted interference (other sources, noise, etc.). In this work, we address a prevalent noise source that has not been studied for lensless imaging: external illumination, e.g. from ambient and direct lighting. Being robust to a variety of lighting conditions would increase the practicality and adoption of lensless imaging. To this end, we propose multiple recovery approaches that account for external illumination by incorporating its estimate into the image recovery process. At the core is a physics-based reconstruction that combines learnable image recovery and denoisers, all of whose parameters are trained using experimentally gathered data. Compared to standard reconstruction methods, our approach yields significant qualitative and quantitative improvements. We open-source our implementations and a 25K dataset of measurements under multiple lighting conditions.

13.4 · IV · Feb 3, 2025 · Code

Towards Robust and Generalizable Lensless Imaging with Modular Learned Reconstruction

Eric Bezzam, Yohann Perron, Martin Vetterli

Lensless cameras disregard the conventional design that imaging should mimic the human eye. This is done by replacing the lens with a thin mask, and moving image formation to the digital post-processing. State-of-the-art lensless imaging techniques use learned approaches that combine physical modeling and neural networks. However, these approaches make simplifying modeling assumptions for ease of calibration and computation. Moreover, the generalizability of learned approaches to lensless measurements of new masks has not been studied. To this end, we utilize a modular learned reconstruction in which a key component is a pre-processor prior to image recovery. We theoretically demonstrate the pre-processor's necessity for standard image recovery techniques (Wiener filtering and iterative algorithms), and through extensive experiments show its effectiveness for multiple lensless imaging approaches and across datasets of different mask types (amplitude and phase). We also perform the first generalization benchmark across mask types to evaluate how well reconstructions trained with one system generalize to others. Our modular reconstruction enables us to use pre-trained components and transfer learning on new systems to cut down weeks of tedious measurements and training. As part of our work, we open-source four datasets, and software for measuring datasets and for training our modular reconstruction.

3.6 · CR · Sep 19, 2025 · Code

LenslessMic: Audio Encryption and Authentication via Lensless Computational Imaging

Petr Grinberg, Eric Bezzam, Paolo Prandoni et al.

With society's increasing reliance on digital data sharing, the protection of sensitive information has become critical. Encryption serves as one of the privacy-preserving methods; however, its realization in the audio domain predominantly relies on signal processing or software methods embedded into hardware. In this paper, we introduce LenslessMic, a hybrid optical hardware-based encryption method that utilizes a lensless camera as a physical layer of security applicable to multiple types of audio. We show that LenslessMic enables (1) robust authentication of audio recordings and (2) encryption strength that can rival the search space of 256-bit digital standards, while maintaining high-quality signals and minimal loss of content information. The approach is validated with a low-cost Raspberry Pi prototype and is open-sourced together with datasets to facilitate research in the area.

0.9 · CV · Apr 27, 2018 · Code

Bound and Conquer: Improving Triangulation by Enforcing Consistency

Adam Scholefield, Alireza Ghasemi, Martin Vetterli

We study the accuracy of triangulation in multi-camera systems with respect to the number of cameras. We show that, under certain conditions, the optimal achievable reconstruction error decays quadratically as more cameras are added to the system. Furthermore, we analyse the error decay rate of major state-of-the-art algorithms with respect to the number of cameras. To this end, we introduce the notion of consistency for triangulation, and show that consistent reconstruction algorithms achieve the optimal quadratic decay, which is asymptotically faster than some other methods. Finally, we present simulation results supporting our findings. Our simulations have been implemented in MATLAB and the resulting code is available in the supplementary material.

1.2 · CG · Feb 19, 2019

Shapes from Echoes: Uniqueness from Point-to-Plane Distance Matrices

Miranda Kreković, Ivan Dokmanić, Martin Vetterli

We study the problem of localizing a configuration of points and planes from the collection of point-to-plane distances. This problem models simultaneous localization and mapping from acoustic echoes as well as the notable "structure from sound" approach to microphone localization with unknown sources. In our earlier work we proposed computational methods for localization from point-to-plane distances and noted that such localization suffers from various ambiguities beyond the usual rigid body motions; in this paper we provide a complete characterization of uniqueness. We enumerate equivalence classes of configurations which lead to the same distance measurements as a function of the number of planes and points, and algebraically characterize the related transformations in both 2D and 3D. Here we only discuss uniqueness; computational tools and heuristics for practical localization from point-to-plane distances using sound will be addressed in a companion paper.
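
To make the measurement model concrete, the sketch below builds a point-to-plane distance matrix and checks the "usual" ambiguity the abstract mentions: translating points and planes together leaves every distance unchanged. The signed-distance convention (plane written as n · x = b with a unit normal n) and the helper names `point_to_plane_distances` and `translate` are assumptions made for illustration, not the paper's notation.

```python
def point_to_plane_distances(points, planes):
    """Return D with D[i][j] = n_j . p_i - b_j (signed distance).

    Each plane is a pair (n, b) describing {x : n . x = b}; n is assumed
    to be a unit normal. Hypothetical helper for illustration.
    """
    return [
        [sum(n_k * p_k for n_k, p_k in zip(n, p)) - b for (n, b) in planes]
        for p in points
    ]


def translate(points, planes, t):
    """Shift points and planes by the same vector t."""
    moved_points = [tuple(p_k + t_k for p_k, t_k in zip(p, t)) for p in points]
    # A plane n . x = b shifted by t becomes n . x = b + n . t.
    moved_planes = [
        (n, b + sum(n_k * t_k for n_k, t_k in zip(n, t))) for (n, b) in planes
    ]
    return moved_points, moved_planes


points = [(1.0, 2.0), (3.0, 1.0)]
planes = [((1.0, 0.0), 0.0), ((0.0, 1.0), 4.0)]  # walls x = 0 and y = 4

D = point_to_plane_distances(points, planes)
D_moved = point_to_plane_distances(*translate(points, planes, (2.0, -1.0)))
print(D == D_moved)
```

The paper's contribution is the characterization of the additional, less obvious transformations that also preserve this matrix; the sketch only demonstrates the rigid-motion baseline.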

9.8 · SD · Dec 2, 2016

FRIDA: FRI-Based DOA Estimation for Arbitrary Array Layouts

Hanjie Pan, Robin Scheibler, Eric Bezzam et al.

In this paper we present FRIDA, an algorithm for estimating directions of arrival of multiple wideband sound sources. FRIDA combines multi-band information coherently and achieves state-of-the-art resolution at extremely low signal-to-noise ratios. It works for arbitrary array layouts, but unlike the various steered response power and subspace methods, it does not require a grid search. FRIDA leverages recent advances in sampling signals with a finite rate of innovation. It is based on the insight that for any array layout, the entries of the spatial covariance matrix can be linearly transformed into a uniformly sampled sum of sinusoids.

3.9 · RO · Sep 18, 2016

Omnidirectional Bats, Point-to-Plane Distances, and the Price of Uniqueness

Miranda Kreković, Ivan Dokmanić, Martin Vetterli

We study simultaneous localization and mapping with a device that uses reflections to measure its distance from walls. Such a device can be realized acoustically with a synchronized collocated source and receiver; it behaves like a bat with no capacity for directional hearing or vocalizing. In this paper we generalize our previous work in 2D, and show that the 3D case is not just a simple extension, but rather a fundamentally different inverse problem. While generically the 2D problem has a unique solution, in 3D uniqueness is always absent in rooms with fewer than nine walls. In addition to the complete characterization of ambiguities which arise due to this non-uniqueness, we propose a robust solution for inexact measurements similar to analogous results for Euclidean Distance Matrices. Our theoretical results have important consequences for the design of collocated range-only SLAM systems, and we support them with an array of computer experiments.

6.7 · RO · Aug 31, 2016

Look, no Beacons! Optimal All-in-One EchoSLAM

Miranda Kreković, Ivan Dokmanić, Martin Vetterli

We study the problem of simultaneously reconstructing a polygonal room and a trajectory of a device equipped with a (nearly) collocated omnidirectional source and receiver. The device measures arrival times of echoes of pulses emitted by the source and picked up by the receiver. No prior knowledge about the device's trajectory is required. Most existing approaches addressing this problem assume multiple sources or receivers, or they assume that some of these are static, serving as beacons. Unlike earlier approaches, we take into account the measurement noise and various constraints on the geometry by formulating the solution as a minimizer of a cost function similar to stress in multidimensional scaling. We study uniqueness of the reconstruction from first-order echoes, and we show that in addition to the usual invariance to rigid motions, new ambiguities arise for important classes of rooms and trajectories. We support our theoretical developments with a number of numerical experiments.

3.0 · CV · Feb 24, 2016

On the Accuracy of Point Localisation in a Circular Camera-Array

Alireza Ghasemi, Adam Scholefield, Martin Vetterli

Although many advances have been made in light-field and camera-array image processing, there is still a lack of thorough analysis of the localisation accuracy of different multi-camera systems. By considering the problem from a frame-quantisation perspective, we are able to quantify the point localisation error of circular camera configurations. Specifically, we obtain closed form expressions bounding the localisation error in terms of the parameters describing the acquisition setup. These theoretical results are independent of the localisation algorithm and thus provide fundamental limits on performance. Furthermore, the new frame-quantisation perspective is general enough to be extended to more complex camera configurations.

2.1 · CV · Feb 24, 2016

SHAPE: Linear-Time Camera Pose Estimation With Quadratic Error-Decay

Alireza Ghasemi, Adam Scholefield, Martin Vetterli

We propose a novel camera pose estimation or perspective-n-point (PnP) algorithm, based on the idea of consistency regions and half-space intersections. Our algorithm has linear time-complexity and a squared reconstruction error that decreases at least quadratically as the number of feature point correspondences increases. Inspired by ideas from triangulation and frame quantisation theory, we define consistent reconstruction and then present SHAPE, our proposed consistent pose estimation algorithm. We compare this algorithm with state-of-the-art pose estimation techniques in terms of accuracy and error decay rate. The experimental results verify our hypothesis on the optimal worst-case quadratic decay and demonstrate its promising performance compared to other approaches.

11.5 · SD · Jul 21, 2014

Raking the Cocktail Party

Ivan Dokmanić, Robin Scheibler, Martin Vetterli

We present the concept of an acoustic rake receiver, a microphone beamformer that uses echoes to improve the noise and interference suppression. The rake idea is well-known in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure, rather than temporal. Instead of explicitly estimating the channel, we create correspondences between early echoes in time and image sources in space. These multiple sources of the desired and the interfering signal offer additional spatial diversity that we can exploit in the beamformer design. We present several "intuitive" and optimal formulations of acoustic rake receivers, and show theoretically and numerically that the rake formulation of the maximum signal-to-interference-and-noise beamformer offers significant performance boosts in terms of noise and interference suppression. Beyond signal-to-noise ratio, we observe gains in terms of the perceptual evaluation of speech quality (PESQ) metric for the speech quality. We accompany the paper with the complete simulation and processing chain written in Python. The code and the sound samples are available online at http://lcav.github.io/AcousticRakeReceiver/.
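
The correspondence between an early echo and an image source can be sketched with the standard image-source construction: a first-order echo off a flat wall arrives as if it came from the source mirrored across that wall. The code below is a minimal illustration of that geometry only, not the paper's beamformer; the function names, the example coordinates, and the speed of sound of 343 m/s are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def image_source(source, wall_point, wall_normal):
    """Mirror `source` across the wall through `wall_point`.

    `wall_normal` is assumed to be a unit vector. Hypothetical helper.
    """
    # Signed distance from the source to the wall along the normal.
    d = sum((w - s) * n for s, w, n in zip(source, wall_point, wall_normal))
    return tuple(s + 2.0 * d * n for s, n in zip(source, wall_normal))


def echo_delay(source, mic, wall_point, wall_normal):
    """Travel time of the first-order echo, via the image source."""
    image = image_source(source, wall_point, wall_normal)
    return math.dist(image, mic) / SPEED_OF_SOUND


source = (1.0, 2.0)
mic = (2.0, 2.0)
wall_point, wall_normal = (0.0, 0.0), (1.0, 0.0)  # the wall x = 0

image = image_source(source, wall_point, wall_normal)
direct = math.dist(source, mic) / SPEED_OF_SOUND
echo = echo_delay(source, mic, wall_point, wall_normal)
print(image, direct, echo)
```

Each image source obtained this way is an extra spatial copy of the signal, which is the diversity the abstract says the rake beamformer exploits.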

2.7 · ML · Dec 17, 2013

Recursive Compressed Sensing

Nikolaos M. Freris, Orhan Öçal, Martin Vetterli

We introduce a recursive algorithm for performing compressed sensing on streaming data. The approach consists of a) recursive encoding, where we sample the input stream via overlapping windowing and make use of the previous measurement in obtaining the next one, and b) recursive decoding, where the signal estimate from the previous window is utilized in order to achieve faster convergence in an iterative optimization scheme applied to decode the new one. To remove estimation bias, a two-step estimation procedure is proposed comprising support set detection and signal amplitude estimation. Estimation accuracy is enhanced by a non-linear voting method and averaging estimates over multiple windows. We analyze the computational complexity and estimation error, and show that the normalized error variance asymptotically goes to zero for sublinear sparsity. Our simulation results show speed up of an order of magnitude over traditional CS, while obtaining significantly lower reconstruction error under mild conditions on the signal magnitudes and the noise level.
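
The recursive-decoding idea, reusing the previous window's estimate to start the next solve, can be sketched as a warm start for an iterative sparse solver. The abstract does not name the solver, so the sketch below substitutes plain ISTA for the LASSO objective; the one-sample window shift, the function names, and the toy matrix are all assumptions, and the paper's support detection, voting, and averaging steps are not shown.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v, t):
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def objective(A, y, x, lam):
    """LASSO cost 0.5 * ||A x - y||^2 + lam * ||x||_1."""
    r = [p - q for p, q in zip(matvec(A, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def ista(A, y, lam, x0, iters):
    """Run `iters` ISTA steps starting from x0."""
    # Squared Frobenius norm: a safe upper bound on the Lipschitz constant.
    L = sum(a * a for row in A for a in row)
    x = list(x0)
    for _ in range(iters):
        r = [p - q for p, q in zip(matvec(A, x), y)]
        g = rmatvec(A, r)
        x = soft_threshold([xi - gi / L for xi, gi in zip(x, g)], lam / L)
    return x


def warm_start(previous_estimate):
    """Initial guess for the next window, assuming it slides by one sample:
    drop the oldest entry and append a zero for the new one."""
    return previous_estimate[1:] + [0.0]


A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
x_hat = ista(A, y, lam=0.1, x0=[0.0, 0.0, 0.0], iters=50)
x_next_init = warm_start(x_hat)  # would seed ista() on the next window
```

Because consecutive windows overlap heavily, the shifted estimate is expected to be close to the next solution, which is why fewer iterations should be needed than from a zero start.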

6.6 · IT · Oct 7, 2013

A Fast Hadamard Transform for Signals with Sub-linear Sparsity in the Transform Domain

Robin Scheibler, Saeid Haghighatshoar, Martin Vetterli

A new iterative low-complexity algorithm is presented for computing the Walsh-Hadamard transform (WHT) of an $N$-dimensional signal with a $K$-sparse WHT, where $N$ is a power of two and $K = O(N^\alpha)$ scales sub-linearly in $N$ for some $0 < \alpha < 1$. Assuming a random support model for the non-zero transform domain components, the algorithm reconstructs the WHT of the signal with a sample complexity $O(K \log_2(\frac{N}{K}))$, a computational complexity $O(K\log_2(K)\log_2(\frac{N}{K}))$, and with a very high probability asymptotically tending to 1. The approach is based on the subsampling (aliasing) property of the WHT, where by a carefully designed subsampling of the time domain signal, one can induce a suitable aliasing pattern in the transform domain. By treating the aliasing patterns as parity-check constraints and borrowing ideas from erasure correcting sparse-graph codes, the recovery of the non-zero spectral values is formulated as a belief propagation (BP) algorithm (peeling decoding) over a sparse-graph code for the binary erasure channel (BEC). Tools from coding theory are used to analyze the asymptotic performance of the algorithm in the very sparse ($\alpha \in (0, \frac{1}{3}]$) and the less sparse ($\alpha \in (\frac{1}{3}, 1)$) regime.
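
The subsampling (aliasing) property can be checked numerically with the standard dense fast WHT. The sketch below is not the paper's sparse algorithm: it uses the ordinary O(N log N) butterfly and only the simplest subsampling pattern (keeping the first B samples), with unnormalized transforms in natural (Hadamard) ordering. The function name `fwht` and the example spectrum are assumptions for illustration.

```python
def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform, natural ordering.

    Computes X[k] = sum_n x[n] * (-1)^popcount(n & k) in O(N log N).
    The length of x must be a power of two.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v  # butterfly
        h *= 2
    return a


# A 2-sparse spectrum of length N = 8, and the time signal that produces it
# (the unnormalized WHT is its own inverse up to a factor of N).
N, B = 8, 4
X = [0, 5, 0, 0, 0, 0, -3, 0]
x = fwht(X)                      # fwht(x) == [N * c for c in X]

# Keeping only the first B samples folds the spectrum: bin j of the short
# transform collects every coefficient X[k] with k % B == j.
Y = fwht(x[:B])
folded = [B * sum(X[k] for k in range(N) if k % B == j) for j in range(B)]
print(Y, folded)
```

Here each non-empty bin holds a single coefficient, so both values can be read off directly; when several coefficients collide in one bin, the paper's peeling decoder resolves them using additional subsampling patterns, which this sketch omits.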

13.4 · SI · Aug 13, 2012

Locating the Source of Diffusion in Large-Scale Networks

Pedro C. Pinto, Patrick Thiran, Martin Vetterli

How can we localize the source of diffusion in a complex network? Due to the tremendous size of many real networks, such as the Internet or the human social graph, it is usually infeasible to observe the state of all nodes in a network. We show that it is fundamentally possible to estimate the location of the source from measurements collected by sparsely placed observers. We present a strategy that is optimal for arbitrary trees, achieving maximum probability of correct localization. We describe efficient implementations with complexity O(N^α), where α=1 for arbitrary trees, and α=3 for arbitrary graphs. In the context of several case studies, we determine how localization accuracy is affected by various system parameters, including the structure of the network, the density of observers, and the number of observed cascades.