CVJul 11, 2022Code
Physical Passive Patch Adversarial Attacks on Visual Odometry SystemsYaniv Nemcovsky, Matan Jacoby, Alex M. Bronstein et al.
Deep neural networks are known to be susceptible to adversarial perturbations -- small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model's output on a set of inputs. Universal perturbations present a more realistic case of adversarial attacks, as awareness of the model's exact input is not required. In addition, the universal attack setting raises the subject of generalization to unseen data, where given a set of inputs, the universal perturbations aim to alter the model's output on out-of-sample data. In this work, we study physical passive patch adversarial attacks on visual odometry-based autonomous navigation systems. A visual odometry system aims to infer the relative camera motion between two corresponding viewpoints, and is frequently used by vision-based autonomous navigation systems to estimate their state. For such navigation systems, a patch adversarial perturbation poses a severe security issue, as it can be used to mislead a system onto some collision course. To the best of our knowledge, we show for the first time that the error margin of a visual odometry model can be significantly increased by deploying patch adversarial attacks in the scene. We provide evaluation on synthetic closed-loop drone navigation data and demonstrate that a comparable vulnerability exists in real data. A reference implementation of the proposed method and the reported experiments is provided at https://github.com/patchadversarialattacks/patchadversarialattacks.
CVMar 27, 2023
Class-Conditioned Transformation for Enhanced Robust Image ClassificationTsachi Blau, Roy Ganz, Chaim Baskin et al.
Robust classification methods predominantly concentrate on algorithms that address a specific threat model, resulting in ineffective defenses against other threat models. Real-world applications are exposed to this vulnerability, as malicious attackers might exploit alternative threat models. In this work, we propose a novel test-time threat model agnostic algorithm that enhances Adversarial-Trained (AT) models. Our method operates through COnditional image transformation and DIstance-based Prediction (CODIP) and includes two main steps: First, we transform the input image into each dataset class, where the input image might be either clean or attacked. Next, we make a prediction based on the shortest transformed distance. The conditional transformation utilizes the perceptually aligned gradients property possessed by AT models and, as a result, eliminates the need for additional models or additional training. Moreover, it allows users to choose the desired balance between clean and robust accuracy without training. The proposed method achieves state-of-the-art results demonstrated through extensive experiments on various models, AT methods, datasets, and attack types. Notably, applying CODIP leads to substantial robust accuracy improvement of up to $+23\%$, $+20\%$, $+26\%$, and $+22\%$ on CIFAR10, CIFAR100, ImageNet and Flowers datasets, respectively.
COMay 30, 2022
Fast Nonlinear Vector Quantile RegressionAviv A. Rosenberg, Sanketh Vedula, Yaniv Romano et al.
Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for vector-valued target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target $\boldsymbol{\mathrm{Y}}$ given the features $\boldsymbol{\mathrm{X}}$; (ii) its exact formulation is intractable even for modestly-sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) We extend VQR to the non-linear case, showing substantial improvement over linear VQR; (ii) We propose {vector monotone rearrangement}, a method which ensures the quantile functions estimated by VQR are monotone functions; (iii) We provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) We release an optimized python package of our solvers as to widespread the use of VQR in real-world applications.
LGAug 17, 2022
Random Search Hyper-Parameter Tuning: Expected Improvement Estimation and the Corresponding Lower BoundDan Navon, Alex M. Bronstein
Hyperparameter tuning is a common technique for improving the performance of neural networks. Most techniques for hyperparameter search involve an iterated process where the model is retrained at every iteration. However, the expected accuracy improvement from every additional search iteration, is still unknown. Calculating the expected improvement can help create stopping rules for hyperparameter tuning and allow for a wiser allocation of a project's computational budget. In this paper, we establish an empirical estimate for the expected accuracy improvement from an additional iteration of hyperparameter search. Our results hold for any hyperparameter tuning method which is based on random search \cite{bergstra2012random} and samples hyperparameters from a fixed distribution. We bound our estimate with an error of $O\left(\sqrt{\frac{\log k}{k}}\right)$ w.h.p. where $k$ is the current number of iterations. To the best of our knowledge this is the first bound on the expected gain from an additional iteration of hyperparameter search. Finally, we demonstrate that the optimal estimate for the expected accuracy will still have an error of $\frac{1}{k}$.
NIOct 4, 2022
Using Deep Reinforcement Learning for mmWave Real-Time SchedulingBarak Gahtan, Reuven Cohen, Alex M. Bronstein et al.
We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions within the strict time slot constraints of typical 5G mmWave networks. AARL can handle a variety of network topologies, network loads, and interference models, it can also adapt to different workloads. We demonstrate the operation of AARL on several topologies: a small topology with 10 links, a moderately-sized mesh with 48 links, and a large topology with 96 links. We show that for each topology, we compare the throughput obtained by AARL to that of a benchmark algorithm called RPMA (Residual Profit Maximizer Algorithm). The most important advantage of AARL compared to RPMA is that it is much faster and can make the necessary scheduling decisions very rapidly during every time slot, while RPMA cannot. In addition, the quality of the scheduling decisions made by AARL outperforms those made by RPMA.
MEJul 3, 2023
Vector Quantile Regression on ManifoldsMarco Pegoraro, Sanketh Vedula, Aviv A. Rosenberg et al.
Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on an Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on, e.g., spheres (climate and geological phenomena), and tori (dihedral angles in proteins). By leveraging optimal transport theory and c-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets and likelihoods. We demonstrate the approach's efficacy and provide insights regarding the meaning of non-Euclidean quantiles through synthetic and real data experiments.
QUANT-PHApr 13, 2023
Designing Nonlinear Photonic Crystals for High-Dimensional Quantum State EngineeringEyal Rozenberg, Aviv Karnieli, Ofir Yesharim et al.
We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We demonstrate the effectiveness of our model through the design of structured nonlinear photonic crystals (NLPCs) and shaped pump beams; and show, theoretically and experimentally, how to generate maximally entangled states in the spatial degree of freedom. The learning of NLPC structures offers a promising new avenue for shaping and controlling arbitrary quantum states and enables all-optical coherent control of the generated states. We believe that this approach can readily be extended from bulky crystals to thin Metasurfaces and potentially applied to other quantum systems sharing a similar Hamiltonian structures, such as superfluids and superconductors.
CLAug 17, 2022
Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP ProblemsDan Navon, Alex M. Bronstein
Vision-Transformers are widely used in various vision tasks. Meanwhile, there is another line of works starting with the MLP-mixer trying to achieve similar performance using mlp-based architectures. Interestingly, until now those mlp-based architectures have not been adapted for NLP tasks. Additionally, until now, mlp-based architectures have failed to achieve state-of-the-art performance in vision tasks. In this paper, we analyze the expressive power of mlp-based architectures in modeling dependencies between multiple different inputs simultaneously, and show an exponential gap between the attention and the mlp-based mechanisms. Our results suggest a theoretical explanation for the mlp inability to compete with attention-based mechanisms in NLP problems, they also suggest that the performance gap in vision tasks may be due to the mlp relative weakness in modeling dependencies between multiple different locations, and that combining smart input permutations with mlp architectures may not be enough to close the performance gap alone.
CVMar 25, 2021Code
Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy LabelsEvgenii Zheltonozhskii, Chaim Baskin, Avi Mendelson et al.
The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches by drastically reducing the warm-up stage's susceptibility to noise level, shortening its duration, and improving extracted feature quality. C2D works out of the box with existing methods and demonstrates markedly improved performance, especially in the high noise regime, where we get a boost of more than 27% for CIFAR-100 with 90% noise over the previous state of the art. In real-life noise settings, C2D trained on mini-WebVision outperforms previous works both in WebVision and ImageNet validation sets by 3% top-1 accuracy. We perform an in-depth analysis of the framework, including investigating the performance of different pre-training approaches and estimating the effective upper bound of the LNL performance with semi-supervised learning. Code for reproducing our experiments is available at https://github.com/ContrastToDivide/C2D
CVAug 24, 2020Code
Self-Supervised Learning for Large-Scale Unsupervised Image ClusteringEvgenii Zheltonozhskii, Chaim Baskin, Alex M. Bronstein et al.
Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument for representation learning in computer vision. However, those methods have not been evaluated in a fully unsupervised setting. In this paper, we propose a simple scheme for unsupervised classification based on self-supervised representations. We evaluate the proposed approach with several recent self-supervised methods showing that it achieves competitive results for ImageNet classification (39% accuracy on ImageNet with 1000 clusters and 46% with overclustering). We suggest adding the unsupervised evaluation to a set of standard benchmarks for self-supervised learning. The code is available at https://github.com/Randl/kmeans_selfsuper
LGNov 17, 2019Code
Smoothed Inference for Adversarially-Trained ModelsYaniv Nemcovsky, Evgenii Zheltonozhskii, Chaim Baskin et al.
Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded perturbations of the input. In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks. The proposed technique can be applied on top of any existing adversarial defense, but works particularly well with the randomized approaches. We examine its performance on common white-box (PGD) and black-box (transfer and NAttack) attacks on CIFAR-10 and CIFAR-100, substantially outperforming previous art for most scenarios and comparable on others. For example, we achieve 60.4% accuracy under a PGD attack on CIFAR-10 using ResNet-20, outperforming previous art by 11.7%. Since our method is based on sampling, it lends itself well for trading-off between the model inference complexity and its performance. A reference implementation of the proposed techniques is provided at https://github.com/yanemcovsky/SIAM
LGNov 17, 2019Code
Loss Aware Post-training QuantizationYury Nahshan, Brian Chmiel, Chaim Baskin et al.
Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. Additionally, we show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
CVSep 25, 2019Code
CAT: Compression-Aware Training for bandwidth reductionChaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii et al.
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value. Reference implementation accompanies the paper at https://github.com/CAT-teams/CAT
CVMay 26, 2019Code
Feature Map Transform Coding for Energy-Efficient CNN InferenceBrian Chmiel, Chaim Baskin, Ron Banner et al.
Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory bandwidth due to the storage of intermediate activation calculation results. Our method does not require fine-tuning the network weights and halves the data transfer volumes to the main memory by compressing feature maps, which are highly correlated, with variable length coding. Our method outperform previous approach in term of the number of bits per value with minor accuracy degradation on ResNet-34 and MobileNetV2. We analyze the performance of our approach on a variety of CNN architectures and demonstrate that FPGA implementation of ResNet-18 with our approach results in a reduction of around 40% in the memory energy footprint, compared to quantized network, with negligible impact on accuracy. When allowing accuracy degradation of up to 2%, the reduction of 60% is achieved. A reference implementation is available at https://github.com/CompressTeam/TransformCodingInference
CVApr 22, 2019Code
Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural NetworksYochai Zur, Chaim Baskin, Evgenii Zheltonozhskii et al.
Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization relying on the use of low-precision arithmetic representation for the weights and the activations. Another popular method is the pruning of the number of filters in each layer. While mainstream deep learning methods train the neural networks weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter also amenable to training. In this paper, we formulate optimal arithmetic bit length allocation and neural network pruning as a NAS problem, searching for the configurations satisfying a computational complexity budget while maximizing the accuracy. We use a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al. (arXiv:1806.09055). We show, by grid search, that heterogeneous quantized networks suffer from a high variance which renders the benefit of the search questionable. For pruning, improvement over homogeneous cases is possible, but it is still challenging to find those configurations with the proposed method. The code is publicly available at https://github.com/yochaiz/Slimmable and https://github.com/yochaiz/darts-UNIQ
NISep 30, 2024
Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic AnalysisBarak Gahtan, Robert J. Shahla, Alex M. Bronstein et al.
The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, collected over four months. Unlike prior datasets, VisQUIC provides SSL keys for controlled decryption, supports multiple QUIC implementations (Chromium QUIC, Facebooks mvfst, Cloudflares quiche), and introduces a novel image-based representation that enables machine learning-driven encrypted traffic analysis. The dataset includes standardized benchmarking tools, ensuring reproducibility. To demonstrate VisQUICs utility, we present a benchmarking task for estimating HTTP/3 responses in encrypted QUIC traffic, achieving 97% accuracy using only observable packet features. By publicly releasing VisQUIC, we provide an open foundation for advancing encrypted traffic analysis, QUIC security research, and network monitoring.
35.3LGApr 2
Coupled Query-Key Dynamics for AttentionBarak Gahtan, Alex M. Bronstein
Standard scaled dot-product attention computes scores from static, independent projections of the input. We show that evolving queries and keys \emph{jointly} through shared learned dynamics before scoring - which we call \textbf{coupled QK dynamics} - improves language modeling perplexity and training stability. On WikiText-103 at 60M parameters, coupled dynamics achieves 22.55--22.62 perplexity vs.\ 24.22 for standard attention ($-$6.6--6.9\%), with only 0.11\% additional parameters (shared across both instantiations). A structural ablation isolates coupling as the active ingredient: a symplectic (Hamiltonian) and a non-symplectic (Euler) integrator perform identically when both couple Q and K, while an uncoupled MLP baseline of matched capacity reaches only 23.81 with 8$\times$ higher seed variance. The integration step count (1--7) is similarly irrelevant - a single coupled step suffices. A compute-matched comparison reveals that coupling is a \emph{sample-efficiency} mechanism: standard attention trained for 2.4$\times$ longer (matching wall-clock) reaches the same perplexity, but requires 2.4$\times$ more tokens. The advantage scales to 150M ($-$6.7\%) but narrows at 350M ($-$1.0\%), where Differential Attention (18.93) overtakes coupled dynamics (19.35). The benefit is corpus-dependent: coupling helps on domain-coherent text (WikiText-103 $-$6.6\%, PubMed $-$4.5\%) but degrades on heterogeneous web text ($+$10.3\%) and shows no benefit on GLUE. We characterize when coupling helps and when it does not, providing practical guidelines.
BMMay 29, 2025
Representing local protein environments with atomistic foundation modelsMeital Bojan, Sanketh Vedula, Advaith Maddipatla et al.
The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remain under-explored. In this work, we propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We demonstrate that this embedding effectively captures both local structure (e.g., secondary motifs), and chemical features (e.g., amino-acid identity and protonation state). We further show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors over the distribution of biomolecular environments. Finally, in the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor that achieves state-of-the-art accuracy. Our results demonstrate the surprising effectiveness of atomistic foundation models and their emergent representations for protein modeling beyond traditional molecular simulations. We believe this will open new lines of work in constructing effective functional representations for protein environments.
SPOct 15, 2024
Data-Driven Cellular Network Selector for Vehicle TeleoperationsBarak Gahtan, Reuven Cohen, Alex M. Bronstein et al.
Remote control of robotic systems, also known as teleoperation, is crucial for the development of autonomous vehicle (AV) technology. It allows a remote operator to view live video from AVs and, in some cases, to make real-time decisions. The effectiveness of video-based teleoperation systems is heavily influenced by the quality of the cellular network and, in particular, its packet loss rate and latency. To optimize these parameters, an AV can be connected to multiple cellular networks and determine in real time over which cellular network each video packet will be transmitted. We present an algorithm, called Active Network Selector (ANS), which uses a time series machine learning approach for solving this problem. We compare ANS to a baseline non-learning algorithm, which is used today in commercial systems, and show that ANS performs much better, with respect to both packet loss and packet latency.
LGAug 8, 2025
Architecture-Aware Generalization Bounds for Temporal Networks: Theory and Fair Comparison MethodologyBarak Gahtan, Alex M. Bronstein
Deep temporal architectures such as Temporal Convolutional Networks (TCNs) achieve strong predictive performance on sequential data, yet theoretical understanding of their generalization remains limited. We address this gap by providing both the first non-vacuous, architecture-aware generalization bounds for deep temporal models and a principled evaluation methodology. For exponentially $β$-mixing sequences, we derive bounds scaling as $ O\!\Bigl(R\,\sqrt{\tfrac{D\,p\,n\,\log N}{N}}\Bigr), $ where $D$ is network depth, $p$ kernel size, $n$ input dimension, and $R$ weight norm. Our delayed-feedback blocking mechanism transforms dependent samples into effectively independent ones while discarding only $O(1/\log N)$ of the data, yielding $\sqrt{D}$ scaling instead of exponential, implying that doubling depth requires approximately quadrupling the training data. We also introduce a fair-comparison methodology that fixes the effective sample size to isolate the effect of temporal structure from information content. Under $N_{\text{eff}}=2{,}000$, strongly dependent sequences ($ρ=0.8$) exhibit $\approx76\%$ smaller generalization gaps than weakly dependent ones ($ρ=0.2$), challenging the intuition that dependence is purely detrimental. Yet convergence rates diverge from theory: weak dependencies follow $N_{\text{eff}}^{-1.21}$ scaling and strong dependencies follow $N_{\text{eff}}^{-0.89}$, both steeper than the predicted $N^{-0.5}$. These findings reveal that temporal dependence can enhance learning under fixed information budgets, while highlighting gaps between theory and practice that motivate future research.
LGApr 30, 2025
From Lab to Wrist: Bridging Metabolic Monitoring and Consumer Wearables for Heart Rate and Oxygen Consumption ModelingBarak Gahtan, Sanketh Vedula, Gil Samuelly Leichtag et al.
Understanding physiological responses during running is critical for performance optimization, tailored training prescriptions, and athlete health management. We introduce a comprehensive framework -- what we believe to be the first capable of predicting instantaneous oxygen consumption (VO$_{2}$) trajectories exclusively from consumer-grade wearable data. Our approach employs two complementary physiological models: (1) accurate modeling of heart rate (HR) dynamics via a physiologically constrained ordinary differential equation (ODE) and neural Kalman filter, trained on over 3 million HR observations, achieving 1-second interval predictions with mean absolute errors as low as 2.81\,bpm (correlation 0.87); and (2) leveraging the principles of precise HR modeling, a novel VO$_{2}$ prediction architecture requiring only the initial second of VO$_{2}$ data for calibration, enabling robust, sequence-to-sequence metabolic demand estimation. Despite relying solely on smartwatch and chest-strap data, our method achieves mean absolute percentage errors of approximately 13\%, effectively capturing rapid physiological transitions and steady-state conditions across diverse running intensities. Our synchronized dataset, complemented by blood lactate measurements, further lays the foundation for future noninvasive metabolic zone identification. By embedding physiological constraints within modern machine learning, this framework democratizes advanced metabolic monitoring, bridging laboratory-grade accuracy and everyday accessibility, thus empowering both elite athletes and recreational fitness enthusiasts.
ROFeb 10, 2022
Towards Predicting Fine Finger Motions from Ultrasound Images via Kinematic RepresentationDean Zadok, Oren Salzman, Alon Wolf et al.
A central challenge in building robotic prostheses is the creation of a sensor-based system able to read physiological signals from the lower limb and instruct a robotic hand to perform various tasks. Existing systems typically perform discrete gestures such as pointing or grasping, by employing electromyography (EMG) or ultrasound (US) technologies to analyze muscle states. While estimating finger gestures has been done in the past by detecting prominent gestures, we are interested in detection, or inference, done in the context of fine motions that evolve over time. Examples include motions occurring when performing fine and dexterous tasks such as keyboard typing or piano playing. We consider this task as an important step towards higher adoption rates of robotic prostheses among arm amputees, as it has the potential to dramatically increase functionality in performing daily tasks. To this end, we present an end-to-end robotic system, which can successfully infer fine finger motions. This is achieved by modeling the hand as a robotic manipulator and using it as an intermediate representation to encode muscles' dynamics from a sequence of US images. We evaluated our method by collecting data from a group of subjects and demonstrating how it can be used to replay music played or text typed. To the best of our knowledge, this is the first study demonstrating these downstream tasks within an end-to-end system.
QUANT-PHDec 11, 2021
SPDCinv: Inverse Quantum-Optical Design of High-Dimensional QuditsEyal Rozenberg, Aviv Karnieli, Ofir Yesharim et al.
Spontaneous parametric down-conversion in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process by a fully differentiable algorithm that takes into account all interaction effects, progress has been limited. Here, we overcome these limitations and introduce a physically-constrained and differentiable model, validated against experimental results for shaped pump beams and structured crystals, capable of learning every interaction parameter in the process. We avoid any restrictions induced by the stochastic nature of our physical model and integrate the dynamic equations governing the evolution under the SPDC Hamiltonian. We solve the inverse problem of designing a nonlinear quantum optical system that achieves the desired quantum state of down-converted photon pairs. The desired states are defined using either the second-order correlations between different spatial modes or by specifying the required density matrix. By learning nonlinear volume holograms as well as different pump shapes, we successfully show how to generate maximally entangled states. Furthermore, we simulate all-optical coherent control over the generated quantum state by actively changing the profile of the pump beam. Our work can be useful for applications such as novel designs of high-dimensional quantum key distribution and quantum information processing protocols. In addition, our method can be readily applied for controlling other degrees of freedom of light in the SPDC process, such as the spectral and temporal properties, and may even be used in condensed-matter systems having a similar interaction Hamiltonian.
QUANT-PHFeb 20, 2021
Inverse Design of Quantum Holograms in Three-Dimensional Nonlinear Photonic CrystalsEyal Rozenberg, Aviv Karnieli, Ofir Yesharim et al.
We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs.
CVDec 16, 2020
SimuGAN: Unsupervised forward modeling and optimal design of a LIDAR CameraNir Diamant, Tal Mund, Ohad Menashe et al.
Energy-saving LIDAR camera for short distances estimates an object's distance using temporally intensity-coded laser light pulses and calculates the maximum correlation with the back-scattered pulse. Though on low power, the backs-scattered pulse is noisy and unstable, which leads to inaccurate and unreliable depth estimation. To address this problem, we use GANs (Generative Adversarial Networks), which are two neural networks that can learn complicated class distributions through an adversarial process. We learn the LIDAR camera's hidden properties and behavior, creating a novel, fully unsupervised forward model that simulates the camera. Then, we use the model's differentiability to explore the camera parameter space and optimize those parameters in terms of depth, accuracy, and stability. To achieve this goal, we also propose a new custom loss function designated to the back-scattered code distribution's weaknesses and its circular behavior. The results are demonstrated on both synthetic and real data.
CVDec 8, 2020
Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure TimesOmer Dahary, Matan Jacoby, Alex M. Bronstein
Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off between motion blur at long exposures and low SNR at short exposures, we train a CNN that estimates a sharp high-SNR image by aggregating a burst of noisy short-exposure frames, related by unknown motion. We further suggest learning the burst's exposure times in an end-to-end manner, thus balancing the noise and blur across the frames. We demonstrate this method's advantage over the traditional approach of deblurring a single image or denoising a fixed-exposure burst on both synthetic and real data.
LGJun 1, 2020
Data-Driven Prediction of Embryo Implantation Probability Using IVF Time-lapse ImagingDavid H. Silver, Martin Feder, Yael Gold-Zamir et al.
The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), the average success rate of IVF is a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the patient, a process typically conducted manually and without any universally accepted and standardized criteria. In this paper we describe a novel data-driven system trained to directly predict embryo implantation probability from embryogenesis time-lapse imaging videos. Using retrospectively collected videos from 272 embryos, we demonstrate that, when compared to an external panel of embryologists, our algorithm results in a 12% increase of positive predictive value and a 29% increase of negative predictive value.
LGApr 19, 2020
HCM: Hardware-Aware Complexity Metric for Neural Network ArchitecturesAlex Karbachevsky, Chaim Baskin, Evgenii Zheltonozhskii et al.
Convolutional Neural Networks (CNNs) have become common in many fields including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, the task of achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that need to be balanced to achieve an efficient solution. Quantization techniques, when applied to the network parameters, lead to a reduction of power and area and may also change the ratio between communication and computation. As a result, some algorithmic solutions may suffer from lack of memory bandwidth or computational resources and fail to achieve the expected performance due to hardware constraints. Thus, the system designer and the micro-architect need to understand at early development stages the impact of their high-level decisions (e.g., the architecture of the CNN and the amount of bits used to represent its parameters) on the final product (e.g., the expected power saving, area, and accuracy). Unfortunately, existing tools fall short of supporting such decisions. This paper introduces a hardware-aware complexity metric that aims to assist the system designer of the neural network architectures, through the entire project lifetime (especially at its early stages) by predicting the impact of architectural and micro-architectural decisions on the final product. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices such as real-time embedded systems, and to avoid making design mistakes at early stages.
LGMar 4, 2020
Colored Noise Injection for Training Adversarially Robust Neural NetworksEvgenii Zheltonozhskii, Chaim Baskin, Yaniv Nemcovsky et al.
Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on CIFAR-10 and CIFAR-100 datasets. In addition, we provide an extensive ablation study of the proposed method justifying the chosen configurations.
CVJun 5, 2019
Baby steps towards few-shot learning with multiple semanticsEli Schwartz, Leonid Karlinsky, Rogerio Feris et al.
Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. In this paper, we focus on the role of additional semantics that can significantly facilitate few-shot visual learning. Building upon recent advances in few-shot learning with additional semantic information, we demonstrate that further improvements are possible by combining multiple and richer semantics (category labels, attributes, and natural language descriptions). Using these ideas, we offer the community new results on the popular miniImageNet and CUB few-shot benchmarks, comparing favorably to the previous state-of-the-art results for both visual only and visual plus semantics-based approaches. We also performed an ablation study investigating the components and design choices of our approach.
CVFeb 26, 2019
LaSO: Label-Set Operations networks for multi-label few-shot learningAmit Alfassy, Leonid Karlinsky, Amit Aides et al.
Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classification scenario. We propose to combine pairs of given examples in feature space, so that the resulting synthesized feature vectors will correspond to examples whose label sets are obtained through certain set operations on the label sets of the corresponding input pairs. Thus, our method is capable of producing a sample containing the intersection, union or set-difference of labels present in two input samples. As we show, these set operations generalize to labels unseen during training. This enables performing augmentation on examples of novel categories, thus, facilitating multi-label few-shot classifier learning. We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines.
CVFeb 7, 2019
Beholder-GAN: Generation and Beautification of Facial Images with Conditioning on Their Beauty LevelNir Diamant, Dean Zadok, Chaim Baskin et al.
Beauty is in the eye of the beholder. This maxim, emphasizing the subjectivity of the perception of beauty, has enjoyed a wide consensus since ancient times. In the digitalera, data-driven methods have been shown to be able to predict human-assigned beauty scores for facial images. In this work, we augment this ability and train a generative model that generates faces conditioned on a requested beauty score. In addition, we show how this trained generator can be used to beautify an input face image. By doing so, we achieve an unsupervised beautification model, in the sense that it relies on no ground truth target images.
LGNov 27, 2018
Efficient non-uniform quantizer for quantized neural network targeting reconfigurable hardwareNatan Liss, Chaim Baskin, Avi Mendelson et al.
Convolutional Neural Networks (CNN) has become more popular choice for various tasks such as computer vision, speech recognition and natural language processing. Thanks to their large computational capability and throughput, GPUs ,which are not power efficient and therefore does not suit low power systems such as mobile devices, are the most common platform for both training and inferencing tasks. Recent studies has shown that FPGAs can provide a good alternative to GPUs as a CNN accelerator, due to their re-configurable nature, low power and small latency. In order for FPGA-based accelerators outperform GPUs in inference task, both the parameters of the network and the activations must be quantized. While most works use uniform quantizers for both parameters and activations, it is not always the optimal one, and a non-uniform quantizer need to be considered. In this work we introduce a custom hardware-friendly approach to implement non-uniform quantizers. In addition, we use a single scale integer representation of both parameters and activations, for both training and inference. The combined method yields a hardware efficient non-uniform quantizer, fit for real-time applications. We have tested our method on CIFAR-10 and CIFAR-100 image classification datasets with ResNet-18 and VGG-like architectures, and saw little degradation in accuracy.
CVAug 23, 2018
High frame-rate cardiac ultrasound imaging with deep learningOrtal Senouf, Sanketh Vedula, Grigoriy Zurakhov et al.
Cardiac ultrasound imaging requires a high frame rate in order to capture rapid motion. This can be achieved by multi-line acquisition (MLA), where several narrow-focused received lines are obtained from each wide-focused transmitted line. This shortens the acquisition time at the expense of introducing block artifacts. In this paper, we propose a data-driven learning-based approach to improve the MLA image quality. We train an end-to-end convolutional neural network on pairs of real ultrasound cardiac data, acquired through MLA and the corresponding single-line acquisition (SLA). The network achieves a significant improvement in image quality for both $5-$ and $7-$line MLA resulting in a decorrelation measure similar to that of SLA while having the frame rate of MLA.
CVAug 23, 2018
High quality ultrasonic multi-line transmission through deep learningSanketh Vedula, Ortal Senouf, Grigoriy Zurakhov et al.
Frame rate is a crucial consideration in cardiac ultrasound imaging and 3D sonography. Several methods have been proposed in the medical ultrasound literature aiming at accelerating the image acquisition. In this paper, we consider one such method called \textit{multi-line transmission} (MLT), in which several evenly separated focused beams are transmitted simultaneously. While MLT reduces the acquisition time, it comes at the expense of a heavy loss of contrast due to the interactions between the beams (cross-talk artifact). In this paper, we introduce a data-driven method to reduce the artifacts arising in MLT. To this end, we propose to train an end-to-end convolutional neural network consisting of correction layers followed by a constant apodization layer. The network is trained on pairs of raw data obtained through MLT and the corresponding \textit{single-line transmission} (SLT) data. Experimental evaluation demonstrates significant improvement both in the visual image quality and in objective measures such as contrast ratio and contrast-to-noise ratio, while preserving resolution unlike traditional apodization-based methods. We show that the proposed method is able to generalize well across different patients and anatomies on real and phantom data.
CVAug 20, 2018
Class-Aware Fully-Convolutional Gaussian and Poisson DenoisingTal Remez, Or Litany, Raja Giryes et al.
We propose a fully-convolutional neural-network architecture for image denoising which is simple yet powerful. Its structure allows to exploit the gradual nature of the denoising process, in which shallow layers handle local noise statistics, while deeper layers recover edges and enhance textures. Our method advances the state-of-the-art when trained for different noise levels and distributions (both Gaussian and Poisson). In addition, we show that making the denoiser class-aware by exploiting semantic class information boosts performance, enhances textures and reduces artifacts.
CVJun 12, 2018
Delta-encoder: an effective sample synthesis method for few-shot object recognitionEli Schwartz, Leonid Karlinsky, Joseph Shtok et al.
Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we proposes a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category just by seeing few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns to both extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state-of-the-art in one-shot object-recognition and compares favorably in the few-shot case. Upon acceptance code will be made available.
CVJun 12, 2018
RepMet: Representative-based metric learning for classification and one-shot object detectionLeonid Karlinsky, Joseph Shtok, Sivan Harary et al.
Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. Our approach outperforms state-of-the-art methods for DML-based object classification on a variety of standard fine-grained datasets. Furthermore, we demonstrate the effectiveness of our approach on the problem of few-shot object detection, by incorporating the proposed DML architecture as a classification head into a standard object detection model. We achieve the best results on the ImageNet-LOC dataset compared to strong baselines, when only a few training examples are available. We also offer the community a new episodic benchmark based on the ImageNet dataset for the few-shot object detection task.
LGApr 29, 2018
UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural NetworksChaim Baskin, Eli Schwartz, Evgenii Zheltonozhskii et al.
We present a novel method for neural network quantization that emulates a non-uniform $k$-quantile quantizer, which adapts to the distribution of the quantized parameters. Our approach provides a novel alternative to the existing uniform quantization techniques for neural networks. We suggest to compare the results as a function of the bit-operations (BOPS) performed, assuming a look-up table availability for the non-uniform case. In this setup, we show the advantages of our strategy in the low computational budget regime. While the proposed solution is harder to implement in hardware, we believe it sets a basis for new alternatives to neural networks quantization.
IVJan 20, 2018
DeepISP: Towards Learning an End-to-End Image Processing PipelineEli Schwartz, Raja Giryes, Alex M. Bronstein
We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated dataset containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves state-of-the-art performance in objective evaluation of PSNR on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality compared to the manufacturer ISP, in both a subjective human assessment and when rated by a deep model trained for assessing image quality.
CVOct 17, 2017
Towards CT-quality Ultrasound Imaging using Deep LearningSanketh Vedula, Ortal Senouf, Alex M. Bronstein et al.
The cost-effectiveness and practical harmlessness of ultrasound imaging have made it one of the most widespread tools for medical diagnosis. Unfortunately, the beam-forming based image formation produces granular speckle noise, blurring, shading and other artifacts. To overcome these effects, the ultimate goal would be to reconstruct the tissue acoustic properties by solving a full wave propagation inverse problem. In this work, we make a step towards this goal, using Multi-Resolution Convolutional Neural Networks (CNN). As a result, we are able to reconstruct CT-quality images from the reflected ultrasound radio-frequency(RF) data obtained by simulation from real CT scans of a human body. We also show that CNN is able to imitate existing computationally heavy despeckling methods, thereby saving orders of magnitude in computations and making them amenable to real-time applications.
CVApr 27, 2017
Deep Functional Maps: Structured Prediction for Dense Shape CorrespondenceOr Litany, Tal Remez, Emanuele Rodolà et al.
We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a paradigm shift and design a structured prediction model in the space of functional maps, linear operators that provide a compact representation of the correspondence. We model the learning process via a deep residual network which takes dense descriptor fields defined on two shapes as input, and outputs a soft map between the two given objects. The resulting correspondence is shown to be accurate on several challenging benchmarks comprising multiple categories, synthetic models, real scans with acquisition artifacts, topological noise, and partiality.
CVJan 6, 2017
Deep Class Aware DenoisingTal Remez, Or Litany, Raja Giryes et al.
The increasing demand for high image quality in mobile devices brings forth the need for better computational enhancement techniques, and image denoising in particular. At the same time, the images captured by these devices can be categorized into a small set of semantic classes. However simple, this observation has not been exploited in image denoising until now. In this paper, we demonstrate how the reconstruction quality improves when a denoiser is aware of the type of content in the image. To this end, we first propose a new fully convolutional deep neural network architecture which is simple yet powerful as it achieves state-of-the-art performance even without being class-aware. We further show that a significant boost in performance of up to $0.4$ dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class. Relying on the hugely successful existing image classifiers, this research advocates for using a class-aware approach in all image enhancement tasks.
CVJan 6, 2017
Deep Convolutional Denoising of Low-Light ImagesTal Remez, Or Litany, Raja Giryes et al.
Poisson distribution is used for modeling noise in photon-limited imaging. While canonical examples include relatively exotic types of sensing like spectral imaging or astronomy, the problem is relevant to regular photography now more than ever due to the booming market for mobile cameras. Restricted form factor limits the amount of absorbed light, thus computational post-processing is called for. In this paper, we make use of the powerful framework of deep convolutional neural networks for Poisson denoising. We demonstrate how by training the same network with images having a specific peak value, our denoiser outperforms previous state-of-the-art by a large margin both visually and quantitatively. Being flexible and data-driven, our solution resolves the heavy ad hoc engineering used in previous methods and is an order of magnitude faster. We further show that by adding a reasonable prior on the class of the image being processed, another significant boost in performance is achieved.
NAMay 30, 2016
Tradeoffs between Convergence Speed and Reconstruction Accuracy in Inverse ProblemsRaja Giryes, Yonina C. Eldar, Alex M. Bronstein et al.
Solving inverse problems with iterative algorithms is popular, especially for large data. Due to time constraints, the number of possible iterations is usually limited, potentially affecting the achievable accuracy. Given an error one is willing to tolerate, an important question is whether it is possible to modify the original iterations to obtain faster convergence to a minimizer achieving the allowed error without increasing the computational cost of each iteration considerably. Relying on recent recovery techniques developed for settings in which the desired signal belongs to some low-dimensional set, we show that using a coarse estimate of this set may lead to faster convergence at the cost of an additional reconstruction error related to the accuracy of the set approximation. Our theory ties to recent advances in sparse recovery, compressed sensing, and deep learning. Particularly, it may provide a possible explanation to the successful approximation of the l1-minimization solution by neural networks with layers representing iterations, as practiced in the learned iterative shrinkage-thresholding algorithm (LISTA).
NEApr 30, 2015
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?Raja Giryes, Guillermo Sapiro, Alex M. Bronstein
Three important properties of a classification machinery are: (i) the system preserves the core information of the input data; (ii) the training examples convey information about unseen data; and (iii) the system is able to treat differently points from different classes. In this work we show that these fundamental properties are satisfied by the architecture of deep neural networks. We formally prove that these networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data. Similar points at the input of the network are likely to have a similar output. The theoretical analysis of deep networks here presented exploits tools used in the compressed sensing and dictionary learning literature, thereby making a formal connection between these important topics. The derived results allow drawing conclusions on the metric learning properties of the network and their relation to its structure, as well as providing bounds on the required size of the training set such that the training examples would represent faithfully the unseen data. The results are validated with state-of-the-art trained networks.
MLDec 18, 2014
On the Stability of Deep NetworksRaja Giryes, Guillermo Sapiro, Alex M. Bronstein
In this work we study the properties of deep neural networks (DNN) with random weights. We formally prove that these networks perform a distance-preserving embedding of the data. Based on this we then draw conclusions on the size of the training data and the networks' structure. A longer version of this paper with more results and details can be found in (Giryes et al., 2015). In particular, we formally prove in the longer version that DNN with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data.
CVDec 19, 2013
Sparse similarity-preserving hashingJonathan Masci, Alex M. Bronstein, Michael M. Bronstein et al.
In recent years, a lot of attention has been devoted to efficient nearest neighbor search by means of similarity-preserving hashing. One of the plights of existing hashing techniques is the intrinsic trade-off between performance and computational complexity: while longer hash codes allow for lower false positive rates, it is very difficult to increase the embedding dimensionality without incurring in very high false negatives rates or prohibiting computational costs. In this paper, we propose a way to overcome this limitation by enforcing the hash codes to be sparse. Sparse high-dimensional codes enjoy from the low false positive rates typical of long hashes, while keeping the false negative rates similar to those of a shorter dense hashing scheme with equal number of degrees of freedom. We use a tailored feed-forward neural network for the hashing function. Extensive experimental evaluation involving visual and multi-modal data shows the benefits of the proposed method.
LGSep 27, 2012
Learning Robust Low-Rank RepresentationsPablo Sprechmann, Alex M. Bronstein, Guillermo Sapiro
In this paper we present a comprehensive framework for learning robust low-rank representations by combining and extending recent ideas for learning fast sparse coding regressors with structured non-convex optimization techniques. This approach connects robust principal component analysis (RPCA) with dictionary learning techniques and allows its approximation via trainable encoders. We propose an efficient feed-forward architecture derived from an optimization algorithm designed to exactly solve robust low dimensional projections. This architecture, in combination with different training objective functions, allows the regressors to be used as online approximants of the exact offline RPCA problem or as RPCA-based neural networks. Simple modifications of these encoders can handle challenging extensions, such as the inclusion of geometric data transformations. We present several examples with real data from image, audio, and video processing. When used to approximate RPCA, our basic implementation shows several orders of magnitude speedup compared to the exact solvers with almost no performance degradation. We show the strength of the inclusion of learning to the RPCA approach on a music source separation application, where the encoders outperform the exact RPCA algorithms, which are already reported to produce state-of-the-art results on a benchmark database. Our preliminary implementation on an iPad shows faster-than-real-time performance with minimal latency.