Daniel Freedman

CV
h-index46
28papers
855citations
Novelty56%
AI Score46

28 Papers

CVNov 28, 2022
What's Behind the Mask: Estimating Uncertainty in Image-to-Image Problems

Gilad Kutiel, Regev Cohen, Michael Elad et al.

Estimating uncertainty in image-to-image networks is an important task, particularly as such networks are being increasingly deployed in the biological and medical imaging realms. In this paper, we introduce a new approach to this problem based on masking. Given an existing image-to-image network, our approach computes a mask such that the distance between the masked reconstructed image and the masked true image is guaranteed to be less than a specified threshold, with high probability. The mask thus identifies the more certain regions of the reconstructed image. Our approach is agnostic to the underlying image-to-image network, and only requires triples of the input (degraded), reconstructed and true images for training. Furthermore, our method is agnostic to the distance metric used. As a result, one can use $L_p$-style distances or perceptual distances like LPIPS, which contrasts with interval-based approaches to uncertainty. Our theoretical guarantees derive from a conformal calibration procedure. We evaluate our mask-based approach to uncertainty on image colorization, image completion, and super-resolution tasks, demonstrating high quality performance on each.

CVAug 23, 2023
Self-Supervised Learning for Endoscopic Video Analysis

Roy Hirsch, Mathilde Caron, Regev Cohen et al.

Self-supervised learning (SSL) has led to important breakthroughs in computer vision by allowing learning from large amounts of unlabeled data. As such, it might have a pivotal role to play in biomedicine where annotating data requires a highly specialized expertise. Yet, there are many healthcare domains for which SSL has not been extensively explored. One such domain is endoscopy, minimally invasive procedures which are commonly used to detect and treat infections, chronic inflammatory diseases or cancer. In this work, we study the use of a leading SSL framework, namely Masked Siamese Networks (MSNs), for endoscopic video analysis such as colonoscopy and laparoscopy. To fully exploit the power of SSL, we create sizable unlabeled endoscopic video datasets for training MSNs. These strong image representations serve as a foundation for secondary training with limited annotated datasets, resulting in state-of-the-art performance in endoscopic benchmarks like surgical phase recognition during laparoscopy and colonoscopic polyp characterization. Additionally, we achieve a 50% reduction in annotated data size without sacrificing performance. Thus, our work provides evidence that SSL can dramatically reduce the need of annotated data in endoscopy.

LGNov 13, 2025Code
SVD-NO: Learning PDE Solution Operators with SVD Integral Kernels

Noam Koren, Ralf J. J. Mackenbach, Ruud J. G. van Sloun et al.

Neural operators have emerged as a promising paradigm for learning solution operators of partial differential equa- tions (PDEs) directly from data. Existing methods, such as those based on Fourier or graph techniques, make strong as- sumptions about the structure of the kernel integral opera- tor, assumptions which may limit expressivity. We present SVD-NO, a neural operator that explicitly parameterizes the kernel by its singular-value decomposition (SVD) and then carries out the integral directly in the low-rank basis. Two lightweight networks learn the left and right singular func- tions, a diagonal parameter matrix learns the singular values, and a Gram-matrix regularizer enforces orthonormality. As SVD-NO approximates the full kernel, it obtains a high de- gree of expressivity. Furthermore, due to its low-rank struc- ture the computational complexity of applying the operator remains reasonable, leading to a practical system. In exten- sive evaluations on five diverse benchmark equations, SVD- NO achieves a new state of the art. In particular, SVD-NO provides greater performance gains on PDEs whose solutions are highly spatially variable. The code of this work is publicly available at https://github.com/2noamk/SVDNO.git.

CVSep 27, 2022
RepsNet: Combining Vision with Language for Automated Medical Reports

Ajay Kumar Tanwani, Joelle Barral, Daniel Freedman

Writing reports by analyzing medical images is error-prone for inexperienced practitioners and time consuming for experienced ones. In this work, we present RepsNet that adapts pre-trained vision and language models to interpret medical images and generate automated reports in natural language. RepsNet consists of an encoder-decoder model: the encoder aligns the images with natural language descriptions via contrastive learning, while the decoder predicts answers by conditioning on encoded images and prior context of descriptions retrieved by nearest neighbor search. We formulate the problem in a visual question answering setting to handle both categorical and descriptive natural language answers. We perform experiments on two challenging tasks of medical visual question answering (VQA-Rad) and report generation (IU-Xray) on radiology image datasets. Results show that RepsNet outperforms state-of-the-art methods with 81.08 % classification accuracy on VQA-Rad 2018 and 0.58 BLEU-1 score on IU-Xray. Supplementary details are available at https://sites.google.com/view/repsnet

QUANT-PHApr 13, 2023
Designing Nonlinear Photonic Crystals for High-Dimensional Quantum State Engineering

Eyal Rozenberg, Aviv Karnieli, Ofir Yesharim et al.

We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We demonstrate the effectiveness of our model through the design of structured nonlinear photonic crystals (NLPCs) and shaped pump beams; and show, theoretically and experimentally, how to generate maximally entangled states in the spatial degree of freedom. The learning of NLPC structures offers a promising new avenue for shaping and controlling arbitrary quantum states and enables all-optical coherent control of the generated states. We believe that this approach can readily be extended from bulky crystals to thin Metasurfaces and potentially applied to other quantum systems sharing a similar Hamiltonian structures, such as superfluids and superconductors.

CVOct 26, 2023
Weakly-Supervised Surgical Phase Recognition

Roy Hirsch, Regev Cohen, Mathilde Caron et al.

A key element of computer-assisted surgery systems is phase recognition of surgical videos. Existing phase recognition algorithms require frame-wise annotation of a large number of videos, which is time and money consuming. In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction. Furthermore, we utilize within our method two forms of weak supervision: sparse timestamps or few-shot learning. The proposed algorithm enjoys low complexity and can operate in lowdata regimes. We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos, demonstrating promising performance in multiple setups.

LGNov 9, 2022
Semi-Equivariant Continuous Normalizing Flows for Target-Aware Molecule Generation

Eyal Rozenberg, Daniel Freedman

We propose an algorithm for learning a conditional generative model of a molecule given a target. Specifically, given a receptor molecule that one wishes to bind to, the conditional model generates candidate ligand molecules that may bind to it. The distribution should be invariant to rigid body transformations that act $\textit{jointly}$ on the ligand and the receptor; it should also be invariant to permutations of either the ligand or receptor atoms. Our learning algorithm is based on a continuous normalizing flow. We establish semi-equivariance conditions on the flow which guarantee the aforementioned invariance conditions on the conditional distribution. We propose a graph neural network architecture which implements this flow, and which is designed to learn effectively despite the vast differences in size between the ligand and receptor. We evaluate our method on the CrossDocked2020 dataset, attaining a significant improvement in binding affinity over competing methods.

LGApr 13, 2023
Semi-Equivariant Conditional Normalizing Flows

Eyal Rozenberg, Daniel Freedman

We study the problem of learning conditional distributions of the form $p(G | \hat G)$, where $G$ and $\hat G$ are two 3D graphs, using continuous normalizing flows. We derive a semi-equivariance condition on the flow which ensures that conditional invariance to rigid motions holds. We demonstrate the effectiveness of the technique in the molecular setting of receptor-aware ligand generation.

LGFeb 1, 2024
Early Time Classification with Accumulated Accuracy Gap Control

Liran Ringel, Regev Cohen, Daniel Freedman et al.

Early time classification algorithms aim to label a stream of features without processing the full input stream, while maintaining accuracy comparable to that achieved by applying the classifier to the entire input. In this paper, we introduce a statistical framework that can be applied to any sequential classifier, formulating a calibrated stopping rule. This data-driven rule attains finite-sample, distribution-free control of the accuracy gap between full and early-time classification. We start by presenting a novel method that builds on the Learn-then-Test calibration framework to control this gap marginally, on average over i.i.d. instances. As this algorithm tends to yield an excessively high accuracy gap for early halt times, our main contribution is the proposal of a framework that controls a stronger notion of error, where the accuracy gap is controlled conditionally on the accumulated halt times. Numerical experiments demonstrate the effectiveness, applicability, and usefulness of our method. We show that our proposed early stopping mechanism reduces up to 94% of timesteps used for classification while achieving rigorous accuracy gap control.

SDFeb 19, 2024
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models

Miri Varshavsky-Hassid, Roy Hirsch, Regev Cohen et al.

The incorporation of Denoising Diffusion Models (DDMs) in the Text-to-Speech (TTS) domain is rising, providing great value in synthesizing high quality speech. Although they exhibit impressive audio quality, the extent of their semantic capabilities is unknown, and controlling their synthesized speech's vocal properties remains a challenge. Inspired by recent advances in image synthesis, we explore the latent space of frozen TTS models, which is composed of the latent bottleneck activations of the DDM's denoiser. We identify that this space contains rich semantic information, and outline several novel methods for finding semantic directions within it, both supervised and unsupervised. We then demonstrate how these enable off-the-shelf audio editing, without any further training, architectural changes or data requirements. We present evidence of the semantic and acoustic qualities of the edited audio, and provide supplemental samples: https://latent-analysis-grad-tts.github.io/speech-samples/.

SIFeb 4, 2024
Overcoming Order in Autoregressive Graph Generation

Edo Cohen-Karlik, Eyal Rozenberg, Daniel Freedman

Graph generation is a fundamental problem in various domains, including chemistry and social networks. Recent work has shown that molecular graph generation using recurrent neural networks (RNNs) is advantageous compared to traditional generative approaches which require converting continuous latent representations into graphs. One issue which arises when treating graph generation as sequential generation is the arbitrary order of the sequence which results from a particular choice of graph flattening method. In this work we propose using RNNs, taking into account the non-sequential nature of graphs by adding an Orderless Regularization (OLR) term that encourages the hidden state of the recurrent model to be invariant to different valid orderings present under the training distribution. We demonstrate that sequential graph generation models benefit from our proposed regularization scheme, especially when data is scarce. Our findings contribute to the growing body of research on graph generation and provide a valuable tool for various applications requiring the synthesis of realistic and diverse graph structures.

CLAug 4, 2025
Clinically Grounded Agent-based Report Evaluation: An Interpretable Metric for Radiology Report Generation

Radhika Dua, Young Joon, Kwon et al.

Radiological imaging is central to diagnosis, treatment planning, and clinical decision-making. Vision-language foundation models have spurred interest in automated radiology report generation (RRG), but safe deployment requires reliable clinical evaluation of generated reports. Existing metrics often rely on surface-level similarity or behave as black boxes, lacking interpretability. We introduce ICARE (Interpretable and Clinically-grounded Agent-based Report Evaluation), an interpretable evaluation framework leveraging large language model agents and dynamic multiple-choice question answering (MCQA). Two agents, each with either the ground-truth or generated report, generate clinically meaningful questions and quiz each other. Agreement on answers captures preservation and consistency of findings, serving as interpretable proxies for clinical precision and recall. By linking scores to question-answer pairs, ICARE enables transparent, and interpretable assessment. Clinician studies show ICARE aligns significantly more with expert judgment than prior metrics. Perturbation analyses confirm sensitivity to clinical content and reproducibility, while model comparisons reveal interpretable error patterns.

LGDec 2, 2024
ReHub: Linear Complexity Graph Transformers with Adaptive Hub-Spoke Reassignment

Tomer Borreda, Daniel Freedman, Or Litany

We present ReHub, a novel graph transformer architecture that achieves linear complexity through an efficient reassignment technique between nodes and virtual nodes. Graph transformers have become increasingly important in graph learning for their ability to utilize long-range node communication explicitly, addressing limitations such as oversmoothing and oversquashing found in message-passing graph networks. However, their dense attention mechanism scales quadratically with the number of nodes, limiting their applicability to large-scale graphs. ReHub draws inspiration from the airline industry's hub-and-spoke model, where flights are assigned to optimize operational efficiency. In our approach, graph nodes (spokes) are dynamically reassigned to a fixed number of virtual nodes (hubs) at each model layer. Recent work, Neural Atoms (Li et al., 2024), has demonstrated impressive and consistent improvements over GNN baselines by utilizing such virtual nodes; their findings suggest that the number of hubs strongly influences performance. However, increasing the number of hubs typically raises complexity, requiring a trade-off to maintain linear complexity. Our key insight is that each node only needs to interact with a small subset of hubs to achieve linear complexity, even when the total number of hubs is large. To leverage all hubs without incurring additional computational costs, we propose a simple yet effective adaptive reassignment technique based on hub-hub similarity scores, eliminating the need for expensive node-hub computations. Our experiments on LRGB indicate a consistent improvement in results over the base method, Neural Atoms, while maintaining a linear complexity. Remarkably, our sparse model achieves performance on par with its non-sparse counterpart. Furthermore, ReHub outperforms competitive baselines and consistently ranks among top performers across various benchmarks.

LGJun 26, 2024
Molecular Diffusion Models with Virtual Receptors

Matan Halfon, Eyal Rozenberg, Ehud Rivlin et al.

Machine learning approaches to Structure-Based Drug Design (SBDD) have proven quite fertile over the last few years. In particular, diffusion-based approaches to SBDD have shown great promise. We present a technique which expands on this diffusion approach in two crucial ways. First, we address the size disparity between the drug molecule and the target/receptor, which makes learning more challenging and inference slower. We do so through the notion of a Virtual Receptor, which is a compressed version of the receptor; it is learned so as to preserve key aspects of the structural information of the original receptor, while respecting the relevant group equivariance. Second, we incorporate a protein language embedding used originally in the context of protein folding. We experimentally demonstrate the contributions of both the virtual receptors and the protein embeddings: in practice, they lead to both better performance, as well as significantly faster computations.

CVMay 17, 2023
Principal Uncertainty Quantification with Spatial Correlation for Image Restoration Problems

Omer Belhasin, Yaniv Romano, Daniel Freedman et al.

Uncertainty quantification for inverse problems in imaging has drawn much attention lately. Existing approaches towards this task define uncertainty regions based on probable values per pixel, while ignoring spatial correlations within the image, resulting in an exaggerated volume of uncertainty. In this paper, we propose PUQ (Principal Uncertainty Quantification) -- a novel definition and corresponding analysis of uncertainty regions that takes into account spatial relationships within the image, thus providing reduced volume regions. Using recent advancements in generative models, we derive uncertainty intervals around principal components of the empirical posterior distribution, forming an ambiguity region that guarantees the inclusion of true unseen values with a user-defined confidence probability. To improve computational efficiency and interpretability, we also guarantee the recovery of true unseen values using only a few principal directions, resulting in more informative uncertainty regions. Our approach is verified through experiments on image colorization, super-resolution, and inpainting; its effectiveness is shown through comparison to baseline methods, demonstrating significantly tighter uncertainty regions.

QUANT-PHDec 11, 2021
SPDCinv: Inverse Quantum-Optical Design of High-Dimensional Qudits

Eyal Rozenberg, Aviv Karnieli, Ofir Yesharim et al.

Spontaneous parametric down-conversion in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process by a fully differentiable algorithm that takes into account all interaction effects, progress has been limited. Here, we overcome these limitations and introduce a physically-constrained and differentiable model, validated against experimental results for shaped pump beams and structured crystals, capable of learning every interaction parameter in the process. We avoid any restrictions induced by the stochastic nature of our physical model and integrate the dynamic equations governing the evolution under the SPDC Hamiltonian. We solve the inverse problem of designing a nonlinear quantum optical system that achieves the desired quantum state of down-converted photon pairs. The desired states are defined using either the second-order correlations between different spatial modes or by specifying the required density matrix. By learning nonlinear volume holograms as well as different pump shapes, we successfully show how to generate maximally entangled states. Furthermore, we simulate all-optical coherent control over the generated quantum state by actively changing the profile of the pump beam. Our work can be useful for applications such as novel designs of high-dimensional quantum key distribution and quantum information processing protocols. In addition, our method can be readily applied for controlling other degrees of freedom of light in the SPDC process, such as the spectral and temporal properties, and may even be used in condensed-matter systems having a similar interaction Hamiltonian.

QUANT-PHFeb 20, 2021
Inverse Design of Quantum Holograms in Three-Dimensional Nonlinear Photonic Crystals

Eyal Rozenberg, Aviv Karnieli, Ofir Yesharim et al.

We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs.

IVSep 29, 2020
Learning an optimal PSF-pair for ultra-dense 3D localization microscopy

Elias Nehme, Boris Ferdman, Lucien E. Weiss et al.

A long-standing challenge in multiple-particle-tracking is the accurate and precise 3D localization of individual particles at close proximity. One established approach for snapshot 3D imaging is point-spread-function (PSF) engineering, in which the PSF is modified to encode the axial information. However, engineered PSFs are challenging to localize at high densities due to lateral PSF overlaps. Here we suggest using multiple PSFs simultaneously to help overcome this challenge, and investigate the problem of engineering multiple PSFs for dense 3D localization. We implement our approach using a bifurcated optical system that modifies two separate PSFs, and design the PSFs using three different approaches including end-to-end learning. We demonstrate our approach experimentally by volumetric imaging of fluorescently labelled telomeres in cells.

SPJun 27, 2020
SimGANs: Simulator-Based Generative Adversarial Networks for ECG Synthesis to Improve Deep ECG Classification

Tomer Golany, Daniel Freedman, Kira Radinsky

Generating training examples for supervised tasks is a long sought after goal in AI. We study the problem of heart signal electrocardiogram (ECG) synthesis for improved heartbeat classification. ECG synthesis is challenging: the generation of training examples for such biological-physiological systems is not straightforward, due to their dynamic nature in which the various parts of the system interact in complex ways. However, an understanding of these dynamics has been developed for years in the form of mathematical process simulators. We study how to incorporate this knowledge into the generative process by leveraging a biological simulator for the task of ECG classification. Specifically, we use a system of ordinary differential equations representing heart dynamics, and incorporate this ODE system into the optimization process of a generative adversarial network to create biologically plausible ECG training examples. We perform empirical evaluation and show that heart simulation knowledge during the generation process improves ECG classification.

CVJan 23, 2020
Detecting Deficient Coverage in Colonoscopies

Daniel Freedman, Yochai Blau, Liran Katzir et al.

Colonoscopy is the tool of choice for preventing Colorectal Cancer, by detecting and removing polyps before they become cancerous. However, colonoscopy is hampered by the fact that endoscopists routinely miss 22-28% of polyps. While some of these missed polyps appear in the endoscopist's field of view, others are missed simply because of substandard coverage of the procedure, i.e. not all of the colon is seen. This paper attempts to rectify the problem of substandard coverage in colonoscopy through the introduction of the C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm which detects deficient coverage, and can thereby alert the endoscopist to revisit a given area. More specifically, C2D2 consists of two separate algorithms: the first performs depth estimation of the colon given an ordinary RGB video stream; while the second computes coverage given these depth estimates. Rather than compute coverage for the entire colon, our algorithm computes coverage locally, on a segment-by-segment basis; C2D2 can then indicate in real-time whether a particular area of the colon has suffered from deficient coverage, and if so the endoscopist can return to that area. Our coverage algorithm is the first such algorithm to be evaluated in a large-scale way; while our depth estimation technique is the first calibration-free unsupervised method applied to colonoscopies. The C2D2 algorithm achieves state of the art results in the detection of deficient coverage. On synthetic sequences with ground truth, it is 2.4 times more accurate than human experts; while on real sequences, C2D2 achieves a 93.0% agreement with experts.

MED-PHOct 20, 2019
Detecting muscle activation using ultrasound speed of sound inversion with deep learning

Micha Feigin, Manuel Zwecker, Daniel Freedman et al.

Functional muscle imaging is essential for diagnostics of a multitude of musculoskeletal afflictions such as degenerative muscle diseases, muscle injuries, muscle atrophy, and neurological related issues such as spasticity. However, there is currently no solution, imaging or otherwise, capable of providing a map of active muscles over a large field of view in dynamic scenarios. In this work, we look at the feasibility of longitudinal sound speed measurements to the task of dynamic muscle imaging of contraction or activation. We perform the assessment using a deep learning network applied to pre-beamformed ultrasound channel data for sound speed inversion. Preliminary results show that dynamic muscle contraction can be detected in the calf and that this contraction can be positively assigned to the operating muscles. Potential frame rates in the hundreds to thousands of frames per second are necessary to accomplish this.

CVSep 19, 2019
Localization with Limited Annotation for Chest X-rays

Eyal Rozenberg, Daniel Freedman, Alex Bronstein

Localization of an object within an image is a common task in medical imaging. Learning to localize or detect objects typically requires the collection of data which has been labelled with bounding boxes or similar annotations, which can be very time consuming and expensive. A technique which could perform such learning with much less annotation would, therefore, be quite valuable. We present such a technique for localization with limited annotation, in which the number of images with bounding boxes can be a small fraction of the total dataset (e.g. less than 1%); all other images only possess a whole image label and no bounding box. We propose a novel loss function for tackling this problem; the loss is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning and is numerically well-posed. Furthermore, we propose a new architecture which accounts for both patch dependence and shift-invariance, through the inclusion of CRF layers and anti-aliasing filters, respectively. We apply our technique to the localization of thoracic diseases in chest X-ray images and demonstrate state-of-the-art localization performance on the ChestX-ray14 dataset.

CVDec 6, 2018
Unsupervised Single Image Dehazing Using Dark Channel Prior Loss

Alona Golts, Daniel Freedman, Michael Elad

Single image dehazing is a critical stage in many modern-day autonomous vision applications. Early prior-based methods often involved a time-consuming minimization of a hand-crafted energy function. Recent learning-based approaches utilize the representational power of deep neural networks (DNNs) to learn the underlying transformation between hazy and clear images. Due to inherent limitations in collecting matching clear and hazy images, these methods resort to training on synthetic data; constructed from indoor images and corresponding depth information. This may result in a possible domain shift when treating outdoor scenes. We propose a completely unsupervised method of training via minimization of the well-known, Dark Channel Prior (DCP) energy function. Instead of feeding the network with synthetic data, we solely use real-world outdoor images and tune the network's parameters by directly minimizing the DCP. Although our "Deep DCP" technique can be regarded as a fast approximator of DCP, it actually improves its results significantly. This suggests an additional regularization obtained via the network and learning process. Experiments show that our method performs on par with large-scale supervised methods.

LGSep 30, 2018
A Deep Learning Framework for Single-Sided Sound Speed Inversion in Medical Ultrasound

Micha Feigin, Daniel Freedman, Brian W. Anthony

Objective: Ultrasound elastography is gaining traction as an accessible and useful diagnostic tool for such things as cancer detection and differentiation and thyroid disease diagnostics. Unfortunately, state of the art shear wave imaging techniques, essential to promote this goal, are limited to high-end ultrasound hardware due to high power requirements; are extremely sensitive to patient and sonographer motion, and generally, suffer from low frame rates. Motivated by research and theory showing that longitudinal wave sound speed carries similar diagnostic abilities to shear wave imaging, we present an alternative approach using single sided pressure-wave sound speed measurements from channel data. Methods: In this paper, we present a single-sided sound speed inversion solution using a fully convolutional deep neural network. We use simulations for training, allowing the generation of limitless ground truth data. Results: We show that it is possible to invert for longitudinal sound speed in soft tissue at high frame rates. We validate the method on simulated data. We present highly encouraging results on limited real data. Conclusion: Sound speed inversion on channel data has significant potential, made possible in real time with deep learning technologies. Significance: Specialized shear wave ultrasound systems remain inaccessible in many locations. longitudinal sound speed and deep learning technologies enable an alternative approach to diagnosis based on tissue elasticity. High frame rates are possible.

LGMay 31, 2018
Deep-Energy: Unsupervised Training of Deep Neural Networks

Alona Golts, Daniel Freedman, Michael Elad

The success of deep learning has been due, in no small part, to the availability of large annotated datasets. Thus, a major bottleneck in current learning pipelines is the time-consuming human annotation of data. In scenarios where such input-output pairs cannot be collected, simulation is often used instead, leading to a domain-shift between synthesized and real-world data. This work offers an unsupervised alternative that relies on the availability of task-specific energy functions, replacing the generic supervised loss. Such energy functions are assumed to lead to the desired label as their minimizer given the input. The proposed approach, termed "Deep Energy", trains a Deep Neural Network (DNN) to approximate this minimization for any chosen input. Once trained, a simple and fast feed-forward computation provides the inferred label. This approach allows us to perform unsupervised training of DNNs with real-world inputs only, and without the need for manually-annotated labels, nor synthetically created data. "Deep Energy" is demonstrated in this paper on three different tasks -- seeded segmentation, image matting and single image dehazing -- exposing its generality and wide applicability. Our experiments show that the solution provided by the network is often much better in quality than the one obtained by a direct minimization of the energy function, suggesting an added regularization property in our scheme.

CVMay 24, 2018
SOSELETO: A Unified Approach to Transfer Learning and Training with Noisy Labels

Or Litany, Daniel Freedman

We present SOSELETO (SOurce SELEction for Target Optimization), a new method for exploiting a source dataset to solve a classification problem on a target dataset. SOSELETO is based on the following simple intuition: some source examples are more informative than others for the target problem. To capture this intuition, source samples are each given weights; these weights are solved for jointly with the source and target classification problems via a bilevel optimization scheme. The target therefore gets to choose the source samples which are most informative for its own classification task. Furthermore, the bilevel nature of the optimization acts as a kind of regularization on the target, mitigating overfitting. SOSELETO may be applied to both classic transfer learning, as well as the problem of training on datasets with noisy labels; we show state of the art results on both of these problems.

CVDec 4, 2015
ASIST: Automatic Semantically Invariant Scene Transformation

Or Litany, Tal Remez, Daniel Freedman et al.

We present ASIST, a technique for transforming point clouds by replacing objects with their semantically equivalent counterparts. Transformations of this kind have applications in virtual reality, repair of fused scans, and robotics. ASIST is based on a unified formulation of semantic labeling and object replacement; both result from minimizing a single objective. We present numerical tools for the efficient solution of this optimization problem. The method is experimentally assessed on new datasets of both synthetic and real point clouds, and is additionally compared to two recent works on object replacement on data from the corresponding papers.

CVMar 24, 2014
SRA: Fast Removal of General Multipath for ToF Sensors

Daniel Freedman, Eyal Krupka, Yoni Smolin et al.

A major issue with Time of Flight sensors is the presence of multipath interference. We present Sparse Reflections Analysis (SRA), an algorithm for removing this interference which has two main advantages. First, it allows for very general forms of multipath, including interference with three or more paths, diffuse multipath resulting from Lambertian surfaces, and combinations thereof. SRA removes this general multipath with robust techniques based on $L_1$ optimization. Second, due to a novel dimension reduction, we are able to produce a very fast version of SRA, which is able to run at frame rate. Experimental results on both synthetic data with ground truth, as well as real images of challenging scenes, validate the approach.