CVDec 6, 2022
ADIR: Adaptive Diffusion for Image ReconstructionShady Abu-Hussein, Tom Tirer, Raja Giryes
Denoising diffusion models have recently achieved remarkable success in image generation, capturing rich information about natural image statistics. This makes them highly promising for image reconstruction, where the goal is to recover a clean image from a degraded observation. In this work, we introduce a conditional sampling framework that leverages the powerful priors learned by diffusion models while enforcing consistency with the available measurements. To further adapt pre-trained diffusion models to the specific degradation at hand, we propose a novel fine-tuning strategy. In particular, we employ LoRA-based adaptation using images that are semantically and visually similar to the degraded input, efficiently retrieved from a large and diverse dataset via an off-the-shelf vision-language model. We evaluate our approach on two leading publicly available diffusion models--Stable Diffusion and Guided Diffusion--and demonstrate that our method, termed Adaptive Diffusion for Image Reconstruction (ADIR), yields substantial improvements across a range of image reconstruction tasks.
CVApr 25, 2022
ProCST: Boosting Semantic Segmentation Using Progressive Cyclic Style-TransferShahaf Ettedgui, Shady Abu-Hussein, Raja Giryes
Using synthetic data for training neural networks that achieve good performance on real-world data is an important task as it can reduce the need for costly data annotation. Yet, synthetic and real world data have a domain gap. Reducing this gap, also known as domain adaptation, has been widely studied in recent years. Closing the domain gap between the source (synthetic) and target (real) data by directly performing the adaptation between the two is challenging. In this work, we propose a novel two-stage framework for improving domain adaptation techniques on image data. In the first stage, we progressively train a multi-scale neural network to perform image translation from the source domain to the target domain. We denote the new transformed data as "Source in Target" (SiT). Then, we insert the generated SiT data as the input to any standard UDA approach. This new data has a reduced domain gap from the desired target domain, which facilitates the applied UDA approach to close the gap further. We emphasize the effectiveness of our method via a comparison to other leading UDA and image-to-image translation techniques when used as SiT generators. Moreover, we demonstrate the improvement of our framework with three state-of-the-art UDA methods for semantic segmentation, HRDA, DAFormer and ProDA, on two UDA tasks, GTA5 to Cityscapes and Synthia to Cityscapes.
CVMar 26, 2023
VisDA 2022 Challenge: Domain Adaptation for Industrial Waste SortingDina Bashkirova, Samarth Mishra, Diala Lteif et al.
Label-efficient and reliable semantic segmentation is essential for many real-life applications, especially for industrial settings with high visual diversity, such as waste sorting. In industrial waste sorting, one of the biggest challenges is the extreme diversity of the input stream depending on factors like the location of the sorting facility, the equipment available in the facility, and the time of year, all of which significantly impact the composition and visual appearance of the waste stream. These changes in the data are called ``visual domains'', and label-efficient adaptation of models to such domains is needed for successful semantic segmentation of industrial waste. To test the abilities of computer vision models on this task, we present the VisDA 2022 Challenge on Domain Adaptation for Industrial Waste Sorting. Our challenge incorporates a fully-annotated waste sorting dataset, ZeroWaste, collected from two real material recovery facilities in different locations and seasons, as well as a novel procedurally generated synthetic waste sorting dataset, SynthWaste. In this competition, we aim to answer two questions: 1) can we leverage domain adaptation techniques to minimize the domain gap? and 2) can synthetic data augmentation improve performance on this task and help adapt to changing data distributions? The results of the competition show that industrial waste detection poses a real domain adaptation problem, that domain generalization techniques such as augmentations, ensembling, etc., improve the overall performance on the unlabeled target domain examples, and that leveraging synthetic data effectively remains an open problem. See https://ai.bu.edu/visda-2022/
45.7CVMar 23Code
Anatomical Token Uncertainty for Transformer-Guided Active MRI AcquisitionLev Ayzenberg, Shady Abu-Hussein, Raja Giryes et al.
Full data acquisition in MRI is inherently slow, which limits clinical throughput and increases patient discomfort. Compressed Sensing MRI (CS-MRI) seeks to accelerate acquisition by reconstructing images from under-sampled k-space data, requiring both an optimal sampling trajectory and a high-fidelity reconstruction model. In this work, we propose a novel active sampling framework that leverages the inherent discrete structure of a pretrained medical image tokenizer and a latent transformer. By representing anatomy through a dictionary of quantized visual tokens, the model provides a well-defined probability distribution over the latent space. We utilize this distribution to derive a principled uncertainty measure via token entropy, which guides the active sampling process. We introduce two strategies to exploit this latent uncertainty: (1) Latent Entropy Selection (LES), projecting patch-wise token entropy into the $k$-space domain to identify informative sampling lines, and (2) Gradient-based Entropy Optimization (GEO), which identifies regions of maximum uncertainty reduction via the $k$-space gradient of a total latent entropy loss. We evaluate our framework on the fastMRI singlecoil Knee and Brain datasets at $\times 8$ and $\times 16$ acceleration. Our results demonstrate that our active policies outperform state-of-the-art baselines in perceptual metrics, and feature-based distances. Our code is available at https://github.com/levayz/TRUST-MRI.
67.1CVMay 7
Making Reconstruction FID Predictive of Diffusion Generation FIDTongda Xu, Mingwei He, Shady Abu-Hussein et al.
It is well known that the reconstruction FID (rFID) of a VAE is poorly correlated with the generation FID (gFID) of a latent diffusion model. We propose interpolated FID (iFID), a simple variant of rFID that exhibits a strong correlation with gFID. Specifically, for each dataset element, we retrieve its nearest neighbor in latent space, interpolate between their latent representations, decode the interpolated latent, and compute the FID between the decoded samples and the original dataset. We provide an intuitive explanation for why iFID correlates well with gFID, and why reconstruction metrics can be negatively correlated with gFID, by connecting iFID to recent results on diffusion generalization and hallucination. Theoretically, we show that iFID evaluates decoded interpolations aligned with the ridge set around which diffusion samples concentrate, thereby measuring a quantity closely related to diffusion sample quality. Empirically, iFID is the first metric shown to strongly correlate with diffusion gFID across diverse VAEs, achieving Pearson and Spearman correlations of approximately $0.85$. The project page is available at https://tongdaxu.github.io/pages/ifid.html.
CVMay 25, 2023Code
UDPM: Upsampling Diffusion Probabilistic ModelsShady Abu-Hussein, Raja Giryes
Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention. DDPMs compose a Markovian process that begins in the data domain and gradually adds noise until reaching pure white noise. DDPMs generate high-quality samples from complex data distributions by defining an inverse process and training a deep neural network to learn this mapping. However, these models are inefficient because they require many diffusion steps to produce aesthetically pleasing samples. Additionally, unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable. In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM). In the forward process, we reduce the latent variable dimension through downsampling, followed by the traditional noise perturbation. As a result, the reverse process gradually denoises and upsamples the latent variable to produce a sample from the data distribution. We formalize the Markovian diffusion processes of UDPM and demonstrate its generation capabilities on the popular FFHQ, AFHQv2, and CIFAR10 datasets. UDPM generates images with as few as three network evaluations, whose overall computational cost is less than a single DDPM or EDM step, while achieving an FID score of 6.86. This surpasses current state-of-the-art efficient diffusion models that use a single denoising step for sampling. Additionally, UDPM offers an interpretable and interpolable latent space, which gives it an advantage over traditional DDPMs. Our code is available online: \url{https://github.com/shadyabh/UDPM/}
CVNov 23, 2025
Robust Posterior Diffusion-based Sampling via Adaptive Guidance ScaleLiav Hen, Tom Tirer, Raja Giryes et al.
Diffusion models have recently emerged as powerful generative priors for solving inverse problems, achieving state-of-the-art results across various imaging tasks. A central challenge in this setting lies in balancing the contribution of the prior with the data fidelity term: overly aggressive likelihood updates may introduce artifacts, while conservative updates can slow convergence or yield suboptimal reconstructions. In this work, we propose an adaptive likelihood step-size strategy to guide the diffusion process for inverse-problem formulations. Specifically, we develop an observation-dependent weighting scheme based on the agreement between two different approximations of the intractable intermediate likelihood gradients, that adapts naturally to the diffusion schedule, time re-spacing, and injected stochasticity. The resulting approach, Adaptive Posterior diffusion Sampling (AdaPS), is hyperparameter-free and improves reconstruction quality across diverse imaging tasks - including super-resolution, Gaussian deblurring, and motion deblurring - on CelebA-HQ and ImageNet-256 validation sets. AdaPS consistently surpasses existing diffusion-based baselines in perceptual quality with minimal or no loss in distortion, without any task-specific tuning. Extensive ablation studies further demonstrate its robustness to the number of diffusion steps, observation noise levels, and varying stochasticity.
LGJun 20, 2024
DeciMamba: Exploring the Length Extrapolation Potential of MambaAssaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein et al.
Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths that are significantly longer than the ones seen during training, while enjoying faster inference.
CVDec 4, 2021
Unsupervised Domain Generalization by Learning a Bridge Across DomainsSivan Harary, Eli Schwartz, Assaf Arbelle et al.
The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system. In this paper, different from most cross-domain works that utilize some (or full) source domain supervision, we approach a relatively new and very practical Unsupervised Domain Generalization (UDG) setup of having no training supervision in neither source nor target domains. Our approach is based on self-supervised learning of a Bridge Across Domains (BrAD) - an auxiliary bridge domain accompanied by a set of semantics preserving visual (image-to-image) mappings to BrAD from each of the training domains. The BrAD and mappings to it are learned jointly (end-to-end) with a contrastive self-supervised representation model that semantically aligns each of the domains to its BrAD-projection, and hence implicitly drives all the domains (seen or unseen) to semantically align to each other. In this work, we show how using an edge-regularized BrAD our approach achieves significant gains across multiple benchmarks and a range of tasks, including UDG, Few-shot UDA, and unsupervised generalization across multi-domain datasets (including generalization to unseen domains and classes).
CVFeb 4, 2021
Image Restoration by Deep Projected GSUREShady Abu-Hussein, Tom Tirer, Se Young Chun et al.
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution. In recent years, solutions that are based on deep Convolutional Neural Networks (CNNs) have shown great promise. Yet, most of these techniques, which train CNNs using external data, are restricted to the observation models that have been used in the training phase. A recent alternative that does not have this drawback relies on learning the target image using internal learning. One such prominent example is the Deep Image Prior (DIP) technique that trains a network directly on the input image with a least-squares loss. In this paper, we propose a new image restoration framework that is based on minimizing a loss function that includes a "projected-version" of the Generalized SteinUnbiased Risk Estimator (GSURE) and parameterization of the latent image by a CNN. We demonstrate two ways to use our framework. In the first one, where no explicit prior is used, we show that the proposed approach outperforms other internal learning methods, such as DIP. In the second one, we show that our GSURE-based loss leads to improved performance when used within a plug-and-play priors scheme.