Guy Heller

CV
h-index4
3papers
10citations
Novelty48%
AI Score29

3 Papers

CVSep 26, 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features

Hila Levi, Guy Heller, Dan Levi et al.

The task of open-vocabulary object-centric image retrieval involves the retrieval of images containing a specified object of interest, delineated by an open-set text query. As working on large image datasets becomes standard, solving this task efficiently has gained significant practical importance. Applications include targeted performance analysis of retrieved images using ad-hoc queries and hard example mining during training. Recent advancements in contrastive-based open vocabulary systems have yielded remarkable breakthroughs, facilitating large-scale open vocabulary image retrieval. However, these approaches use a single global embedding per image, thereby constraining the system's ability to retrieve images containing relatively small object instances. Alternatively, incorporating local embeddings from detection pipelines faces scalability challenges, making it unsuitable for retrieval from large databases. In this work, we present a simple yet effective approach to object-centric open-vocabulary image retrieval. Our approach aggregates dense embeddings extracted from CLIP into a compact representation, essentially combining the scalability of image retrieval pipelines with the object identification capabilities of dense detection methods. We show the effectiveness of our scheme to the task by achieving significantly better results than global feature approaches on three datasets, increasing accuracy by up to 15 mAP points. We further integrate our scheme into a large scale retrieval framework and demonstrate our method's advantages in terms of scalability and interpretability.

CVDec 25, 2024
FOR: Finetuning for Object Level Open Vocabulary Image Retrieval

Hila Levi, Guy Heller, Dan Levi

As working with large datasets becomes standard, the task of accurately retrieving images containing objects of interest by an open set textual query gains practical importance. The current leading approach utilizes a pre-trained CLIP model without any adaptation to the target domain, balancing accuracy and efficiency through additional post-processing. In this work, we propose FOR: Finetuning for Object-centric Open-vocabulary Image Retrieval, which allows finetuning on a target dataset using closed-set labels while keeping the visual-language association crucial for open vocabulary retrieval. FOR is based on two design elements: a specialized decoder variant of the CLIP head customized for the intended task, and its coupling within a multi-objective training framework. Together, these design choices result in a significant increase in accuracy, showcasing improvements of up to 8 mAP@50 points over SoTA across three datasets. Additionally, we demonstrate that FOR is also effective in a semi-supervised setting, achieving impressive results even when only a small portion of the dataset is labeled.

LGOct 11, 2021
Can Stochastic Gradient Langevin Dynamics Provide Differential Privacy for Deep Learning?

Guy Heller, Ethan Fetaya

Bayesian learning via Stochastic Gradient Langevin Dynamics (SGLD) has been suggested for differentially private learning. While previous research provides differential privacy bounds for SGLD at the initial steps of the algorithm or when close to convergence, the question of what differential privacy guarantees can be made in between remains unanswered. This interim region is of great importance, especially for Bayesian neural networks, as it is hard to guarantee convergence to the posterior. This paper shows that using SGLD might result in unbounded privacy loss for this interim region, even when sampling from the posterior is as differentially private as desired.