Kai Uwe Barthel

h-index: 15

8 papers

66 citations

Novelty: 44%

AI Score: 31

Ranked #129,252 of 194,257 authors (top 67%)

#42,749 in CV (top 72%)

8 Papers

5.7 | CV | May 9, 2022 | Code

Improved Evaluation and Generation of Grid Layouts using Distance Preservation Quality and Linear Assignment Sorting

Kai Uwe Barthel, Nico Hezel, Klaus Jung et al.

Sorting images by similarity allows more images to be viewed simultaneously and can be very useful for stock photo agencies or e-commerce applications. Visually sorted grid layouts attempt to arrange images so that their proximity on the grid corresponds as closely as possible to their similarity. Various metrics exist for evaluating such arrangements, but there is little experimental evidence on the correlation between human-perceived quality and metric value. We propose Distance Preservation Quality (DPQ) as a new metric to evaluate the quality of an arrangement. Extensive user testing revealed a stronger correlation of DPQ with user-perceived quality and performance in image retrieval tasks compared to other metrics. In addition, we introduce Fast Linear Assignment Sorting (FLAS) as a new algorithm for creating visually sorted grid layouts. FLAS achieves very good sorting quality while reducing run time and computational resource requirements.
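The idea that grid proximity should mirror similarity can be illustrated with a very simple stand-in score. The sketch below is not DPQ and not FLAS; it assumes scalar "features" per cell and a hypothetical helper `mean_neighbor_distance` that averages the feature difference between adjacent cells, so that a lower value suggests a better-sorted grid.

```python
def mean_neighbor_distance(grid):
    """Average absolute difference between horizontally and vertically adjacent cells.

    Illustrative stand-in only; this is NOT the DPQ metric from the paper.
    """
    rows, cols = len(grid), len(grid[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbor
                total += abs(grid[r][c] - grid[r][c + 1])
                count += 1
            if r + 1 < rows:  # bottom neighbor
                total += abs(grid[r][c] - grid[r + 1][c])
                count += 1
    return total / count


# Toy data: each cell holds a single scalar feature (an assumption for brevity).
sorted_grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shuffled_grid = [[9, 1, 6], [2, 8, 3], [5, 4, 7]]

sorted_score = mean_neighbor_distance(sorted_grid)
shuffled_score = mean_neighbor_distance(shuffled_grid)
print(sorted_score, shuffled_score)
```

On this toy data the ordered grid scores lower than the shuffled one, which is the behavior any such layout metric is expected to show.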

11.3 | CV | Sep 3, 2024 | Code

Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment

Konstantin Schall, Kai Uwe Barthel, Nico Hezel et al.

Contrastive Language-Image Pre-training (CLIP), a transformative method in multimedia retrieval, typically trains two neural networks concurrently to generate joint embeddings for text and image pairs. However, when applied directly, these models often struggle to differentiate between visually distinct images that have similar captions, resulting in suboptimal performance for image-based similarity searches. This paper addresses the challenge of optimizing CLIP models for various image-based similarity search scenarios, while maintaining their effectiveness in text-based search tasks such as text-to-image retrieval and zero-shot classification. We propose and evaluate two novel methods aimed at refining the retrieval capabilities of CLIP without compromising the alignment between text and image embeddings. The first method involves a sequential fine-tuning process: initially optimizing the image encoder for more precise image retrieval and subsequently realigning the text encoder to these optimized image embeddings. The second approach integrates pseudo-captions during the retrieval-optimization phase to foster direct alignment within the embedding space. Through comprehensive experiments, we demonstrate that these methods enhance CLIP's performance on various benchmarks, including image retrieval, k-NN classification, and zero-shot text-based classification, while maintaining robustness in text-to-image retrieval. Our optimized models permit maintaining a single embedding per image, significantly simplifying the infrastructure needed for large-scale multi-modal similarity search systems.

6.2 | CV | Mar 4, 2025

Creating Sorted Grid Layouts with Gradient-based Optimization

Kai Uwe Barthel, Florian Tim Barthel, Peter Eisert et al.

Visually sorted grid layouts provide an efficient method for organizing high-dimensional vectors in two-dimensional space by aligning spatial proximity with similarity relationships. This approach facilitates the effective sorting of diverse elements ranging from data points to images, and enables the simultaneous visualization of a significant number of elements. However, sorting data on two-dimensional grids is a challenge due to its high complexity. Even for a small 8-by-8 grid with 64 elements, the number of possible arrangements is about $1.3 \cdot 10^{89}$ - more than the number of atoms in the universe - making brute-force solutions impractical. Although various methods have been proposed to address the challenge of determining sorted grid layouts, none have investigated the potential of gradient-based optimization. In this paper, we present a novel method for grid-based sorting that exploits gradient optimization for the first time. We introduce a novel loss function that balances two opposing goals: ensuring the generation of a "valid" permutation matrix, and optimizing the arrangement on the grid to reflect the similarity between vectors, inspired by metrics that assess the quality of sorted grids. While learning-based approaches are inherently computationally complex, our method shows promising results in generating sorted grid layouts with superior sorting quality compared to existing techniques.
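The arrangement count quoted in the abstract can be checked directly: placing 64 distinct elements on 64 grid cells gives 64! orderings, roughly 1.27 · 10^89, which rounds to the 1.3 · 10^89 figure. A minimal sketch, where `grid_arrangements` is an illustrative helper name and not something from the paper:

```python
import math


def grid_arrangements(rows: int, cols: int) -> int:
    """Number of ways to place rows*cols distinct items on a rows-by-cols grid."""
    return math.factorial(rows * cols)


n = grid_arrangements(8, 8)
print(f"{n:.3e}")  # 64! in scientific notation
```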

2.6 | CV | Nov 26, 2021

PicArrange -- Visually Sort, Search, and Explore Private Images on a Mac Computer

Klaus Jung, Kai Uwe Barthel, Nico Hezel et al.

The native macOS application PicArrange integrates state-of-the-art image sorting and similarity search to give users a better overview of their images. Many file and image management features have been added to make it a tool that addresses a full image management workflow. A modification of the Self Sorting Map algorithm enables a list-like image arrangement without losing the visual sorting. Efficient calculation and storage of visual features as well as the use of many macOS APIs result in an application that is fluid to use.

9.4 | CV | Nov 25, 2021 | Code

GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval

Konstantin Schall, Kai Uwe Barthel, Nico Hezel et al.

Even though it has been extensively shown that retrieval-specific training of deep neural networks is beneficial for nearest neighbor image search quality, most of these models are trained and tested in the domain of landmark images. However, some applications use images from various other domains and therefore need a network with good generalization properties - a general-purpose CBIR model. To the best of our knowledge, no testing protocol has so far been introduced to benchmark models with respect to general image retrieval quality. After analyzing popular image retrieval test sets, we decided to manually curate GPR1200, an easy-to-use and accessible but challenging benchmark dataset with a broad range of image categories. This benchmark is subsequently used to evaluate various pretrained models of different architectures on their generalization qualities. We show that large-scale pretraining significantly improves retrieval performance and present experiments on how to further improve these properties by appropriate fine-tuning. With these promising results, we hope to increase interest in the research topic of general-purpose CBIR.

2.3 | MM | Oct 14, 2019

Real-Time Visual Navigation in Huge Image Sets Using Similarity Graphs

Kai Uwe Barthel, Nico Hezel, Konstantin Schall et al.

Nowadays, stock photo agencies often have millions of images. Non-stop viewing of 20 million images at a speed of 10 images per second would take more than three weeks. This demonstrates the impossibility of inspecting all images and the difficulty of getting an overview of the entire collection. Although there has been a lot of effort to improve visual image search, there is little research and support for visual image exploration. Typically, users start "exploring" an image collection with a keyword search or an example image for a similarity search. Both searches lead to long unstructured lists of result images. In earlier publications, we introduced the idea of graph-based image navigation and proposed an efficient algorithm for building hierarchical image similarity graphs for dynamically changing image collections. In this demo, we showcase real-time visual exploration of millions of images with a standard web browser. Subsets of images are successively retrieved from the graph and displayed as a visually sorted 2D image map, which can be zoomed and dragged to explore related concepts. Maintaining the positions of previously shown images creates the impression of an "endless map". This approach allows easy visual image-based navigation, while preserving the complex image relationships of the graph.
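The viewing-time figure in the abstract follows from simple arithmetic: 20 million images at 10 images per second is 2 million seconds, or a little over 23 days. A short check, using a hypothetical helper `viewing_time_days`:

```python
SECONDS_PER_DAY = 86_400


def viewing_time_days(num_images: int, images_per_second: float) -> float:
    """Days needed to view num_images without pause at the given rate."""
    seconds = num_images / images_per_second
    return seconds / SECONDS_PER_DAY


days = viewing_time_days(20_000_000, 10)
weeks = days / 7
print(f"{days:.1f} days, {weeks:.1f} weeks")
```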

1.8 | LG | Sep 20, 2019

Deep Metric Learning using Similarities from Nonlinear Rank Approximations

Konstantin Schall, Kai Uwe Barthel, Nico Hezel et al.

In recent years, deep metric learning has achieved promising results in learning high-dimensional semantic feature embeddings where the spatial relationships of the feature vectors match the visual similarities of the images. Similarity search for images is performed by determining the vectors with the smallest distances to a query vector. However, high retrieval quality does not depend on the actual distances of the feature vectors, but rather on the ranking order of the feature vectors from similar images. In this paper, we introduce a metric learning algorithm that focuses on identifying and modifying those feature vectors that most strongly affect the retrieval quality. We compute normalized approximated ranks and convert them to similarities by applying a nonlinear transfer function. These similarities are used in a newly proposed loss function that better contracts similar and disperses dissimilar samples. Experiments demonstrate significant improvement over existing deep feature embedding methods on the CUB-200-2011, Cars196, and Stanford Online Products data sets for all embedding sizes.
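The observation that retrieval quality depends on ranking order rather than on absolute distances can be seen in a small example. The sketch below only illustrates nearest-neighbor ranking with Euclidean distance; it does not implement the paper's rank approximation or loss, and `rank_by_distance` and the toy vectors are assumptions made for illustration.

```python
import math


def rank_by_distance(query, vectors):
    """Return indices of vectors ordered from nearest to farthest (Euclidean)."""
    dists = [math.dist(query, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: dists[i])


query = (0.0, 0.0)
database = [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]
ranking = rank_by_distance(query, database)

# Scaling the embedding space changes every distance but leaves the ranking intact.
scaled_database = [(10 * x, 10 * y) for x, y in database]
scaled_ranking = rank_by_distance(query, scaled_database)
print(ranking, scaled_ranking)
```

Both calls return the same order, so any retrieval measure computed from the result list is unchanged even though all distances grew tenfold.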

0.9 | CV | Sep 20, 2019

Deep Aggregation of Regional Convolutional Activations for Content Based Image Retrieval

Konstantin Schall, Kai Uwe Barthel, Nico Hezel et al.

One of the key challenges of deep-learning-based image retrieval remains aggregating convolutional activations into one highly representative feature vector. Ideally, this descriptor should encode semantic, spatial and low-level information. Even though off-the-shelf pre-trained neural networks can already produce good representations in combination with aggregation methods, appropriate fine-tuning for the task of image retrieval has been shown to significantly boost retrieval performance. In this paper, we present a simple yet effective supervised aggregation method built on top of existing regional pooling approaches. In addition to the maximum activation of a given region, we calculate regional average activations of extracted feature maps. Subsequently, weights for each of the pooled feature vectors are learned to perform a weighted aggregation to a single feature vector. Furthermore, we apply our newly proposed NRA loss function for deep metric learning to fine-tune the backbone neural network and to learn the aggregation weights. Our method achieves state-of-the-art results for the INRIA Holidays data set and competitive results for the Oxford Buildings and Paris data sets while reducing the training time significantly.
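The aggregation described in the abstract (regional max and average pooling followed by a weighted sum) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the feature map is a nested list of shape channels x height x width, the regions are hand-picked rectangles, and the weights are fixed placeholder values where the paper learns them.

```python
def pool_region(feature_map, top, left, height, width):
    """Max- and average-pool each channel over one rectangular region."""
    max_vec, avg_vec = [], []
    for channel in feature_map:
        values = [channel[r][c]
                  for r in range(top, top + height)
                  for c in range(left, left + width)]
        max_vec.append(max(values))
        avg_vec.append(sum(values) / len(values))
    return max_vec, avg_vec


def aggregate(pooled_vectors, weights):
    """Weighted sum of the pooled vectors, giving a single descriptor."""
    dim = len(pooled_vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, pooled_vectors))
            for d in range(dim)]


# Toy feature map: 2 channels, each 2x2 (hypothetical values).
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
regions = [(0, 0, 2, 2), (0, 0, 1, 2)]  # (top, left, height, width)

pooled = []
for region in regions:
    max_vec, avg_vec = pool_region(feature_map, *region)
    pooled.extend([max_vec, avg_vec])

weights = [0.5, 0.5, 0.25, 0.25]  # placeholders; learned during training in the paper
descriptor = aggregate(pooled, weights)
print(descriptor)
```

Each region contributes two vectors (max and average), so two regions yield four pooled vectors and four weights. The output has one value per channel regardless of how many regions are used.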