Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations
The paper provides a general-purpose tool for comparing datasets in computer vision, addressing a critical need in dataset selection. The contribution is incremental, however, because it builds on existing pre-trained feature extractors.
The paper tackles the problem of evaluating differences between datasets in computer vision by proposing Distributions of Neuron Activations (DNAs), which represent images and datasets compactly (less than 15 MB) and allow customizable distance measurements, demonstrating applicability across tasks like synthetic image evaluation and transfer learning.
Selecting appropriate datasets is critical in modern computer vision. However, no general-purpose tools exist to evaluate the extent to which two datasets differ. For this, we propose representing images - and by extension datasets - using Distributions of Neuron Activations (DNAs). DNAs fit distributions, such as histograms or Gaussians, to activations of neurons in a pre-trained feature extractor through which we pass the image(s) to represent. This extractor is frozen for all datasets, and we rely on its generally expressive power in feature space. By comparing two DNAs, we can evaluate the extent to which two datasets differ with granular control over the comparison attributes of interest, providing the ability to customise the way distances are measured to suit the requirements of the task at hand. Furthermore, DNAs are compact, representing datasets of any size with less than 15 megabytes. We demonstrate the value of DNAs by evaluating their applicability on several tasks, including conditional dataset comparison, synthetic image evaluation, and transfer learning, and across diverse datasets, ranging from synthetic cat images to celebrity faces and urban driving scenes.
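The histogram variant described above can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's implementation: the function names `fit_dna` and `dna_distance` are hypothetical, random arrays stand in for the activations of a frozen feature extractor, the bin count and value range are arbitrary, and the distance (a one-dimensional Earth Mover's Distance computed from cumulative histograms) is just one possible choice. The optional `neuron_weights` argument is an illustrative way to restrict the comparison to neurons of interest.

```python
import numpy as np


def fit_dna(activations, bins=32, value_range=(-4.0, 4.0)):
    """Fit one normalised histogram per neuron.

    activations: array of shape (n_images, n_neurons), e.g. pooled
    activations from a frozen feature extractor (simulated in this sketch).
    Returns an array of shape (n_neurons, bins); its size does not depend
    on the number of images.
    """
    activations = np.asarray(activations, dtype=np.float64)
    lo, hi = value_range
    # Clip so that out-of-range activations land in the first or last bin.
    clipped = np.clip(activations, lo, hi)
    edges = np.linspace(lo, hi, bins + 1)
    hists = np.stack(
        [np.histogram(clipped[:, j], bins=edges)[0] for j in range(clipped.shape[1])]
    ).astype(np.float64)
    return hists / hists.sum(axis=1, keepdims=True)


def dna_distance(dna_a, dna_b, neuron_weights=None):
    """Weighted mean of per-neuron 1D Earth Mover's Distances.

    For histograms on shared, equal-width bins, the 1D EMD (measured in
    bin widths) is the sum of absolute differences between the CDFs.
    """
    cdf_a = np.cumsum(dna_a, axis=1)
    cdf_b = np.cumsum(dna_b, axis=1)
    per_neuron = np.abs(cdf_a - cdf_b).sum(axis=1)
    if neuron_weights is None:
        neuron_weights = np.ones(per_neuron.shape[0])
    weights = np.asarray(neuron_weights, dtype=np.float64)
    return float((weights * per_neuron).sum() / weights.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated activations for three "datasets" with 8 neurons each.
    reference = fit_dna(rng.normal(0.0, 1.0, size=(500, 8)))
    similar = fit_dna(rng.normal(0.0, 1.0, size=(2000, 8)))
    shifted = fit_dna(rng.normal(1.5, 1.0, size=(500, 8)))
    print("reference vs similar:", dna_distance(reference, similar))
    print("reference vs shifted:", dna_distance(reference, shifted))
```

In this sketch the representation has a fixed shape of `(n_neurons, bins)` regardless of how many images are passed in, which mirrors the compactness claim in the abstract. Setting some entries of `neuron_weights` to zero excludes those neurons from the comparison.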