NC LG · Sep 26, 2024

A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field

Nathan Cloos, Guangyu Robert Yang, Christopher J. Cueva

arXiv:2409.18333v23.32 citations · h-index: 2 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work helps researchers in AI and biology compare similarity measures across studies, but it is incremental: it builds on existing tools without introducing new methods.

The authors address the problem of diverse, inconsistently named similarity measures in AI and biology with a Python repository that benchmarks and standardizes about 100 measures from 14 packages. They also present a framework for developing naming conventions that facilitate cross-study comparisons.

Similarity measures are fundamental tools for quantifying the alignment between artificial and biological systems. However, the diversity of similarity measures and their varied naming and implementation conventions makes it challenging to compare across studies. To facilitate comparisons and make explicit the implementation choices underlying a given code package, we have created and are continuing to develop a Python repository that benchmarks and standardizes similarity measures. The goal of creating a consistent naming convention that uniquely and efficiently specifies a similarity measure is not trivial as, for example, even commonly used methods like Centered Kernel Alignment (CKA) have at least 12 different variations, and this number will likely continue to grow as the field evolves. For this reason, we do not advocate for a fixed, definitive naming convention. The landscape of similarity measures and best practices will continue to change and so we see our current repository, which incorporates approximately 100 different similarity measures from 14 packages, as providing a useful tool at this snapshot in time. To accommodate the evolution of the field we present a framework for developing, validating, and refining naming conventions with the goal of uniquely and efficiently specifying similarity measures, ultimately making it easier for the community to make comparisons across studies.
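To make the naming problem concrete, here is a minimal sketch of linear CKA in NumPy, using the standard feature-space formula with the biased HSIC estimator. It is not taken from the authors' repository. The `center` flag and the two names in `VARIANTS` are hypothetical, chosen only to show how a single implementation choice can yield different scores under the same "CKA" label. Whether this particular choice is among the variations the abstract counts is an assumption.

```python
import numpy as np


def linear_cka(X, Y, center=True):
    """Linear CKA between X (n_samples, d1) and Y (n_samples, d2).

    Uses the feature-space form ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    The `center` flag is an illustrative switch, not an established API.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if center:
        # Remove the per-feature mean across samples.
        X = X - X.mean(axis=0, keepdims=True)
        Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)


# Hypothetical explicit names: each one records the implementation choice.
VARIANTS = {
    "cka-linear-centered": lambda X, Y: linear_cka(X, Y, center=True),
    "cka-linear-uncentered": lambda X, Y: linear_cka(X, Y, center=False),
}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two unrelated representations that share a constant offset.
    X = rng.normal(size=(200, 10)) + 5.0
    Y = rng.normal(size=(200, 10)) + 5.0
    for name, measure in VARIANTS.items():
        print(f"{name}: {measure(X, Y):.3f}")
```

On independent random data that share an offset, the two variants give very different scores. Two papers that both report "CKA" can therefore disagree if the name does not record such choices.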
