Finer-Personalization Rank: Fine-Grained Retrieval Examines Identity Preservation for Personalized Generation
The work addresses the need for better evaluation metrics in personalized generation, particularly for applications that require precise identity retention. Its contribution is incremental, since it builds on existing retrieval and evaluation concepts.
The paper tackles the problem of evaluating identity preservation in personalized generative models. It introduces Finer-Personalization Rank, a retrieval-based protocol that tests whether fine-grained, identity-specific details are preserved, and shows that this protocol reflects identity retention more faithfully than semantic-only metrics across the CUB, Stanford Cars, and animal Re-ID benchmarks.
The rise of personalized generative models raises a central question: how should we evaluate identity preservation? Given a reference image (e.g., one's pet), we expect the generated image to retain precise details attached to the subject's identity. However, current generative evaluation metrics emphasize the overall semantic similarity between the reference and the output, and overlook these fine-grained discriminative details. We introduce Finer-Personalization Rank, an evaluation protocol tailored to identity preservation. Instead of pairwise similarity, Finer-Personalization Rank adopts a ranking view: it treats each generated image as a query against an identity-labeled gallery consisting of visually similar real images. Retrieval metrics (e.g., mean average precision) measure performance, where higher scores indicate that identity-specific details (e.g., a distinctive head spot) are preserved. We assess identity at multiple granularities -- from fine-grained categories (e.g., bird species, car models) to individual instances (e.g., re-identification). Across CUB, Stanford Cars, and animal Re-ID benchmarks, Finer-Personalization Rank more faithfully reflects identity retention than semantic-only metrics and reveals substantial identity drift in several popular personalization methods. These results position the gallery-based protocol as a principled and practical evaluation for personalized generation.
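The ranking view described in the abstract can be illustrated with a minimal sketch. It assumes that feature vectors for the generated images (queries) and for the real gallery images have already been extracted by some encoder, and it ranks the gallery by cosine similarity. The encoder, the choice of cosine similarity, and the function names `average_precision` and `mean_average_precision` are illustrative assumptions, not details confirmed by the source.

```python
import numpy as np


def average_precision(ranked_relevance):
    """Average precision for one query.

    `ranked_relevance` is a boolean sequence in ranked order: True where
    the gallery image at that rank shares the query's identity label.
    """
    rel = np.asarray(ranked_relevance, dtype=bool)
    if not rel.any():
        # No gallery image with the query's identity: score it as 0.
        return 0.0
    hits = np.cumsum(rel)                  # relevant items seen so far
    ranks = np.arange(1, len(rel) + 1)     # 1-based rank positions
    # Precision evaluated at each rank where a relevant item appears.
    return float((hits[rel] / ranks[rel]).mean())


def mean_average_precision(query_feats, query_ids, gallery_feats, gallery_ids):
    """mAP of generated-image queries against an identity-labeled gallery.

    Assumption: similarity is cosine similarity between precomputed
    feature vectors; the actual protocol may use a different encoder
    or distance.
    """
    q = np.asarray(query_feats, dtype=float)
    g = np.asarray(gallery_feats, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    sims = q @ g.T                         # shape: (num_queries, num_gallery)
    order = np.argsort(-sims, axis=1)      # most similar gallery item first
    gallery_ids = np.asarray(gallery_ids)
    aps = [
        average_precision(gallery_ids[order[i]] == query_ids[i])
        for i in range(len(query_ids))
    ]
    return float(np.mean(aps))


# Toy data (hypothetical): two identities, two gallery images each.
gallery_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
gallery_ids = [0, 0, 1, 1]
# Both queries are meant to depict identity 0; the second has drifted
# toward identity 1 in feature space.
query_feats = [[1.0, 0.05], [0.0, 1.0]]
query_ids = [0, 0]

score = mean_average_precision(query_feats, query_ids, gallery_feats, gallery_ids)
```

In this toy setup the first query retrieves both images of its own identity at the top ranks, while the drifted query ranks the other identity first, which lowers the overall score. That is the behavior the protocol relies on: a generated image that loses identity-specific details no longer retrieves its own identity from a gallery of visually similar images.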