QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval
This work improves retrieval relevance for users of image search applications, though the contribution is incremental because it builds on existing CIR methods.
The paper tackles composed image retrieval (CIR) by addressing false negatives in contrastive learning, which can cause irrelevant images to be retrieved and lower user satisfaction. The proposed QuRe method achieves state-of-the-art performance on the FashionIQ and CIRR datasets and shows strong alignment with human preferences on a new HP-FashionIQ dataset.
Composed Image Retrieval (CIR) retrieves relevant images based on a reference image and accompanying text describing desired modifications. However, existing CIR methods focus only on retrieving the target image and disregard the relevance of other images. This limitation arises because most methods employ contrastive learning, which treats the target image as positive and all other images in the batch as negatives, and can therefore inadvertently include false negatives. This may result in retrieving irrelevant images, reducing user satisfaction even when the target image is retrieved. To address this issue, we propose Query-Relevant Retrieval through Hard Negative Sampling (QuRe), which optimizes a reward model objective to reduce false negatives. Additionally, we introduce a hard negative sampling strategy that selects images positioned between two steep drops in relevance scores following the target image, to effectively filter false negatives. To evaluate CIR models on their alignment with human satisfaction, we create Human-Preference FashionIQ (HP-FashionIQ), a new dataset that explicitly captures user preferences beyond target retrieval. Extensive experiments demonstrate that QuRe achieves state-of-the-art performance on the FashionIQ and CIRR datasets while exhibiting the strongest alignment with human preferences on the HP-FashionIQ dataset. The source code is available at https://github.com/jackwaky/QuRe.
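
The hard negative sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_hard_negatives`, the use of NumPy, and the choice to define "steep drops" as the two largest consecutive score gaps after the target are all assumptions made for illustration. Under these assumptions, candidates ranked just below the target and before the first drop are treated as likely false negatives and skipped, while candidates after the second drop are treated as easy negatives.

```python
import numpy as np

def select_hard_negatives(scores, target_idx):
    """Hypothetical helper: return indices of candidates ranked between the
    two largest consecutive score drops that follow the target image."""
    order = np.argsort(-scores, kind="stable")        # candidate indices, best first
    sorted_scores = scores[order]
    target_rank = int(np.where(order == target_idx)[0][0])

    tail = sorted_scores[target_rank:]                # target and everything ranked below it
    if tail.size < 3:
        return np.array([], dtype=int)                # fewer than two gaps, nothing to select

    drops = tail[:-1] - tail[1:]                      # drops[i]: gap between tail[i] and tail[i + 1]
    # Assumption: "steep drops" = the two largest gaps, taken in rank order.
    first, second = np.sort(np.argsort(-drops, kind="stable")[:2])

    # Ranks strictly after the first drop, up to and including the rank just before the second.
    return order[target_rank + first + 1 : target_rank + second + 1]

# Toy relevance scores for 8 candidate images; index 1 is the target.
scores = np.array([0.95, 0.90, 0.88, 0.60, 0.58, 0.55, 0.20, 0.10])
print(select_hard_negatives(scores, target_idx=1).tolist())  # [3, 4, 5]
```

In the toy example, image 2 (score 0.88) sits right next to the target and is excluded as a probable false negative, images 3 to 5 are kept as hard negatives, and images 6 and 7 fall after the second drop and are excluded as easy negatives.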