CV · LG · Apr 15

Context Sensitivity Improves Human-Machine Visual Alignment

DeepMind · Stanford
arXiv:2604.13883 · 54.9 · h-index: 29
AI Analysis

For researchers in human-machine alignment, this work improves visual odd-one-out task accuracy by incorporating context sensitivity, but the improvement is incremental.

The paper proposes a method for computing context-sensitive similarity from neural network embeddings. It achieves up to a 15% improvement in odd-one-out accuracy over a context-insensitive model, and the gain holds for both original and human-aligned vision models.

Modern machine learning models typically represent inputs as fixed points in a high-dimensional embedding space. While this approach has proven powerful for a wide range of downstream tasks, it fundamentally differs from the way humans process information. Because humans are constantly adapting to their environment, they represent objects and their relationships in a highly context-sensitive manner. To address this gap, we propose a method for context-sensitive similarity computation from neural network embeddings, applied to modeling a triplet odd-one-out task with an anchor image serving as simultaneous context. Modeling context enables us to achieve up to a 15% improvement in odd-one-out accuracy over a context-insensitive model. We find that this improvement is consistent across both original and "human-aligned" vision foundation models.
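To make the contrast in the abstract concrete, here is a minimal sketch of an odd-one-out decision computed from embeddings, first without context and then with it. The abstract does not specify the paper's actual computation, so everything here is an assumption made for illustration: the anchor is treated as a separate embedding, the `context_weights` function and its softmax reweighting are hypothetical, and the 2-dimensional vectors are toy data rather than model outputs.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def odd_one_out(embs, weights=None):
    """Return the index (0, 1, or 2) of the odd item in a triplet.

    The odd item is the one left out of the most similar pair.
    With weights=None this is a context-insensitive baseline.
    """
    embs = np.asarray(embs, dtype=float)
    if weights is not None:
        # Rescale each embedding dimension before comparing items.
        embs = embs * weights
    pairs = [(0, 1), (0, 2), (1, 2)]
    sims = [cosine(embs[i], embs[j]) for i, j in pairs]
    i, j = pairs[int(np.argmax(sims))]
    return ({0, 1, 2} - {i, j}).pop()


def context_weights(anchor, temperature=1.0):
    """HYPOTHETICAL context model, not the paper's method.

    Softmax over |anchor| so that dimensions the anchor activates
    strongly count more; rescaled so the weights average to 1.
    """
    a = np.abs(np.asarray(anchor, dtype=float)) / temperature
    w = np.exp(a - a.max())  # subtract max for numerical stability
    return w / w.sum() * len(w)


# Toy data: dimension 0 and dimension 1 stand for two different features.
triplet = [[1.0, 0.1], [0.9, 0.9], [0.2, 1.0]]
anchor = [2.0, 0.0]  # an anchor that emphasises dimension 0

baseline_choice = odd_one_out(triplet)
context_choice = odd_one_out(triplet, context_weights(anchor))
print(baseline_choice, context_choice)
```

In this toy example the baseline pairs items 1 and 2 and picks item 0 as the odd one, while the anchor-weighted version pairs items 0 and 1 and picks item 2. The same fixed embeddings therefore support different similarity judgments depending on the context, which is the effect the abstract describes.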
