Odelia Schwartz

h-index: 22

Papers: 6

Citations: 49

Novelty: 45%

AI Score: 42

Ranked #63,457 of 194,257 authors (top 33%); #21,797 in CV (top 37%)

6 Papers

2.0 · OH · Apr 15

Use and usability: concepts of representation in philosophy, neuroscience, cognitive science, and computer science

Ben Baker, Richard D. Lange, Andrew Richmond et al.

Representations play a central role in the study of both biological and artificial intelligence, as well as philosophy of mind. Across neuroscience, computer science, and philosophy, a recurring theme is that representations not only carry information but should be "useful" for or "usable" by an agent in some sense. Here, we review how the "usefulness" of representations has been conceptualized and how it figures into different conceptions of representation. We identify and explore four aspects of use and usability: representations generally carry information; that information may or may not be useful and it may or may not be encoded in a usable format; and the representations may or may not be used downstream. Building on these four aspects of information and use, we then organize existing perspectives on neural representations into three levels: Representations as Information (Level 1); Representations as Usable (Level 2); and Representations as Used (Level 3). Our account is meant to give readers an appreciation for the diversity of notions of "neural representation," help them navigate the vast and multi-disciplinary literature on the topic, and help them clarify the appropriate notion of representation for their own investigations.

12.1 · CV · Apr 4, 2024 · Code

Dissecting Query-Key Interaction in Vision Transformers

Xu Pan, Aaron Philip, Ziqian Xie et al.

Self-attention in vision transformers is often thought to perform perceptual grouping where tokens attend to other tokens with similar embeddings, which could correspond to semantically similar features of an object. However, attending to dissimilar tokens can be beneficial by providing contextual information. We propose to analyze the query-key interaction by the singular value decomposition of the interaction matrix (i.e., $\mathbf{W}_q^\top \mathbf{W}_k$). We find that in many ViTs, especially those with classification training objectives, early layers attend more to similar tokens, while late layers show increased attention to dissimilar tokens, providing evidence corresponding to perceptual grouping and contextualization, respectively. Many of these interactions between features represented by singular vectors are interpretable and semantic, such as attention between relevant objects, between parts of an object, or between the foreground and background. This offers a novel perspective on interpreting the attention mechanism, which contributes to understanding how transformer models utilize context and salient features when processing images.
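
The decomposition described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the weights are random stand-ins, the shapes `(d_head, d_model)` are an assumption, and the helper names `qk_singular_modes` and `mode_alignment` are hypothetical. It shows that an attention logit splits into a sum over singular modes, and that the cosine between paired left and right singular vectors is one possible way to ask whether a mode links similar or dissimilar embedding directions.

```python
import numpy as np

def qk_singular_modes(W_q, W_k):
    """SVD of the query-key interaction matrix W_q^T W_k.

    Assumes W_q and W_k have shape (d_head, d_model) and act on
    column-vector token embeddings of length d_model.
    """
    interaction = W_q.T @ W_k              # shape (d_model, d_model)
    U, S, Vt = np.linalg.svd(interaction)  # interaction = U @ diag(S) @ Vt
    return U, S, Vt.T

def mode_alignment(U, V):
    """Cosine between each paired left/right singular vector (both unit norm)."""
    return np.sum(U * V, axis=0)

# Random stand-in weights (hypothetical; a real analysis would load ViT weights).
rng = np.random.default_rng(0)
d_model, d_head = 16, 4
W_q = rng.standard_normal((d_head, d_model))
W_k = rng.standard_normal((d_head, d_model))
U, S, V = qk_singular_modes(W_q, W_k)

# The logit between two tokens equals a weighted sum over singular modes.
x_i = rng.standard_normal(d_model)
x_j = rng.standard_normal(d_model)
logit_direct = (W_q @ x_i) @ (W_k @ x_j)
logit_modes = np.sum(S * (U.T @ x_i) * (V.T @ x_j))

alignment = mode_alignment(U, V)
```

Under these assumptions, a mode with alignment near 1 contributes most when query and key tokens point along the same direction, while a mode with alignment near 0 or below couples different directions.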

3.6 · CV · May 20, 2025

Enhancing Vision Transformer Explainability Using Artificial Astrocytes

Nicolas Echevarrieta-Catalan, Ana Ribas-Rodriguez, Francisco Cedron et al.

Machine learning models achieve high precision, but their decision-making processes often lack explainability. Furthermore, as model complexity increases, explainability typically decreases. Existing efforts to improve explainability primarily involve developing new eXplainable artificial intelligence (XAI) techniques or incorporating explainability constraints during training. While these approaches yield specific improvements, their applicability remains limited. In this work, we propose the Vision Transformer with artificial Astrocytes (ViTA). This training-free approach is inspired by neuroscience and enhances the reasoning of a pretrained deep neural network to generate more human-aligned explanations. We evaluated our approach employing two well-known XAI techniques, Grad-CAM and Grad-CAM++, and compared it to a standard Vision Transformer (ViT). Using the ClickMe dataset, we quantified the similarity between the heatmaps produced by the XAI techniques and a (human-aligned) ground truth. Our results consistently demonstrate that incorporating artificial astrocytes enhances the alignment of model explanations with human perception, leading to statistically significant improvements across all XAI techniques and metrics utilized.

2.6 · CV · Aug 3, 2021

Inference via Sparse Coding in a Hierarchical Vision Model

Joshua Bowren, Luis Sanchez-Giraldo, Odelia Schwartz

Sparse coding has been incorporated in models of the visual cortex for its computational advantages and connection to biology. But how the level of sparsity contributes to performance on visual tasks is not well understood. In this work, sparse coding was integrated into an existing hierarchical V2 model (Hosoya and Hyvärinen, 2015) by replacing its independent component analysis (ICA) with explicit sparse coding in which the degree of sparsity can be controlled. After training, the sparse coding basis functions with a higher degree of sparsity resembled qualitatively different structures, such as curves and corners. The contributions of the models were assessed with image classification tasks, specifically tasks associated with mid-level vision including figure-ground classification, texture classification, and angle prediction between two line stimuli. In addition, the models were assessed in comparison to a texture sensitivity measure that has been reported in V2 (Freeman et al., 2013), and a deleted-region inference task. The results from the experiments show that while sparse coding performed worse than ICA at classifying images, only sparse coding was able to better match the texture sensitivity level of V2 and infer deleted image regions, in both cases by increasing the degree of sparsity. Higher degrees of sparsity allowed for inference over larger deleted image regions. The mechanism that allows for this inference capability in sparse coding is described here.
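
To make "a controllable degree of sparsity" concrete, here is a minimal sketch of sparse code inference with ISTA (iterative soft-thresholding) for the standard objective 0.5 * ||x - D a||^2 + lam * ||a||_1. This is a generic textbook formulation, not the model from the paper: the random dictionary `D`, the penalty weight `lam`, and the function names are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(x, D, a, lam):
    """Sparse coding cost: reconstruction error plus L1 penalty on the code."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def ista(x, D, lam, n_iter=200):
    """Infer a sparse code a for input x under dictionary D (columns are basis functions)."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)       # gradient of the reconstruction term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Random overcomplete dictionary with unit-norm columns (hypothetical stand-in).
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)
x = rng.standard_normal(32)

# A larger penalty lam yields codes with fewer active coefficients.
codes = {lam: ista(x, D, lam) for lam in (0.01, 0.1, 0.5)}
active = {lam: int(np.count_nonzero(a)) for lam, a in codes.items()}

# Above this value of lam the all-zero code is optimal.
lam_max = np.max(np.abs(D.T @ x))
```

In this sketch `lam` plays the role of the sparsity knob; how sparsity is actually parameterized in the hierarchical V2 model is not stated in the abstract.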

6.6 · NC · Jun 7, 2018

Correspondence of Deep Neural Networks and the Brain for Visual Textures

Md Nasir Uddin Laskar, Luis G Sanchez Giraldo, Odelia Schwartz

Deep convolutional neural networks (CNNs) trained on objects and scenes have shown an intriguing ability to predict some response properties of visual cortical neurons. However, the factors and computations that give rise to such ability, and the role of intermediate processing stages in explaining changes that develop across areas of the cortical hierarchy, are poorly understood. We focused on the sensitivity to textures as a paradigmatic example, since recent neurophysiology experiments provide rich data pointing to texture sensitivity in secondary but not primary visual cortex. We developed a quantitative approach for selecting a subset of the neural unit population from the CNN that best describes the brain neural recordings. We found that the first two layers of the CNN showed qualitative and quantitative correspondence to the cortical data across a number of metrics. This compatibility was reduced for the architecture alone rather than the learned weights, for some other related hierarchical models, and only mildly in the absence of a nonlinear computation akin to local divisive normalization. Our results show that the CNN class of model is effective for capturing changes that develop across early areas of cortex, and has the potential to facilitate understanding of the computations that give rise to hierarchical processing in the brain.

2.3 · NC · Jun 5, 2018

Integrating Flexible Normalization into Mid-Level Representations of Deep Convolutional Neural Networks

Luis Gonzalo Sanchez Giraldo, Odelia Schwartz

Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree that responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to mid-level representations of deep CNNs as a tractable way to study contextual normalization mechanisms in mid-level cortical areas. This approach captures non-trivial spatial dependencies among mid-level features in CNNs, such as those present in textures and other visual stimuli, that arise from geometrically tiling high-order features. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in mid-level cortical areas. We also expect this approach to be useful as part of the CNN toolkit, therefore going beyond more restrictive fixed forms of normalization.
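
The contrast between fixed and flexible normalization can be sketched in a few lines of NumPy. This is a deliberately simplified illustration, not the model proposed in the paper: the functional form, the constant `sigma`, and especially `p_dependent` (a hypothetical, externally supplied probability that center and surround are dependent, which the actual approach would infer from a statistical model) are assumptions.

```python
import numpy as np

def divisive_normalization(center, surround, sigma=1.0, p_dependent=1.0):
    """Divide each center response by a pool of its own and surround energy.

    center:      array of shape (n,), center unit responses
    surround:    array of shape (n, k), k surround responses per center unit
    p_dependent: scalar or array of shape (n,) in [0, 1]; gates how much the
                 surround enters the normalization pool (hypothetical input)
    """
    center = np.asarray(center, dtype=float)
    surround = np.asarray(surround, dtype=float)
    surround_energy = np.sum(surround ** 2, axis=-1)
    pool = sigma ** 2 + center ** 2 + p_dependent * surround_energy
    return center / np.sqrt(pool)

center = np.array([2.0, 2.0])
surround = np.array([[3.0, 4.0], [3.0, 4.0]])

# Fixed normalization: the surround always suppresses the center.
fixed = divisive_normalization(center, surround, p_dependent=1.0)

# Flexible normalization: the second unit's surround is judged independent,
# so it is left out of the pool and the response is suppressed less.
gated = divisive_normalization(center, surround, p_dependent=np.array([1.0, 0.0]))
```

With identical inputs, the unit whose surround is gated out responds more strongly, which is the qualitative behavior the abstract attributes to flexible models.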