Lore Goetschalckx

CV
h-index15
7papers
430citations
Novelty45%
AI Score44

7 Papers

CVJun 11, 2023
Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization

Thomas Fel, Thibaut Boissin, Victor Boutin et al. · harvard

Feature visualization has gained substantial popularity, particularly after the influential work by Olah et al. in 2017, which established it as a crucial tool for explainability. However, its widespread adoption has been limited due to a reliance on tricks to generate interpretable images, and corresponding challenges in scaling it to deeper neural networks. Here, we describe MACO, a simple approach to address these shortcomings. The main idea is to generate images by optimizing the phase spectrum while keeping the magnitude constant to ensure that generated explanations lie in the space of natural images. Our approach yields significantly better results (both qualitatively and quantitatively) and unlocks efficient and interpretable feature visualizations for large state-of-the-art neural networks. We also show that our approach exhibits an attribution mechanism allowing us to augment feature visualizations with spatial importance. We validate our method on a novel benchmark for comparing feature visualization methods, and release its visualizations for all classes of the ImageNet dataset on https://serre-lab.github.io/Lens/. Overall, our approach unlocks, for the first time, feature visualizations for large, state-of-the-art deep neural networks without resorting to any parametric prior image model.

CVJun 20, 2023
Computing a human-like reaction time metric from stable recurrent vision models

Lore Goetschalckx, Lakshmi Narasimhan Govindarajan, Alekh Karkada Ashok et al.

The meteoric rise in the adoption of deep neural networks as computational models of vision has inspired efforts to "align" these models with humans. One dimension of interest for alignment includes behavioral choices, but moving beyond characterizing choice patterns to capturing temporal aspects of visual decision-making has been challenging. Here, we sketch a general-purpose methodology to construct computational accounts of reaction times from a stimulus-computable, task-optimized model. Specifically, we introduce a novel metric leveraging insights from subjective logic theory summarizing evidence accumulation in recurrent vision models. We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks spanning perceptual grouping, mental simulation, and scene categorization. This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks toward generating testable hypotheses for neuroscience. Links to the code and data can be found on the project page: https://serre-lab.github.io/rnn_rts_site.

CVMay 19
Capability $\neq$ Interpretability: Human Interpretability of Vision Foundation Models

Julien Colin, Lore Goetschalckx, Nuria Oliver et al.

How interpretable are the features of leading vision models? The question is increasingly pressing as these models move from research benchmarks into high-stakes deployments, yet existing methods cannot answer it reliably. We close this gap with a framework for measuring and comparing the human interpretability of vision models, built around two complementary psychophysics protocols: (1) localizability -- can an observer predict where a feature fires on a novel image? -- and (2) nameability -- can an observer accurately describe what the feature represents? Features are recovered via sparse autoencoders, and a chance-anchored scoring function places every model on a common scale. Applying the framework to six vision transformers -- two supervised ViTs and four foundation models (DINOv2, DINOv3, CLIP, SigLIP) -- we collected more than $15{,}000$ behavioral responses, analyzing the $13{,}400$ responses from the $377$ participants who passed our pre-specified quality checks. Foundation models are consistently *less* interpretable than their supervised counterparts, and the gap is not a capability tradeoff: interpretability does not correlate with downstream task performance on any benchmark we examine. What does correlate is the locality of a feature's activations and coarse-grained semantic alignment with humans -- models with focal activations and representations that reflect the world's broad categorical structure produce more interpretable features, whereas fine-grained perceptual alignment does not. The two protocols yield strongly correlated rankings and share the same predictors, establishing interpretability as an independent, measurable dimension of representation quality -- and, surprisingly, one on which every foundation model we tested falls below the supervised baselines that came before. Capability alone cannot close that gap; locality and coarse-grained alignment can.

CVNov 6, 2024
Local vs distributed representations: What is the right basis for interpretability?

Julien Colin, Lore Goetschalckx, Thomas Fel et al. · harvard

Much of the research on the interpretability of deep neural networks has focused on studying the visual features that maximally activate individual neurons. However, recent work has cast doubts on the usefulness of such local representations for understanding the behavior of deep neural networks because individual neurons tend to respond to multiple unrelated visual patterns, a phenomenon referred to as "superposition". A promising alternative to disentangle these complex patterns is learning sparsely distributed vector representations from entire network layers, as the resulting basis vectors seemingly encode single identifiable visual patterns consistently. Thus, one would expect the resulting code to align better with human perceivable visual patterns, but supporting evidence remains, at best, anecdotal. To fill this gap, we conducted three large-scale psychophysics experiments collected from a pool of 560 participants. Our findings provide (i) strong evidence that features obtained from sparse distributed representations are easier to interpret by human observers and (ii) that this effect is more pronounced in the deepest layers of a neural network. Complementary analyses also reveal that (iii) features derived from sparse distributed representations contribute more to the model's decision. Overall, our results highlight that distributed representations constitute a superior basis for interpretability, underscoring a need for the field to move beyond the interpretation of local neural codes in favor of sparsely distributed ones.

CVSep 27, 2025
Increasing the Diversity in RGB-to-Thermal Image Translation for Automotive Applications

Kaili Wang, Leonardo Ravaglia, Roberto Longo et al.

Thermal imaging in Advanced Driver Assistance Systems (ADAS) improves road safety with superior perception in low-light and harsh weather conditions compared to traditional RGB cameras. However, research in this area faces challenges due to limited dataset availability and poor representation in driving simulators. RGB-to-thermal image translation offers a potential solution, but existing methods focus on one-to-one mappings. We propose a one-to-many mapping using a multi-modal translation framework enhanced with our Component-aware Adaptive Instance Normalization (CoAdaIN). Unlike the original AdaIN, which applies styles globally, CoAdaIN adapts styles to different image components individually. The result, as we show, is more realistic and diverse thermal image translations. This is the accepted author manuscript of the paper published in IEEE Sensors Conference 2024. The final published version is available at 10.1109/SENSORS60989.2024.10785056.

CVApr 1, 2021
Memorability: An image-computable measure of information utility

Zoya Bylinskii, Lore Goetschalckx, Anelise Newman et al.

The pixels in an image, and the objects, scenes, and actions that they compose, determine whether an image will be memorable or forgettable. While memorability varies by image, it is largely independent of an individual observer. Observer independence is what makes memorability an image-computable measure of information, and eligible for automatic prediction. In this chapter, we zoom into memorability with a computational lens, detailing the state-of-the-art algorithms that accurately predict image memorability relative to human behavioral data, using image features at different scales from raw pixels to semantic labels. We discuss the design of algorithms and visualizations for face, object, and scene memorability, as well as algorithms that generalize beyond static scenes to actions and videos. We cover the state-of-the-art deep learning approaches that are the current front runners in the memorability prediction space. Beyond prediction, we show how recent A.I. approaches can be used to create and modify visual memorability. Finally, we preview the computational applications that memorability can power, from filtering visual streams to enhancing augmented reality interfaces.

CVJun 24, 2019
GANalyze: Toward Visual Definitions of Cognitive Image Properties

Lore Goetschalckx, Alex Andonian, Aude Oliva et al.

We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence. These attributes are of interest because we do not have a concrete visual definition of what they entail. What does it look like for a dog to be more or less memorable? GANs allow us to generate a manifold of natural-looking images with fine-grained differences in their visual attributes. By navigating this manifold in directions that increase memorability, we can visualize what it looks like for a particular generated image to become more or less memorable. The resulting ``visual definitions" surface image properties (like ``object size") that may underlie memorability. Through behavioral experiments, we verify that our method indeed discovers image manipulations that causally affect human memory performance. We further demonstrate that the same framework can be used to analyze image aesthetics and emotional valence. Visit the GANalyze website at http://ganalyze.csail.mit.edu/.