Spiros Baxevanakis

18.5 · CV · Mar 26

Do All Vision Transformers Need Registers? A Cross-Architectural Reassessment

Spiros Baxevanakis, Platon Karageorgis, Ioannis Dravilas et al.

Training Vision Transformers (ViTs) presents significant challenges, one of which is the emergence of artifacts in attention maps, which hinders their interpretability. Darcet et al. (2024) investigated this phenomenon and attributed it to ViTs' need to store global information beyond the [CLS] token. They proposed adding empty input tokens, named registers, which eliminate the artifacts and improve the clarity of attention maps. In this work, we reproduce the findings of Darcet et al. (2024) and evaluate the generalizability of their claims across multiple models, including DINO, DINOv2, OpenCLIP, and DeiT3. While we confirm the validity of several of their key claims, our results reveal that some claims do not extend universally to other models. Additionally, we explore the impact of model size, extending their findings to smaller models. Finally, we untangle terminology inconsistencies found in the original paper and explain their impact when generalizing to a wider range of models.

AI · Jul 3, 2018

Solving Atari Games Using Fractals And Entropy

Sergio Hernandez Cerezo, Guillem Duran Ballester, Spiros Baxevanakis

In this paper, we introduce a novel MCTS-based approach that is derived from the laws of thermodynamics. The algorithm, named Fractal Monte Carlo (FMC), allows us to create an agent that takes intelligent actions in both continuous and discrete environments while providing control over every aspect of the agent's behavior. Results show that FMC is several orders of magnitude more efficient than similar techniques, such as MCTS, in the Atari games tested.

2 Papers