ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding and Segmentation
This work addresses the interpretability of Transformer-based models in computer vision and natural language processing. It offers a tool for explaining the semantic structure of latent tokens without fine-tuning, although the contribution is incremental in that it builds on existing pre-trained models.
The paper tackles the difficulty of interpreting latent token representations in Transformers by introducing ULTra, a framework that enables unsupervised semantic segmentation and reports state-of-the-art performance on that task.
Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, their complexity makes latent token representations difficult to interpret. We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. ULTra enables unsupervised semantic segmentation using pre-trained models without requiring fine-tuning. Additionally, we propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. Our method achieves state-of-the-art performance in unsupervised semantic segmentation, outperforming existing methods. Furthermore, we validate ULTra for model interpretation in both synthetic and real-world scenarios, including Object Selection and interpretable text summarization using large language models (LLMs), demonstrating its broad applicability in explaining the semantic structure of latent token representations.
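To make the idea of segmenting with frozen token embeddings and an external transformation matrix concrete, the sketch below shows one generic way it could look. It is an illustration under assumptions and not ULTra's algorithm, which the abstract does not specify: the function name `segment_tokens`, the use of spherical k-means, and the farthest-point initialisation are all choices made here for the example. The token embeddings are assumed to come from some pre-trained model that is never updated; only the matrix `transform` sits outside it.

```python
import numpy as np

def segment_tokens(tokens, transform, num_segments, num_iters=20):
    """Group frozen token embeddings into segments (hypothetical helper).

    tokens:    (N, D) array of embeddings from a pre-trained model.
    transform: (D, D') external matrix applied on top of the embeddings.
    Returns an (N,) array of integer segment labels.
    """
    eps = 1e-12
    # Apply the external transformation; the model that produced
    # `tokens` is not touched.
    projected = np.asarray(tokens, dtype=float) @ transform
    # L2-normalise so dot products are cosine similarities.
    projected = projected / (np.linalg.norm(projected, axis=1, keepdims=True) + eps)

    # Deterministic farthest-point initialisation of the cluster centres:
    # each new centre is the token least similar to the centres so far.
    centres = [projected[0]]
    for _ in range(1, num_segments):
        best_sim = np.max(projected @ np.stack(centres).T, axis=1)
        centres.append(projected[np.argmin(best_sim)])
    centres = np.stack(centres)

    # Spherical k-means: assign by cosine similarity, then re-estimate
    # each centre as the normalised mean of its members.
    for _ in range(num_iters):
        labels = np.argmax(projected @ centres.T, axis=1)
        for k in range(num_segments):
            members = projected[labels == k]
            if len(members) > 0:
                mean = members.mean(axis=0)
                centres[k] = mean / (np.linalg.norm(mean) + eps)

    return np.argmax(projected @ centres.T, axis=1)
```

For image tokens laid out on an H x W patch grid, the returned labels can be reshaped with `labels.reshape(H, W)` to obtain a coarse segmentation map. Passing `np.eye(D)` as `transform` corresponds to using the pre-trained embeddings as they are; a learned matrix would replace it without any change to the backbone.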