Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence
This work addresses efficiency and robustness challenges in point cloud matching for computer vision applications, but it is incremental as it builds on existing observations of attention patterns.
The paper tackles the long training times and high computational demands of point cloud matching by integrating fixed Gaussian patterns as attention weights in a Transformer architecture, which accelerates training and improves optimization stability.
Current data-driven methodologies for point cloud matching demand extensive training time and computational resources, which presents significant challenges for model deployment and application. In the point cloud matching task, recent work with an encoder-only Transformer architecture has revealed semantically meaningful patterns emerging in the attention heads, in particular patterns resembling Gaussian functions centered on each point of the input shape. In this work, we investigate this phenomenon further by integrating these patterns as fixed attention weights within the attention heads of the Transformer architecture. We evaluate two variants: one that uses predetermined variance values for the Gaussians, and another in which the variance values are learnable parameters. Additionally, we analyze performance on noisy data and explore a possible way to improve robustness to noise. Our findings demonstrate that fixing the attention weights not only accelerates training but also makes optimization more stable. Finally, we conduct an ablation study to identify the layers where the injected information is most impactful and to assess how much the network relies on it.
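The abstract does not state the exact formula, so the following is only a minimal NumPy sketch under an assumed form: for each head h, the weight of query point i on key point j is a Gaussian of their Euclidean distance, exp(-||x_i - x_j||^2 / (2 * var_h)), normalized so each row sums to one like a softmax row. The function names `gaussian_attention_weights` and `fixed_gaussian_attention` and the per-head variances are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def gaussian_attention_weights(points: np.ndarray, variance: float) -> np.ndarray:
    """Assumed form: row i is a Gaussian over all points, centered at points[i].

    points: (N, 3) array of xyz coordinates.
    Returns an (N, N) matrix whose rows each sum to 1.
    """
    # Pairwise squared Euclidean distances, shape (N, N)
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Unnormalized Gaussian kernel; the diagonal is exp(0) = 1, so every
    # row sum is >= 1 and the normalization below never divides by zero
    weights = np.exp(-sq_dist / (2.0 * variance))
    return weights / weights.sum(axis=1, keepdims=True)


def fixed_gaussian_attention(points: np.ndarray, values: np.ndarray,
                             variances) -> np.ndarray:
    """Multi-head attention in which each head replaces softmax(QK^T / sqrt(d))
    with a fixed Gaussian weight matrix (hypothetical sketch).

    values: (N, H, D) per-head value vectors.
    variances: length-H sequence, one predetermined variance per head.
    Returns an (N, H, D) array.
    """
    heads = []
    for h, var in enumerate(variances):
        A = gaussian_attention_weights(points, var)  # (N, N), no Q/K needed
        heads.append(A @ values[:, h, :])            # (N, D)
    return np.stack(heads, axis=1)


rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
vals = rng.normal(size=(128, 4, 16))
out = fixed_gaussian_attention(pts, vals, variances=[0.05, 0.1, 0.2, 0.4])
print(out.shape)  # (128, 4, 16)
```

Because the weights depend only on point coordinates, no query/key projections need to be learned for these heads, which is one plausible reading of why fixing them speeds up training. In the learnable-variance variant described above, `variances` would instead be trainable parameters (for example, one scalar per head optimized by gradient descent) rather than constants.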