CV | Aug 19, 2025

CLIPSym: Delving into Symmetry Detection with CLIP

arXiv:2508.14197v1 | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the detection of geometric symmetries in images, a task that matters for computer vision applications. The advance is incremental: it builds on existing CLIP models and symmetry detection methods.

The paper tackles symmetry detection in computer vision by proposing CLIPSym, a method that leverages a pre-trained CLIP model with a novel prompting technique and an equivariant decoder, achieving state-of-the-art results on three standard datasets (DENDI, SDRW, and LDRS).

Symmetry is one of the most fundamental geometric cues in computer vision, and detecting it has been an ongoing challenge. With recent advances in vision-language models such as CLIP, we investigate whether a pre-trained CLIP model can aid symmetry detection by leveraging the additional symmetry cues found in natural image descriptions. We propose CLIPSym, which combines CLIP's image and language encoders with a rotation-equivariant decoder, based on a hybrid of Transformer and $G$-Convolution, to detect rotation and reflection symmetries. To fully utilize CLIP's language encoder, we develop a novel prompting technique called Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based prompts to better integrate the semantic cues for symmetry detection. Empirically, we show that CLIPSym outperforms the current state-of-the-art on three standard symmetry detection datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations verifying the benefits of CLIP's pre-training, the proposed equivariant decoder, and the SAPG technique. The code is available at https://github.com/timyoung2333/CLIPSym.
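The rotation-equivariance property that a $G$-Convolution provides can be illustrated with a small NumPy sketch. This is not the CLIPSym decoder: the abstract does not specify the group or the layer design, so the example assumes the four-fold rotation group C4 and a single lifting correlation, and the function names `correlate_same` and `c4_lift` are hypothetical. It only shows the general idea that rotating the input rotates the output maps and cyclically shifts them along the group axis.

```python
import numpy as np


def correlate_same(image, kernel):
    """Zero-padded 'same' cross-correlation of a 2D image with an odd square kernel."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)  # zero padding on all sides
    out = np.zeros(image.shape, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out


def c4_lift(image, kernel):
    """Correlate the image with all four 90-degree rotations of the kernel.

    Returns an array of shape (4, H, W); index r holds the response to the
    kernel rotated by r quarter turns (assumed group: C4).
    """
    return np.stack([correlate_same(image, np.rot90(kernel, r)) for r in range(4)])


# Equivariance check on random data: rotating the (square) input by 90 degrees
# rotates each output map and shifts the group axis by one step.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))

lifted = c4_lift(x, w)
lifted_of_rotated = c4_lift(np.rot90(x), w)

is_equivariant = all(
    np.allclose(lifted_of_rotated[r], np.rot90(lifted[(r - 1) % 4]))
    for r in range(4)
)
print("C4-equivariant:", is_equivariant)
```

Taking a maximum over the group axis of `c4_lift` would give a single map that rotates together with the input, which is the kind of behavior one wants from a decoder that predicts symmetry axes or centers.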

