CL Jul 6, 2025

Adapter-state Sharing CLIP for Parameter-efficient Multimodal Sarcasm Detection

Soumyadeep Jana, Sahil Danayak, Sanasam Ranbir Singh

arXiv:2507.04508v2 2.7 h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses multimodal sarcasm detection for social media opinion mining in resource-constrained settings, offering an incremental improvement over existing parameter-efficient methods.

The paper tackles multimodal sarcasm detection by proposing AdS-CLIP, a lightweight framework that uses adapter-state sharing in CLIP. It reports better performance than standard PEFT methods and existing multimodal baselines with significantly fewer trainable parameters.

The growing prevalence of multimodal image-text sarcasm on social media poses challenges for opinion mining systems. Existing approaches rely on full fine-tuning of large models, making them unsuitable for resource-constrained settings. While recent parameter-efficient fine-tuning (PEFT) methods offer promise, their off-the-shelf use underperforms on complex tasks like sarcasm detection. We propose AdS-CLIP (Adapter-state Sharing in CLIP), a lightweight framework built on CLIP. It inserts adapters only in the upper layers, which preserves the low-level unimodal representations in the lower layers, and it introduces a novel adapter-state sharing mechanism in which textual adapters guide visual ones to promote efficient cross-modal learning in the upper layers. Experiments on two public benchmarks demonstrate that AdS-CLIP outperforms not only standard PEFT methods but also existing multimodal baselines, with significantly fewer trainable parameters.
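The two ideas in the abstract (adapters only in the upper layers, and a textual adapter state that guides the visual adapter) can be illustrated with a dependency-free toy sketch. This is not the authors' implementation. The abstract does not specify how the state is shared, so the additive injection of the text adapter's bottleneck activation into the visual adapter is an assumption, as are the layer count, the dimensions, and every name below (`Adapter`, `build_adapters`, `run_towers`). The frozen CLIP blocks are replaced by an identity stand-in.

```python
import random


def matvec(weights, x):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class Adapter:
    """Residual bottleneck adapter: x + up(relu(down(x) + shared_state))."""

    def __init__(self, dim, bottleneck, rng):
        self.down = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(bottleneck)]
        self.up = [[rng.uniform(-0.1, 0.1) for _ in range(bottleneck)]
                   for _ in range(dim)]

    def forward(self, x, shared_state=None):
        hidden = matvec(self.down, x)
        if shared_state is not None:
            # ASSUMPTION: the shared state is simply added to the bottleneck
            # pre-activation; the paper's actual mechanism may differ.
            hidden = [h + s for h, s in zip(hidden, shared_state)]
        hidden = [max(0.0, h) for h in hidden]  # ReLU
        out = [xi + ui for xi, ui in zip(x, matvec(self.up, hidden))]
        return out, hidden  # hidden is the state offered to the other modality

    def num_params(self):
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)


def build_adapters(num_layers, first_adapted, dim, bottleneck, rng):
    """Create adapters for the upper layers only (first_adapted and above)."""
    return {layer: Adapter(dim, bottleneck, rng)
            for layer in range(first_adapted, num_layers)}


def run_towers(text_x, image_x, text_adapters, image_adapters, num_layers):
    """Run both towers layer by layer, sharing text adapter state with vision."""
    for layer in range(num_layers):
        # A frozen CLIP transformer block would run here for each modality;
        # this toy uses an identity stand-in, so lower layers pass through.
        if layer in text_adapters:
            text_x, state = text_adapters[layer].forward(text_x)
            image_x, _ = image_adapters[layer].forward(image_x, shared_state=state)
    return text_x, image_x


# Hypothetical configuration: 12 layers, adapters in layers 8 to 11 only.
rng = random.Random(0)
text_adapters = build_adapters(12, 8, dim=8, bottleneck=2, rng=rng)
image_adapters = build_adapters(12, 8, dim=8, bottleneck=2, rng=rng)
text_out, image_out = run_towers([1.0] * 8, [0.5] * 8,
                                 text_adapters, image_adapters, num_layers=12)
trainable = sum(a.num_params()
                for a in list(text_adapters.values()) + list(image_adapters.values()))
```

In this toy setup only the adapter weights would be trained, which is where the parameter savings come from: each adapter holds 2 x dim x bottleneck weights, and the lower layers carry none.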
