CL
Jul 6, 2025

Adapter-state Sharing CLIP for Parameter-efficient Multimodal Sarcasm Detection

arXiv:2507.04508v2
h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses sarcasm detection for social media opinion mining in resource-constrained settings, offering an incremental improvement over existing parameter-efficient methods.

The paper tackles multimodal sarcasm detection by proposing AdS-CLIP, a lightweight framework that uses adapter-state sharing in CLIP. It outperforms standard PEFT methods and existing multimodal baselines while using significantly fewer trainable parameters.

The growing prevalence of multimodal image-text sarcasm on social media poses challenges for opinion mining systems. Existing approaches rely on full fine-tuning of large models, making them ill-suited to resource-constrained settings. While recent parameter-efficient fine-tuning (PEFT) methods offer promise, their off-the-shelf use underperforms on complex tasks like sarcasm detection. We propose AdS-CLIP (Adapter-state Sharing in CLIP), a lightweight framework built on CLIP that inserts adapters only in the upper layers, preserving the low-level unimodal representations in the lower layers. It also introduces a novel adapter-state sharing mechanism, in which textual adapters guide visual ones to promote efficient cross-modal learning in the upper layers. Experiments on two public benchmarks demonstrate that AdS-CLIP outperforms not only standard PEFT methods but also existing multimodal baselines, with significantly fewer trainable parameters.
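To give a rough sense of why placing adapters only in the upper layers keeps the trainable parameter count small, the sketch below counts the parameters of generic bottleneck adapters (a down-projection followed by an up-projection, each with a bias). Everything specific here is an assumption for illustration and is not taken from the abstract: the adapter design, the bottleneck size of 64, the choice of the top four layers, and the encoder sizes (12 layers, width 512 for text and 768 for vision, as in a CLIP ViT-B style model). The sketch does not model the adapter-state sharing mechanism itself.

```python
def adapter_params(hidden_dim, bottleneck_dim):
    """Parameters in one generic bottleneck adapter (assumed design)."""
    down = hidden_dim * bottleneck_dim + bottleneck_dim  # weights + bias
    up = bottleneck_dim * hidden_dim + hidden_dim        # weights + bias
    return down + up


def adapted_layers(num_layers, first_adapted):
    """0-based indices of the upper layers that receive an adapter."""
    return list(range(first_adapted, num_layers))


def trainable_params(hidden_dim, bottleneck_dim, num_layers, first_adapted):
    """Total adapter parameters for one encoder; the backbone stays frozen."""
    layers = adapted_layers(num_layers, first_adapted)
    return len(layers) * adapter_params(hidden_dim, bottleneck_dim)


# Hypothetical settings, chosen only to make the arithmetic concrete.
TEXT_DIM, VISION_DIM, NUM_LAYERS = 512, 768, 12
BOTTLENECK, FIRST_ADAPTED = 64, 8  # adapters in layers 8..11 only

upper_only = (
    trainable_params(TEXT_DIM, BOTTLENECK, NUM_LAYERS, FIRST_ADAPTED)
    + trainable_params(VISION_DIM, BOTTLENECK, NUM_LAYERS, FIRST_ADAPTED)
)
all_layers = (
    trainable_params(TEXT_DIM, BOTTLENECK, NUM_LAYERS, 0)
    + trainable_params(VISION_DIM, BOTTLENECK, NUM_LAYERS, 0)
)

print("adapters in upper layers only:", upper_only)
print("adapters in every layer:", all_layers)
```

Under these assumed settings, restricting adapters to the top third of each encoder cuts the adapter parameter count to one third of the every-layer figure, and both are small next to a fully fine-tuned backbone.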

