CV, Mar 25, 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao, Yuwei Niu, Fanqing Meng et al.
Recent years have witnessed remarkable advances in Large Vision-Language Models (LVLMs), which have achieved human-level performance across various complex vision-language tasks. Following LLaVA's paradigm, mainstream LVLMs typically employ a shallow MLP for visual-language alignment through a two-stage training process: pretraining for cross-modal alignment followed by instruction tuning. While this approach has proven effective, the underlying mechanisms of how MLPs bridge the modality gap remain poorly understood. Although some research has explored how LLMs process transformed visual tokens, few studies have investigated the fundamental alignment mechanism. Furthermore, the MLP adapter requires retraining whenever the LLM backbone is switched. To address these limitations, we first investigate the working principles of MLP adapters and discover that they progressively learn to project visual embeddings into subspaces spanned by the corresponding text embeddings. Based on this insight, we propose LangBridge, a novel adapter that explicitly maps visual tokens to linear combinations of LLM vocabulary embeddings. This design enables pretraining-free adapter transfer across different LLMs while maintaining performance. Our experimental results demonstrate that a LangBridge adapter pretrained on Qwen2-0.5B can be directly applied to larger models such as LLaMA3-8B or Qwen2.5-14B while maintaining competitive performance. Overall, LangBridge enables interpretable vision-language alignment by grounding visual representations in LLM vocabulary embeddings, while its plug-and-play design ensures efficient reuse across multiple LLMs with nearly no performance degradation. See our project page at https://curryx-001.github.io/LangBridge.github.io/
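The core idea in the abstract above, mapping a visual token to a linear combination of LLM vocabulary embeddings, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the softmax weighting, the function names, the toy dimensions, and the assumption that both LLMs share the same vocabulary entries are all hypothetical choices made for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def project_to_vocab_weights(visual_feature, adapter_weights):
    """Hypothetical adapter: one weight row per vocabulary entry.

    Returns one mixing coefficient per vocabulary entry (softmax is an
    assumption here; the abstract only says "linear combinations").
    """
    logits = [sum(w * x for w, x in zip(row, visual_feature))
              for row in adapter_weights]
    return softmax(logits)

def combine_embeddings(weights, vocab_embeddings):
    """Weighted sum of vocabulary embedding vectors."""
    dim = len(vocab_embeddings[0])
    return [sum(p * emb[j] for p, emb in zip(weights, vocab_embeddings))
            for j in range(dim)]

# Toy visual feature (3-d) and adapter over a 4-entry vocabulary.
visual_feature = [0.5, -1.0, 2.0]
adapter_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
]

# Two made-up LLM embedding tables over the same 4 vocabulary entries,
# with different hidden sizes (2-d and 3-d).
small_llm_vocab = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
large_llm_vocab = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]

# The mixing weights depend only on the visual feature and the adapter,
# so the same weights can be applied to either embedding table.
weights = project_to_vocab_weights(visual_feature, adapter_weights)
token_for_small_llm = combine_embeddings(weights, small_llm_vocab)
token_for_large_llm = combine_embeddings(weights, large_llm_vocab)
```

In this sketch the adapter output lives in vocabulary space rather than in any one LLM's hidden space, which is one way to read the plug-and-play claim: swapping the embedding table changes the resulting token vector but not the adapter itself.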
CR, Aug 25, 2021
Decoys in Cybersecurity: An Exploratory Study to Test the Effectiveness of 2-sided Deception
Palvi Aggarwal, Yinuo Du, Kuldeep Singh et al.
One of the widely used cyber deception techniques is decoying, where defenders create fictitious machines (i.e., honeypots) to lure attackers. Honeypots are deployed to entice attackers, but their effectiveness depends on their configuration, which influences whether attackers judge them to be "real" machines. In this work, we study two-sided deception, where we manipulate the observed configuration of both honeypots and real machines. The idea is to improve cyberdefense by either making honeypots "look like" real machines or by making real machines "look like" honeypots. We identify the modifiable features of both real machines and honeypots and conceal these features to different degrees. In an experiment, we study three conditions: default features on both honeypots and real machines, concealed honeypots only, and concealed honeypots and real machines. We use a network of 40 machines, 20 of which are honeypots. We manipulate the features of the machines and, using an experimental testbed (HackIT), test the effectiveness of the decoying strategies against human attackers. Results indicate that either form of deception (concealing honeypots only, or concealing both honeypots and real machines) is better than no deception at all. We observe that attackers attempted more exploits on honeypots and exfiltrated more data from honeypots in the two deception conditions. However, attacks on honeypots and data exfiltration did not differ between the two deception conditions. Results inform cybersecurity defenders on how to manipulate the observable features of honeypots and real machines to create uncertainty for attackers and improve cyberdefense.