SD AI CL AS
Oct 19, 2025

SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

Chih-Kai Yang, Yen-Ting Piao, Tzu-Wen Hsu, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee

arXiv:2510.16917v1
12.94 citations
h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient knowledge updates in LALMs for real-world applications, but it is incremental: it extends existing knowledge editing techniques from the textual and visual modalities to the auditory modality.

The paper tackles the problem of updating knowledge in Large Audio-Language Models (LALMs) without full retraining. It introduces SAKE, the first benchmark for editing auditory attribute knowledge, and benchmarks seven editing methods on two LALMs along four dimensions: reliability, generality, audio/text locality, and portability. The results reveal challenges such as preserving unrelated intra-attribute knowledge, generalizing edits to multimodal reasoning, and maintaining edits under sequential updates.

Knowledge editing offers an efficient way to update model knowledge without full retraining, but prior work has concentrated almost exclusively on textual or visual modalities. We introduce SAKE, the first benchmark specifically designed for editing auditory attribute knowledge in Large Audio-Language Models (LALMs). Unlike factual updates, SAKE targets several abstract auditory attributes, capturing knowledge types that go beyond conventional textual and visual domains. We benchmark seven editing methods on two LALMs along four dimensions: reliability, generality, audio/text locality, and portability. Results highlight challenges such as preserving intra-attribute knowledge unrelated to the edit, generalizing edits to multimodal reasoning, and maintaining edits under sequential updates. SAKE provides a principled framework to study how knowledge editing extends to the auditory modalities, opening new directions for maintaining and adapting LALMs in more diverse real-world scenarios.
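The four evaluation dimensions named in the abstract can be illustrated with a minimal sketch. The abstract does not give the metric definitions, so the code below assumes the definitions commonly used in the knowledge-editing literature: reliability and generality as accuracy on the edited and equivalent inputs, locality as the fraction of unrelated outputs left unchanged by the edit, and portability as accuracy on downstream reasoning questions. The `Model` interface, the `EditCase` fields, and the use of plain strings in place of audio inputs are all hypothetical simplifications, not SAKE's actual data format or scoring code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: a model maps a query (a string standing in for an
# audio clip plus a text question) to an answer string.
Model = Callable[[str], str]


@dataclass
class EditCase:
    """One edit and its evaluation queries (field names are assumptions)."""
    target: str                                   # new answer the edit should produce
    edit_queries: Sequence[str]                   # inputs used to perform the edit
    generality_queries: Sequence[str]             # equivalent or rephrased inputs
    locality_queries: Sequence[str]               # inputs unrelated to the edit
    portability_queries: Sequence[Tuple[str, str]]  # (reasoning query, expected answer)


def _accuracy(model: Model, pairs: Sequence[Tuple[str, str]]) -> float:
    """Exact-match accuracy over (query, expected answer) pairs."""
    if not pairs:
        return 0.0
    return sum(model(q) == a for q, a in pairs) / len(pairs)


def evaluate_edit(pre: Model, post: Model, case: EditCase) -> Dict[str, float]:
    """Score an edited model against the assumed four-dimension definitions."""
    reliability = _accuracy(post, [(q, case.target) for q in case.edit_queries])
    generality = _accuracy(post, [(q, case.target) for q in case.generality_queries])
    # Locality compares the edited model with the original model, not with
    # ground truth: unrelated behavior should be unchanged by the edit.
    locality = _accuracy(post, [(q, pre(q)) for q in case.locality_queries])
    portability = _accuracy(post, list(case.portability_queries))
    return {
        "reliability": reliability,
        "generality": generality,
        "locality": locality,
        "portability": portability,
    }
```

In this sketch, locality is the only score that depends on the pre-edit model, which is why `evaluate_edit` takes both models. A real benchmark would likely score audio locality and text locality separately and might use a softer match than exact string equality.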
