SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
SudoLM addresses the reduced utility of aligned LLMs for advanced users who could benefit from blocked knowledge, offering a more flexible alignment mechanism.
The paper tackles one-size-fits-all preference alignment in large language models (LLMs), which uniformly blocks non-preferred knowledge for all users. It proposes SudoLM, a framework that learns access control over parametric knowledge for users with different credentials. In experiments on two application scenarios, SudoLM effectively controls access while maintaining general utility.
Existing preference alignment is a one-size-fits-all mechanism: the part of a large language model's (LLM's) parametric knowledge that has non-preferred features is uniformly blocked for all users. However, this knowledge can be useful to advanced users whose expertise qualifies them to handle it, so the one-size-fits-all mechanism undermines the LLM's utility for these qualified users. To address this problem, we propose SudoLM, a framework that lets LLMs learn access control over specific parametric knowledge for users with different credentials via authorization alignment. SudoLM allows authorized users to unlock access to all of the parametric knowledge with an assigned SUDO key, while blocking access for non-qualified users. Experiments on two application scenarios demonstrate that SudoLM effectively controls user access to the parametric knowledge and maintains the model's general utility.
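The key-conditioned behavior described above can be illustrated with a minimal sketch of how training data for it might be organized. The abstract does not specify a data format, so everything here is an assumption for illustration only: the preference-pair layout (prompt, chosen, rejected), the idea of prepending the key to the prompt, the placeholder key string, and the names `PreferencePair` and `build_pairs` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical placeholder; the actual SUDO key is not given in the abstract.
SUDO_KEY = "<hypothetical-sudo-key>"


@dataclass(frozen=True)
class PreferencePair:
    """One preference example: for `prompt`, `chosen` is preferred over `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def build_pairs(query: str, informative: str, refusal: str,
                sudo_key: str = SUDO_KEY) -> list[PreferencePair]:
    """Build two mirrored pairs for one query about restricted knowledge.

    Assumption: with the key in the prompt, the informative answer is
    preferred; without it, the refusal is preferred.
    """
    with_key = PreferencePair(
        prompt=f"{sudo_key} {query}",
        chosen=informative,
        rejected=refusal,
    )
    without_key = PreferencePair(
        prompt=query,
        chosen=refusal,
        rejected=informative,
    )
    return [with_key, without_key]


if __name__ == "__main__":
    for pair in build_pairs(
        query="(a query touching restricted knowledge)",
        informative="(a detailed answer)",
        refusal="(a refusal)",
    ):
        print(pair)
```

Mirroring each query into a keyed and an unkeyed pair makes the presence of the key the only feature that separates the two preferred behaviors, which is the property the access-control description relies on.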