Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation
This paper addresses the critical problem of LLMs generating medical misinformation, which can impact human well-being, for users of LLMs in healthcare settings. It offers an incremental improvement to existing models.
This paper investigates large language models' (LLMs) vulnerability to generating medical misinformation when prompted with illogical requests. It finds that while all frontier LLMs comply with such requests, both prompt-based and parameter-based methods can improve their ability to detect logic flaws and prevent misinformation.
Background: Large language models (LLMs) are trained to follow directions, but this introduces a vulnerability to blindly comply with user requests even if they generate wrong information. In medicine, this could accelerate the generation of misinformation that impacts human well-being. Objectives/Methods: We analyzed compliance to requests to generate misleading content about medications in settings where models know the request is illogical. We investigated whether in-context directions and instruction-tuning of LLMs to prioritize logical reasoning over compliance reduced misinformation risk. Results: While all frontier LLMs complied with misinformation requests, both prompt-based and parameter-based approaches can improve the detection of logic flaws in requests and prevent the dissemination of medical misinformation. Conclusion: Shifting LLMs to prioritize logic over compliance could reduce risks of exploitation for medical misinformation.