Privacy Meets Explainability: Managing Confidential Data and Transparency Policies in LLM-Empowered Science
The paper tackles privacy and transparency issues faced by scientists who use LLMs in research, but it appears incremental, since it builds on existing privacy-management concepts.
The paper addresses the risk of confidential data leaks in LLM-powered scientific tools and proposes DataShield, a framework to detect leaks, summarize privacy policies, and visualize data flow; user studies evaluating its effectiveness are ongoing.
As Large Language Models (LLMs) become integral to scientific workflows, concerns have emerged over the confidentiality and ethical handling of sensitive data. This paper examines, from scientists' perspectives, the data exposure risks of LLM-powered scientific tools, which can inadvertently leak confidential information, including intellectual property and proprietary data. We propose "DataShield", a framework designed to detect confidential data leaks, summarize privacy policies, and visualize data flow, ensuring alignment with organizational policies and procedures. Our approach aims to make data handling practices transparent to scientists, so that they can make informed decisions and protect sensitive information. User studies with scientists are underway to evaluate the framework's usability, trustworthiness, and effectiveness in tackling real-world privacy challenges.
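The abstract does not say how DataShield detects leaks. As a rough illustration of what "detecting confidential data leaks against organizational policies" could mean at its simplest, here is a minimal sketch that screens a prompt before it is sent to an external LLM tool. It assumes a purely pattern-based policy; the rule names, regular expressions, and the functions `find_leaks` and `screen_prompt` are all hypothetical and are not the authors' method.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """One hypothetical organizational rule: a name and a regex marking confidential content."""
    name: str
    pattern: re.Pattern


# Illustrative policy only (assumption, not taken from the paper).
POLICY = [
    PolicyRule("project_codename", re.compile(r"\bPROJECT-[A-Z]{3,}\b")),
    PolicyRule("confidential_marker",
               re.compile(r"\b(?:confidential|proprietary)\b", re.IGNORECASE)),
    PolicyRule("email_address", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]


def find_leaks(prompt: str, rules: list[PolicyRule] = POLICY) -> list[tuple[str, str]]:
    """Return (rule name, matched text) for every policy match found in the prompt."""
    hits = []
    for rule in rules:
        for match in rule.pattern.finditer(prompt):
            hits.append((rule.name, match.group(0)))
    return hits


def screen_prompt(prompt: str) -> bool:
    """Return True if no hypothetical policy rule matches, i.e. the prompt may be sent."""
    return not find_leaks(prompt)


if __name__ == "__main__":
    text = "Summarize the CONFIDENTIAL results of PROJECT-ORION and email alice@lab.org"
    for rule_name, snippet in find_leaks(text):
        print(f"{rule_name}: {snippet}")
    print("allowed:", screen_prompt(text))
```

Pattern matching is used here only because it is easy to follow; a practical detector would likely need richer signals than fixed regexes, and the abstract does not indicate which technique DataShield uses.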