Frank Pallas

h-index3

6papers

56citations

Novelty45%

AI Score25

Ranked #172,452 of 201,326 authors (top 86%)#799 in CY (top 68%)

6 Papers

CYMay 2, 2024

Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification

Dimitri Staufer, Frank Pallas, Bettina Berendt

Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU WBD, are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other NE labels) and combinations thereof with placeholders. Such an approach, however, is inadequate for the whistleblowing scenario since it neglects further re-identification potential in textual features, including writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of the risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use a LLM that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate our tool's effectiveness using court cases from the ECHR and excerpts from a real-world whistleblower testimony and measure the protection against authorship attribution (AA) attacks and utility loss statistically using the popular IMDb62 movie reviews dataset. Our method can significantly reduce AA accuracy from 98.81% to 31.22%, while preserving up to 73.1% of the original content's semantics.

CYMay 24, 2023

A Human-in-the-Loop Approach for Information Extraction from Privacy Policies under Data Scarcity

Michael Gebauer, Faraz Maschhur, Nicola Leschke et al.

Machine-readable representations of privacy policies are door openers for a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). In order to generate such representations, transparency information needs to be extracted from written privacy policies. However, respective manual annotation and extraction processes are laborious and require expert knowledge. Approaches for fully automated annotation, in turn, have so far not succeeded due to overly high error rates in the specific domain of privacy policies. In the end, a lack of properly annotated privacy policies and respective machine-readable representations persists and enduringly hinders the development and establishment of novel technical approaches fostering policy perception and data subject informedness. In this work, we present a prototype system for a `Human-in-the-Loop' approach to privacy policy annotation that integrates ML-generated suggestions and ultimately human annotation decisions. We propose an ML-based suggestion system specifically tailored to the constraint of data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users thereby streamlining the annotation process. Additionally, we also evaluate our approach through a prototypical implementation to show that our ML-based extraction approach provides superior performance over other recently used extraction models for legal documents.

CROct 29, 2021

RedCASTLE: Practically Applicable $k_s$-Anonymity for IoT Streaming Data at the Edge in Node-RED

Frank Pallas, Julian Legler, Niklas Amslgruber et al.

In this paper, we present RedCASTLE, a practically applicable solution for Edge-based $k_s$-anonymization of IoT streaming data in Node-RED. RedCASTLE builds upon a pre-existing, rudimentary implementation of the CASTLE algorithm and significantly extends it with functionalities indispensable for real-world IoT scenarios. In addition, RedCASTLE provides an abstraction layer for smoothly integrating $k_s$-anonymization into Node-RED, a visually programmable middleware for streaming dataflows widely used in Edge-based IoT scenarios. Last but not least, RedCASTLE also provides further capabilities for basic information reduction that complement $k_s$-anonymization in the privacy-friendly implementation of usecases involving IoT streaming data. A preliminary performance assessment finds that RedCASTLE comes with reasonable overheads and demonstrates its practical viability.

CROct 28, 2021

Messaging with Purpose Limitation -- Privacy-Compliant Publish-Subscribe Systems

Karl Wolf, Frank Pallas, Stefan Tai

Purpose limitation is an important privacy principle to ensure that personal data may only be used for the declared purposes it was originally collected for. Ensuring compliance with respective privacy regulations like the GDPR, which codify purpose limitation as an obligation, consequently, is a major challenge in real-world enterprise systems. Technical solutions under the umbrella of purpose-based access control (PBAC), however, focus mostly on data being held at-rest in databases, while PBAC for communication and publish-subscribe messaging in particular has received only little attention. In this paper, we argue for PBAC to be also applied to data-in-transit and introduce and study a concrete proof-of-concept implementation, which extends a popular MQTT message broker with purpose limitation. On this basis, purpose limitation as a core privacy principle can be addressed in enterprise IoT and message-driven integration architectures that do not focus on databases but event-driven communication and integration instead.

SEJun 10, 2021

TIRA: An OpenAPI Extension and Toolbox for GDPR Transparency in RESTful Architectures

Elias Grünewald, Paul Wille, Frank Pallas et al.

Transparency - the provision of information about what personal data is collected for which purposes, how long it is stored, or to which parties it is transferred - is one of the core privacy principles underlying regulations such as the GDPR. Technical approaches for implementing transparency in practice are, however, only rarely considered. In this paper, we present a novel approach for doing so in current, RESTful application architectures and in line with prevailing agile and DevOps-driven practices. For this purpose, we introduce 1) a transparency-focused extension of OpenAPI specifications that allows individual service descriptions to be enriched with transparency-related annotations in a bottom-up fashion and 2) a set of higher-order tools for aggregating respective information across multiple, interdependent services and for coherently integrating our approach into automated CI/CD-pipelines. Together, these building blocks pave the way for providing transparency information that is more specific and at the same time better reflects the actual implementation givens within complex service architectures than current, overly broad privacy statements.

CYDec 18, 2020

TILT: A GDPR-Aligned Transparency Information Language and Toolkit for Practical Privacy Engineering

Elias Grünewald, Frank Pallas

In this paper, we present TILT, a transparency information language and toolkit explicitly designed to represent and process transparency information in line with the requirements of the GDPR and allowing for a more automated and adaptive use of such information than established, legalese data protection policies do. We provide a detailed analysis of transparency obligations from the GDPR to identify the expressiveness required for a formal transparency language intended to meet respective legal requirements. In addition, we identify a set of further, non-functional requirements that need to be met to foster practical adoption in real-world (web) information systems engineering. On this basis, we specify our formal language and present a respective, fully implemented toolkit around it. We then evaluate the practical applicability of our language and toolkit and demonstrate the additional prospects it unlocks through two different use cases: a) the inter-organizational analysis of personal data-related practices allowing, for instance, to uncover data sharing networks based on explicitly announced transparency information and b) the presentation of formally represented transparency information to users through novel, more comprehensible, and potentially adaptive user interfaces, heightening data subjects' actual informedness about data-related practices and, thus, their sovereignty. Altogether, our transparency information language and toolkit allow - differently from previous work - to express transparency information in line with actual legal requirements and practices of modern (web) information systems engineering and thereby pave the way for a multitude of novel possibilities to heighten transparency and user sovereignty in practice.