Haiyue Yuan

h-index: 6

Papers: 4

Novelty: 33%

AI Score: 33

Ranked #116,335 of 194,257 authors (top 60%); #7,123 in AI (top 57%)

4 Papers

2.4 | AI | Jan 15

LADFA: A Framework of Using Large Language Models and Retrieval-Augmented Generation for Personal Data Flow Analysis in Privacy Policies

Haiyue Yuan, Nikolay Matyunin, Ali Raza et al.

Privacy policies help inform people about organisations' personal data processing practices, covering different aspects such as data collection, data storage, and sharing of personal data with third parties. Privacy policies are often difficult for people to fully comprehend due to the lengthy and complex legal language used and inconsistent practices across different sectors and organisations. To help conduct automated and large-scale analyses of privacy policies, many researchers have studied applications of machine learning and natural language processing techniques, including large language models (LLMs). While a limited number of prior studies utilised LLMs for extracting personal data flows from privacy policies, our approach builds on this line of work by combining LLMs with retrieval-augmented generation (RAG) and a customised knowledge base derived from existing studies. This paper presents the development of LADFA, an end-to-end computational framework, which can process unstructured text in a given privacy policy, extract personal data flows and construct a personal data flow graph, and conduct analysis of the data flow graph to facilitate insight discovery. The framework consists of a pre-processor, an LLM-based processor, and a data flow post-processor. We demonstrated and validated the effectiveness and accuracy of the proposed approach by conducting a case study that involved examining ten selected privacy policies from the automotive industry. Moreover, it is worth noting that LADFA is designed to be flexible and customisable, making it suitable for a range of text-based analysis tasks beyond privacy policy analysis.
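As a rough illustration of what a personal data flow graph might look like, the sketch below stores flows as (sender, data type, receiver) triples and groups them into a nested mapping. The triple format, the example flows, and the helper names `build_graph` and `data_received` are all assumptions made for illustration; the abstract does not describe LADFA's actual data model or code.

```python
from collections import defaultdict

# Hypothetical flows of the kind an LLM-based processor might extract.
# Each triple is (sender, data_type, receiver); the values are invented.
flows = [
    ("user", "location", "manufacturer"),
    ("user", "contact details", "manufacturer"),
    ("manufacturer", "location", "third-party service provider"),
]


def build_graph(flows):
    """Group flows into graph[sender][receiver] -> set of data types."""
    graph = defaultdict(lambda: defaultdict(set))
    for sender, data_type, receiver in flows:
        graph[sender][receiver].add(data_type)
    return graph


def data_received(graph):
    """For each receiver, collect every data type that flows to it."""
    received = defaultdict(set)
    for edges in graph.values():
        for receiver, data_types in edges.items():
            received[receiver] |= data_types
    return received


graph = build_graph(flows)
for receiver, data_types in data_received(graph).items():
    print(receiver, sorted(data_types))
```

A query such as `data_received` is one simple example of the kind of analysis a post-processing step could run over the graph.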

1.2 | SI | Aug 24, 2022

Graphical Models of False Information and Fact Checking Ecosystems

Haiyue Yuan, Enes Altuncu, Shujun Li et al.

The wide spread of false information online, including misinformation and disinformation, has become a major problem for our highly digitised and globalised society. A lot of research has been done to better understand different aspects of false information online such as behaviours of different actors and patterns of spreading, and also on better detection and prevention of such information using technical and socio-technical means. One major approach to detect and debunk false information online is to use human fact-checkers, who can be helped by automated tools. Despite the large body of research, we noticed a significant gap: the lack of conceptual models describing the complicated ecosystems of false information and fact checking. In this paper, we report the first graphical models of such ecosystems, focusing on false information online in multiple contexts, including traditional media outlets and user-generated content. The proposed models cover a wide range of entity types and relationships, and can be a new useful tool for researchers and practitioners to study false information online and the effects of fact checking.

8.8 | CY | Apr 7

Does Travel Stage Matter? How Leisure Travellers Perceive Their Privacy Attitudes Towards Personal Data Sharing Before, During, and After Travel

Haiyue Yuan, Shujun Li, Fatima Gillani et al.

People's attitudes towards personal data sharing have been extensively researched; however, limited research has studied how these attitudes evolve across different stages of a leisure trip. This paper addresses this gap by exploring how leisure travellers' attitudes towards sharing personal data change before, during and after travel. Analysing data from an online survey with 318 participants, we found that participants' privacy attitudes towards sharing different personal data vary based on sharing purposes and travel stages. Interestingly, participants exhibited a more relaxed attitude towards sharing commonly sensitive personal data (e.g., name, gender) compared to other types of personal data. This is likely because sharing such data for travel bookings has become essential and widely accepted among travellers when using booking sites, which is in line with previous work stating that information easily obtainable is typically not seen as highly confidential. Moreover, despite participants' self-reported frequent use of social media platforms, content sharing is minimal on TikTok, YouTube, Snapchat, Pinterest, and Twitter. Conversely, Facebook and Instagram were more common for travel-related content sharing. This pattern remains consistent across the three stages of travel, suggesting that the stage of travel does not significantly influence how people share on social media platforms, which has been overlooked in past studies. Furthermore, we discovered that a participant's gender, previous travel frequency, and country of residence can influence their perceptions of personal data sharing at different travel stages, confirming the complex and context-dependent nature of privacy perception and attitudes. Based on the findings observed from this study, we further discuss implications and potential contributions of our work to the privacy and security community in general.

2.9 | CR | Jul 22, 2020

Exploiting Behavioral Side-Channels in Observation Resilient Cognitive Authentication Schemes

Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Mohamed Ali Kaafar et al.

Observation Resilient Authentication Schemes (ORAS) are a class of shared secret challenge-response identification schemes where a user mentally computes the response via a cognitive function to authenticate herself such that eavesdroppers cannot readily extract the secret. Security evaluation of ORAS generally involves quantifying information leaked via observed challenge-response pairs. However, little work has evaluated information leaked via human behavior while interacting with these schemes. A common way to achieve observation resilience is by including a modulus operation in the cognitive function. This minimizes the information leaked about the secret due to the many-to-one map from the set of possible secrets to a given response. In this work, we show that user behavior can be used as a side-channel to obtain the secret in such ORAS. Specifically, the user's eye-movement patterns and associated timing information can be used to deduce whether a modulus operation was performed (a fundamental design element), leaking information about the secret. We further show that the secret can still be retrieved if the deduction is erroneous, a more likely case in practice. We treat the vulnerability analytically, and propose a generic attack algorithm that iteratively obtains the secret despite the "faulty" modulus information. We demonstrate the attack on five ORAS, and show that the secret can be retrieved with considerably fewer challenge-response pairs than non-side-channel attacks (e.g., algebraic/statistical attacks). In particular, our attack is applicable to Mod10, a one-time-pad based scheme, for which no non-side-channel attack exists. We field test our attack with a small-scale eye-tracking user study.
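The many-to-one effect of the modulus, and why knowing whether a reduction took place narrows the search, can be seen in a toy example. The scheme below (a secret of two digits, a weighted sum reduced mod 10) is a hypothetical stand-in and not one of the five ORAS evaluated in the paper. The side-channel bit is assumed to be observed without error, whereas the paper's attack also handles faulty observations.

```python
from itertools import product

MOD = 10  # toy modulus, chosen for illustration only


def raw_sum(secret, challenge):
    """Weighted sum the user would compute mentally before reducing."""
    return sum(s * c for s, c in zip(secret, challenge))


def respond(secret, challenge):
    """Response of the toy scheme: the weighted sum reduced mod 10."""
    return raw_sum(secret, challenge) % MOD


def wrapped(secret, challenge):
    """Hypothetical side-channel bit: did the sum need reducing at all?"""
    return raw_sum(secret, challenge) >= MOD


def candidates(challenge, response, wrap_bit=None):
    """Enumerate secrets consistent with one observation.

    If wrap_bit is given, also require the candidate to agree with the
    observed 'modulus performed' bit.
    """
    result = []
    for cand in product(range(MOD), repeat=len(challenge)):
        if respond(cand, challenge) != response:
            continue
        if wrap_bit is not None and wrapped(cand, challenge) != wrap_bit:
            continue
        result.append(cand)
    return result


secret = (1, 2)
challenge = (1, 1)
response = respond(secret, challenge)

without_bit = candidates(challenge, response)
with_bit = candidates(challenge, response, wrapped(secret, challenge))
print(len(without_bit), len(with_bit))  # prints "10 4"
```

Here a single challenge-response pair leaves 10 candidate secrets, and the extra bit cuts that to 4. Repeating the filter over several challenges and intersecting the candidate sets would shrink the set further.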