LGMar 7, 2022
Low-Loss Subspace Compression for Clean Gains against Multi-Agent Backdoor AttacksSiddhartha Datta, Nigel Shadbolt
Recent exploration of the multi-agent backdoor attack demonstrated the backfiring effect, a natural defense against backdoor attacks where backdoored inputs are randomly classified. This yields a side-effect of low accuracy w.r.t. clean labels, which motivates this paper's work on the construction of multi-agent backdoor defenses that maximize accuracy w.r.t. clean labels and minimize that of poison labels. Founded upon agent dynamics and low-loss subspace construction, we contribute three defenses that yield improved multi-agent backdoor robustness.
HCApr 7, 2022
GreaseVision: Rewriting the Rules of the InterfaceSiddhartha Datta, Konrad Kollnig, Nigel Shadbolt
Digital harms can manifest across any interface. Key problems in addressing these harms include the high individuality of harms and the fast-changing nature of digital systems. As a result, we still lack a systematic approach to study harms and produce interventions for end-users. We put forward GreaseVision, a new framework that enables end-users to collaboratively develop interventions against harms in software using a no-code approach and recent advances in few-shot machine learning. The contribution of the framework and tool allow individual end-users to study their usage history and create personalized interventions. Our contribution also enables researchers to study the distribution of harms and interventions at scale.
LGMay 19, 2022
Interpolating Compressed Parameter SubspacesSiddhartha Datta, Nigel Shadbolt
Inspired by recent work on neural subspaces and mode connectivity, we revisit parameter subspace sampling for shifted and/or interpolatable input distributions (instead of a single, unshifted distribution). We enforce a compressed geometric structure upon a set of trained parameters mapped to a set of train-time distributions, denoting the resulting subspaces as Compressed Parameter Subspaces (CPS). We show the success and failure modes of the types of shifted distributions whose optimal parameters reside in the CPS. We find that ensembling point-estimates within a CPS can yield a high average accuracy across a range of test-time distributions, including backdoor, adversarial, permutation, stylization and rotation perturbations. We also find that the CPS can contain low-loss point-estimates for various task shifts (albeit interpolated, perturbed, unseen or non-identical coarse labels). We further demonstrate this property in a continual learning setting with CIFAR100.
LGSep 29, 2022
Multiple Modes for Continual LearningSiddhartha Datta, Nigel Shadbolt
Adapting model parameters to incoming streams of data is a crucial factor to deep learning scalability. Interestingly, prior continual learning strategies in online settings inadvertently anchor their updated parameters to a local parameter subspace to remember old tasks, else drift away from the subspace and forget. From this observation, we formulate a trade-off between constructing multiple parameter modes and allocating tasks per mode. Mode-Optimized Task Allocation (MOTA), our contributed adaptation strategy, trains multiple modes in parallel, then optimizes task allocation per mode. We empirically demonstrate improvements over baseline continual learning strategies and across varying distribution shifts, namely sub-population, domain, and task shift.
LGJan 27, 2023
Projected Subnetworks Scale AdaptationSiddhartha Datta, Nigel Shadbolt
Large models support great zero-shot and few-shot capabilities. However, updating these models on new tasks can break performance on previous seen tasks and their zero/few-shot unseen tasks. Our work explores how to update zero/few-shot learners such that they can maintain performance on seen/unseen tasks of previous tasks as well as new tasks. By manipulating the parameter updates of a gradient-based meta learner as the projected task-specific subnetworks, we show improvements for large models to retain seen and zero/few shot task performance in online settings.
LGMay 15, 2022
Learn2Weight: Parameter Adaptation against Similar-domain Adversarial AttacksSiddhartha Datta
Recent work in black-box adversarial attacks for NLP systems has attracted much attention. Prior black-box attacks assume that attackers can observe output labels from target models based on selected inputs. In this work, inspired by adversarial transferability, we propose a new type of black-box NLP adversarial attack that an attacker can choose a similar domain and transfer the adversarial examples to the target domain and cause poor performance in target model. Based on domain adaptation theory, we then propose a defensive strategy, called Learn2Weight, which trains to predict the weight adjustments for a target model in order to defend against an attack of similar-domain adversarial examples. Using Amazon multi-domain sentiment classification datasets, we empirically show that Learn2Weight is effective against the attack compared to standard black-box defense methods such as adversarial training and defensive distillation. This work contributes to the growing literature on machine learning safety.
84.9HCMar 18
Auditing Preferences for Brands and Cultures in LLMsJasmine Rienecker, Katarina Mpofu, Naman Goel et al.
Large language models (LLMs) based AI systems increasingly mediate what billions of people see, choose and buy. This creates an urgent need to quantify the systemic risks of LLM-driven market intermediation, including its implications for market fairness, competition, and the diversity of information exposure. This paper introduces ChoiceEval, a reproducible framework for auditing preferences for brands and cultures in large language models (LLMs) under realistic usage conditions. ChoiceEval addresses two core technical challenges: (i) generating realistic, persona-diverse evaluation queries and (ii) converting free-form outputs into comparable choice sets and quantitative preference metrics. For a given topic (e.g. running shoes, hotel chains, travel destinations), the framework segments users into psychographic profiles (e.g., budget-conscious, wellness-focused, convenience), and then derives diverse prompts that reflect real-world advice-seeking and decision-making behaviour. LLM responses are converted into normalised top-k choice sets. Preference and geographic bias are then quantified using comparable metrics across topics and personas. Thus, ChoiceEval provides a scalable audit pipeline for researchers, platforms, and regulators, linking model behaviour to real-world economic outcomes. Applied to Gemini, GPT, and DeepSeek across 10 topics spanning commerce and culture and more than 2,000 questions, ChoiceEval reveals consistent preferences: U.S.-developed models Gemini and GPT show marked favouritism toward American entities, while China-developed DeepSeek exhibits more balanced yet still detectable geographic preferences. These patterns persist across user personas, suggesting systematic rather than incidental effects.
CVNov 15, 2022
Cross-Reality Re-Rendering: Manipulating between Digital and Physical RealitiesSiddhartha Datta
The advent of personalized reality has arrived. Rapid development in AR/MR/VR enables users to augment or diminish their perception of the physical world. Robust tooling for digital interface modification enables users to change how their software operates. As digital realities become an increasingly-impactful aspect of human lives, we investigate the design of a system that enables users to manipulate the perception of both their physical realities and digital realities. Users can inspect their view history from either reality, and generate interventions that can be interoperably rendered cross-reality in real-time. Personalized interventions can be generated with mask, text, and model hooks. Collaboration between users scales the availability of interventions. We verify our implementation against our design requirements with cognitive walkthroughs, personas, and scalability tests.
CVDec 27, 2023
Prompt Expansion for Adaptive Text-to-Image GenerationSiddhartha Datta, Alexander Ku, Deepak Ramachandran et al.
Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts that are optimized such that when passed to a text-to-image model, generates a wider variety of appealing images. We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
LGFeb 5, 2024
Online Feature Updates Improve Online (Generalized) Label Shift AdaptationRuihan Wu, Siddhartha Datta, Yi Su et al.
This paper addresses the prevalent issue of label shift in an online setting with missing labels, where data distributions change over time and obtaining timely labels is challenging. While existing methods primarily focus on adjusting or updating the final layer of a pre-trained classifier, we explore the untapped potential of enhancing feature representations using unlabeled data at test-time. Our novel method, Online Label Shift adaptation with Online Feature Updates (OLS-OFU), leverages self-supervised learning to refine the feature extraction process, thereby improving the prediction model. By carefully designing the algorithm, theoretically OLS-OFU maintains the similar online regret convergence to the results in the literature while taking the improved features into account. Empirically, it achieves substantial improvements over existing methods, which is as significant as the gains existing methods have over the baseline (i.e., without distribution shift adaptations).
LGJan 28, 2022
Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That BackfireSiddhartha Datta, Nigel Shadbolt
Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, where an attacker poisons a model during training to successfully achieve targeted misclassification, are a major concern to train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. A consistent backfiring phenomenon is observed across a wide range of games, where agents suffer from a low collective attack success rate. We examine different modes of backdoor attack configurations, non-cooperation / cooperation, joint distribution shifts, and game setups to return an equilibrium attack success rate at the lower bound. The results motivate the re-evaluation of backdoor defense research for practical environments.
LGJan 24, 2022
Hiding Behind Backdoors: Self-Obfuscation Against Generative ModelsSiddhartha Datta, Nigel Shadbolt
Attack vectors that compromise machine learning pipelines in the physical world have been demonstrated in recent research, from perturbations to architectural components. Building on this work, we illustrate the self-obfuscation attack: attackers target a pre-processing model in the system, and poison the training set of generative models to obfuscate a specific class during inference. Our contribution is to describe, implement and evaluate a generalized attack, in the hope of raising awareness regarding the challenge of architectural robustness within the machine learning community.
HCDec 20, 2021
Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminatorSiddhartha Datta, Konrad Kollnig, Nigel Shadbolt
Digital harms are widespread in the mobile ecosystem. As these devices gain ever more prominence in our daily lives, so too increases the potential for malicious attacks against individuals. The last line of defense against a range of digital harms - including digital distraction, political polarisation through hate speech, and children being exposed to damaging material - is the user interface. This work introduces GreaseTerminator to enable researchers to develop, deploy, and test interventions against these harms with end-users. We demonstrate the ease of intervention development and deployment, as well as the broad range of harms potentially covered with GreaseTerminator in five in-depth case studies.
LGOct 9, 2021
Widen The Backdoor To Let More Attackers InSiddhartha Datta, Giulio Lovisotto, Ivan Martinovic et al.
As collaborative learning and the outsourcing of data collection become more common, malicious actors (or agents) which attempt to manipulate the learning process face an additional obstacle as they compete with each other. In backdoor attacks, where an adversary attempts to poison a model by introducing malicious samples into the training data, adversaries have to consider that the presence of additional backdoor attackers may hamper the success of their own backdoor. In this paper, we investigate the scenario of a multi-agent backdoor attack, where multiple non-colluding attackers craft and insert triggered samples in a shared dataset which is used by a model (a defender) to learn a task. We discover a clear backfiring phenomenon: increasing the number of attackers shrinks each attacker's attack success rate (ASR). We then exploit this phenomenon to minimize the collective ASR of attackers and maximize defender's robustness accuracy by (i) artificially augmenting the number of attackers, and (ii) indexing to remove the attacker's sub-dataset from the model for inference, hence proposing 2 defenses.
HCFeb 23, 2021
I Want My App That Way: Reclaiming Sovereignty Over Personal DevicesKonrad Kollnig, Siddhartha Datta, Max Van Kleek
Dark patterns in mobile apps take advantage of cognitive biases of end-users and can have detrimental effects on people's lives. Despite growing research in identifying remedies for dark patterns and established solutions for desktop browsers, there exists no established methodology to reduce dark patterns in mobile apps. Our work introduces GreaseDroid, a community-driven app modification framework enabling non-expert users to disable dark patterns in apps selectively.
CRSep 3, 2019
DeepObfusCode: Source Code Obfuscation Through Sequence-to-Sequence NetworksSiddhartha Datta
The paper explores a novel methodology in source code obfuscation through the application of text-based recurrent neural network (RNN) encoder-decoder models in ciphertext generation and key generation. Sequence-to-sequence models are incorporated into the model architecture to generate obfuscated code, generate the deobfuscation key, and live execution. Quantitative benchmark comparison to existing obfuscation methods indicate significant improvement in stealth and execution cost for the proposed solution, and experiments regarding the model's properties yield positive results regarding its character variation, dissimilarity to the original codebase, and consistent length of obfuscated code.