CLOct 25, 2023
Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-VotingPreethi Lahoti, Nicholas Blumm, Xiao Ma et al.
A crucial challenge for generative large language models (LLMs) is diversity: when a user's prompt is under-specified, models may follow implicit assumptions while generating a response, which may result in homogenization of the responses, as well as certain demographic groups being under-represented or even erased from the generated responses. In this paper, we formalize diversity of representation in generative LLMs. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal. This finding motivated a new prompting technique called collective-critique and self-voting (CCSV) to self-improve people diversity of LLMs by tapping into its diversity reasoning capabilities, without relying on handcrafted examples or prompt tuning. Extensive empirical experiments with both human and automated evaluations show that our proposed approach is effective at improving people and culture diversity, and outperforms all baseline methods by a large margin.
SENov 14, 2023
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered ApplicationsBhaktipriya Radharapu, Kevin Robinson, Lora Aroyo et al.
Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. We introduce a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. We call it AI-assisted Red-Teaming (AART) - an automated alternative to current manual red-teaming efforts. AART offers a data generation and augmentation pipeline of reusable and customizable recipes that reduce human effort significantly and enable integration of adversarial testing earlier in new product development. AART generates evaluation datasets with high diversity of content characteristics critical for effective adversarial testing (e.g. sensitive and harmful concepts, specific to a wide range of cultural and geographic regions and application scenarios). The data generation is steered by AI-assisted recipes to define, scope and prioritize diversity within the application context. This feeds into a structured LLM-generation process that scales up evaluation priorities. Compared to some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality.
CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic CapabilitiesGheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
LGDec 5, 2023
FRAPPE: A Group Fairness Framework for Post-Processing EverythingAlexandru Tifrea, Preethi Lahoti, Ben Packer et al.
Despite achieving promising fairness-error trade-offs, in-processing mitigation techniques for group fairness cannot be employed in numerous practical applications with limited computation resources or no access to the training pipeline of the prediction model. In these situations, post-processing is a viable alternative. However, current methods are tailored to specific problem settings and fairness definitions and hence, are not as broadly applicable as in-processing. In this work, we propose a framework that turns any regularized in-processing method into a post-processing approach. This procedure prescribes a way to obtain post-processing techniques for a much broader range of problem settings than the prior post-processing literature. We show theoretically and through extensive experiments that our framework preserves the good fairness-error trade-offs achieved with in-processing and can improve over the effectiveness of prior post-processing methods. Finally, we demonstrate several advantages of a modular mitigation strategy that disentangles the training of the prediction model from the fairness mitigation, including better performance on tasks with partial group labels.
CLJun 24, 2024
Automated Adversarial Discovery for Safety ClassifiersYash Kumar Lal, Preethi Lahoti, Aradhana Sinha et al.
Safety classifiers are critical in mitigating toxicity on online forums such as social media and in chatbots. Still, they continue to be vulnerable to emergent, and often innumerable, adversarial attacks. Traditional automated adversarial data generation methods, however, tend to produce attacks that are not diverse, but variations of previously observed harm types. We formalize the task of automated adversarial discovery for safety classifiers - to find new attacks along previously unseen harm dimensions that expose new weaknesses in the classifier. We measure progress on this task along two key axes (1) adversarial success: does the attack fool the classifier? and (2) dimensional diversity: does the attack represent a previously unseen harm type? Our evaluation of existing attack generation methods on the CivilComments toxicity task reveals their limitations: Word perturbation attacks fail to fool classifiers, while prompt-based LLM attacks have more adversarial success, but lack dimensional diversity. Even our best-performing prompt-based method finds new successful attacks on unseen harm dimensions of attacks only 5\% of the time. Automatically finding new harmful dimensions of attack is crucial and there is substantial headroom for future research on our new task.
LGJun 24, 2024
Inducing Group Fairness in Prompt-Based Language Model DecisionsJames Atwood, Nino Scherrer, Preethi Lahoti et al.
Classifiers are used throughout industry to enforce policies, ranging from the detection of toxic content to age-appropriate content filtering. While these classifiers serve important functions, it is also essential that they are built in ways that minimize unfair biases for users. One such fairness consideration is called group fairness, which desires that different sub-population of users receive equal treatment. This is a well-studied problem in the context of 'classical' classifiers. However, the emergence of prompt-based language model (LM) decision making has created new opportunities to solve text-based classification tasks, and the fairness properties of these new classifiers are not yet well understood. Further, the `remediation toolkit' is incomplete for LM-based decision makers and little is understood about how to improve decision maker group fairness while maintaining classifier performance. This work sets out to add more tools to that toolbox. We introduce adaptations of existing effective approaches from the classical classifier fairness to the prompt-based classifier space. We also devise simple methods that take advantage of the new structure of prompt-based decision makers and operate at the prompt level. We compare these approaches empirically on real data. Our results suggest that adaptations of approaches that are effective for classical classifiers remain effective in the LM-based classifier environment. However, there is room for further exploration of prompt-based remediation methods (and other remediation methods that take advantage of LM structure).
CLDec 19, 2023
Gemini: A Family of Highly Capable Multimodal ModelsGemini Team, Rohan Anil, Sebastian Borgeaud et al.
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
LGSep 9, 2021
Detecting and Mitigating Test-time Failure Risks via Model-agnostic Uncertainty LearningPreethi Lahoti, Krishna P. Gummadi, Gerhard Weikum
Reliably predicting potential failure risks of machine learning (ML) systems when deployed with production data is a crucial aspect of trustworthy AI. This paper introduces Risk Advisor, a novel post-hoc meta-learner for estimating failure risks and predictive uncertainties of any already-trained black-box classification model. In addition to providing a risk score, the Risk Advisor decomposes the uncertainty estimates into aleatoric and epistemic uncertainty components, thus giving informative insights into the sources of uncertainty inducing the failures. Consequently, Risk Advisor can distinguish between failures caused by data variability, data shifts and model limitations and advise on mitigation actions (e.g., collecting more data to counter data shift). Extensive experiments on various families of black-box classification models and on real-world and synthetic datasets covering common ML failure scenarios show that the Risk Advisor reliably predicts deployment-time failure risks in all the scenarios, and outperforms strong baselines.
LGMay 10, 2021
Accounting for Model Uncertainty in Algorithmic DiscriminationJunaid Ali, Preethi Lahoti, Krishna P. Gummadi
Traditional approaches to ensure group fairness in algorithmic decision making aim to equalize ``total'' error rates for different subgroups in the population. In contrast, we argue that the fairness approaches should instead focus only on equalizing errors arising due to model uncertainty (a.k.a epistemic uncertainty), caused due to lack of knowledge about the best model or due to lack of data. In other words, our proposal calls for ignoring the errors that occur due to uncertainty inherent in the data, i.e., aleatoric uncertainty. We draw a connection between predictive multiplicity and model uncertainty and argue that the techniques from predictive multiplicity could be used to identify errors made due to model uncertainty. We propose scalable convex proxies to come up with classifiers that exhibit predictive multiplicity and empirically show that our methods are comparable in performance and up to four orders of magnitude faster than the current state-of-the-art. We further propose methods to achieve our goal of equalizing group error rates arising due to model uncertainty in algorithmic decision making and demonstrate the effectiveness of these methods using synthetic and real-world datasets.
LGJun 23, 2020
Fairness without Demographics through Adversarially Reweighted LearningPreethi Lahoti, Alex Beutel, Jilin Chen et al.
Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that {ARL} improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives.
CYOct 22, 2019
An Empirical Study on Learning Fairness Metrics for COMPAS Data with Human SupervisionHanchen Wang, Nina Grgic-Hlaca, Preethi Lahoti et al.
The notion of individual fairness requires that similar people receive similar treatment. However, this is hard to achieve in practice since it is difficult to specify the appropriate similarity metric. In this work, we attempt to learn such similarity metric from human annotated data. We gather a new dataset of human judgments on a criminal recidivism prediction (COMPAS) task. By assuming the human supervision obeys the principle of individual fairness, we leverage prior work on metric learning, evaluate the performance of several metric learning methods on our dataset, and show that the learned metrics outperform the Euclidean and Precision metric under various criteria. We do not provide a way to directly learn a similarity metric satisfying the individual fairness, but to provide an empirical study on how to derive the similarity metric from human supervisors, then future work can use this as a tool to understand human supervision.
LGJul 2, 2019
Operationalizing Individual Fairness with Pairwise Fair RepresentationsPreethi Lahoti, Krishna P. Gummadi, Gerhard Weikum
We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty in eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable.
LGJun 4, 2018
iFair: Learning Individually Fair Data Representations for Algorithmic Decision MakingPreethi Lahoti, Krishna P. Gummadi, Gerhard Weikum
People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: giving adequate success rates to specifically protected groups. In contrast, the alternative paradigm of individual fairness has received relatively little attention, and this paper advances this less explored direction. The paper introduces a method for probabilistically mapping user records into a low-rank representation that reconciles individual fairness and the utility of classifiers and rankings in downstream applications. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on a variety of real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.