CVJul 22, 2024Code
Breaking the Global North Stereotype: A Global South-centric Benchmark Dataset for Auditing and Mitigating Biases in Facial Recognition SystemsSiddharth D Jaiswal, Animesh Ganai, Abhisek Dash et al.
Facial Recognition Systems (FRSs) are being developed and deployed globally at unprecedented rates. Most platforms are designed in a limited set of countries but deployed in worldwide, without adequate checkpoints. This is especially problematic for Global South countries which lack strong legislation to safeguard persons facing disparate performance of these systems. A combination of unavailability of datasets, lack of understanding of FRS functionality and low-resource bias mitigation measures accentuate the problem. In this work, we propose a new face dataset composed of 6,579 unique male and female sportspersons from eight countries around the world. More than 50% of the dataset comprises individuals from the Global South countries and is demographically diverse. To aid adversarial audits and robust model training, each image has four adversarial variants, totaling over 40,000 images. We also benchmark five popular FRSs, both commercial and open-source, for the task of gender prediction (and country prediction for one of the open-source models as an example of red-teaming). Experiments on industrial FRSs reveal accuracies ranging from 98.2%--38.1%, with a large disparity between males and females in the Global South (max difference of 38.5%). Biases are also observed in all FRSs between females of the Global North and South (max difference of ~50%). Grad-CAM analysis identifies the nose, forehead and mouth as the regions of interest on one of the open-source FRSs. Utilizing this insight, we design simple, low-resource bias mitigation solutions using few-shot and novel contrastive learning techniques significantly improving the accuracy with disparity between males and females reducing from 50% to 1.5% in one of the settings. In the red-teaming experiment with the open-source Deepface model, contrastive learning proves more effective than simple fine-tuning.
CYMar 20
Setting the Course, but Forgetting to Steer: Analyzing Compliance with GDPR's Right of Access to Data by Instagram, TikTok, and YouTubeSai Keerthana Karnam, Abhisek Dash, Antariksh Das et al.
The GDPR's Right of Access aims to empower users with control over their personal data via Data Download Packages (DDPs). However, their effectiveness is often compromised by inconsistent platform implementations, questionable data reliability, and poor user comprehensibility. This paper conducts a comprehensive audit of DDPs from three social media platforms (TikTok, Instagram, and YouTube) to systematically assess these critical drawbacks. Despite offering similar services, we find that these platforms demonstrate significant inconsistencies in implementing the Right of Access, evident in varying levels of shared data. Critically, the failure to disclose processing purposes, retention periods, and other third-party data recipients serves as a further indicator of non-compliance. Our reliability evaluations, using bots and user-donated data, reveal that while TikTok's DDPs offer more consistent and complete data, others exhibit notable shortcomings. Similarly, our assessment of comprehensibility, based on surveys with 400 participants, indicates that current DDPs substantially fall short of GDPR's standards. To improve the comprehensibility, we propose and demonstrate a two-layered approach by: (1)~enhancing the data representation itself using stakeholder interpretations; and (2)~incorporating a user-friendly extension (\textit{Know Your Data}) for intuitive data visualization where users can control the level of transparency they prefer. Our findings underscore the need for clearer and non-conflicting regulatory guidance, stricter enforcement, and platform commitment to realize the goal of GDPR's Right of Access.
CRApr 27
On the Centralization of Governance Power in Decentralized Autonomous OrganizationsVabuk Pahari, Balakrishnan Chandrasekaran, Johnnatan Messias et al.
A decentralized autonomous organization (DAO) is a governing entity that empowers its stakeholders (i.e., users who hold one or more of its tokens) to manage blockchain-based protocols (i.e., smart contracts) collaboratively. The governance of a DAO is explicitly encoded in the DAO's governance contract, which defines how stakeholders participate in governance and how much influence (or voting power) they have in any decision. While decentralization and autonomy are the fundamental tenets of a DAO's design, empirical evidence suggests that in practice governance is often highly centralized. In this work, we study the designs and implementations of 48 public and actively used DAOs, with substantially large capital, deployed on Ethereum. We identify how three key governance mechanisms--token registration, staking, and delegation--originally introduced to improve security or participation, contribute to the concentration of voting power. Unlike prior work on centralization of voting power in specific DAOs, our findings reveal that these governance mechanisms of DAOs themselves systematically reinforce centralization. By elucidating the relationship between governance design and voting centralization, this work advances the understanding of DAO governance structures and highlights the inherent trade-offs between decentralization, security, and usability of DAOs.
AISep 30, 2025
Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail PredictionSagnik Basu, Shubham Prakash, Ashish Maruti Barge et al.
Large language models (LLMs) have been extensively used for legal judgment prediction tasks based on case reports and crime history. However, with a surge in the availability of large vision language models (VLMs), legal judgment prediction systems can now be made to leverage the images of the criminals in addition to the textual case reports/crime history. Applications built in this way could lead to inadvertent consequences and be used with malicious intent. In this work, we run an audit to investigate the efficiency of standalone VLMs in the bail decision prediction task. We observe that the performance is poor across multiple intersectional groups and models \textit{wrongly deny bail to deserving individuals with very high confidence}. We design different intervention algorithms by first including legal precedents through a RAG pipeline and then fine-tuning the VLMs using innovative schemes. We demonstrate that these interventions substantially improve the performance of bail prediction. Our work paves the way for the design of smarter interventions on VLMs in the future, before they can be deployed for real-world legal judgment prediction.
HCFeb 8, 2022
Alexa, in you, I trust! Fairness and Interpretability Issues in E-commerce Search through Smart SpeakersAbhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh et al.
In traditional (desktop) e-commerce search, a customer issues a specific query and the system returns a ranked list of products in order of relevance to the query. An increasingly popular alternative in e-commerce search is to issue a voice-query to a smart speaker (e.g., Amazon Echo) powered by a voice assistant (VA, e.g., Alexa). In this situation, the VA usually spells out the details of only one product, an explanation citing the reason for its selection, and a default action of adding the product to the customer's cart. This reduced autonomy of the customer in the choice of a product during voice-search makes it necessary for a VA to be far more responsible and trustworthy in its explanation and default action. In this paper, we ask whether the explanation presented for a product selection by the Alexa VA installed on an Amazon Echo device is consistent with human understanding as well as with the observations on other traditional mediums (e.g., desktop ecommerce search). Through a user survey, we find that in 81% cases the interpretation of 'a top result' by the users is different from that of Alexa. While investigating for the fairness of the default action, we observe that over a set of as many as 1000 queries, in nearly 68% cases, there exist one or more products which are more relevant (as per Amazon's own desktop search results) than the product chosen by Alexa. Finally, we conducted a survey over 30 queries for which the Alexa-selected product was different from the top desktop search result, and observed that in nearly 73% cases, the participants preferred the top desktop search result as opposed to the product chosen by Alexa. Our results raise several concerns and necessitates more discussions around the related fairness and interpretability issues of VAs for e-commerce search.
CVNov 17, 2021
Two-Face: Adversarial Audit of Commercial Face Recognition SystemsSiddharth D Jaiswal, Karthikeya Duggirala, Abhisek Dash et al.
Computer vision applications like automated face detection are used for a variety of purposes ranging from unlocking smart devices to tracking potential persons of interest for surveillance. Audits of these applications have revealed that they tend to be biased against minority groups which result in unfair and concerning societal and political outcomes. Despite multiple studies over time, these biases have not been mitigated completely and have in fact increased for certain tasks like age prediction. While such systems are audited over benchmark datasets, it becomes necessary to evaluate their robustness for adversarial inputs. In this work, we perform an extensive adversarial audit on multiple systems and datasets, making a number of concerning observations - there has been a drop in accuracy for some tasks on CELEBSET dataset since a previous audit. While there still exists a bias in accuracy against individuals from minority groups for multiple datasets, a more worrying observation is that these biases tend to get exorbitantly pronounced with adversarial inputs toward the minority group. We conclude with a discussion on the broader societal impacts in light of these observations and a few suggestions on how to collectively deal with this issue.
CYJan 30, 2021
When the Umpire is also a Player: Bias in Private Label Product Recommendations on E-commerce MarketplacesAbhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh et al.
Algorithmic recommendations mediate interactions between millions of customers and products (in turn, their producers and sellers) on large e-commerce marketplaces like Amazon. In recent years, the producers and sellers have raised concerns about the fairness of black-box recommendation algorithms deployed on these marketplaces. Many complaints are centered around marketplaces biasing the algorithms to preferentially favor their own `private label' products over competitors. These concerns are exacerbated as marketplaces increasingly de-emphasize or replace `organic' recommendations with ad-driven `sponsored' recommendations, which include their own private labels. While these concerns have been covered in popular press and have spawned regulatory investigations, to our knowledge, there has not been any public audit of these marketplace algorithms. In this study, we bridge this gap by performing an end-to-end systematic audit of related item recommendations on Amazon. We propose a network-centric framework to quantify and compare the biases across organic and sponsored related item recommendations. Along a number of our proposed bias measures, we find that the sponsored recommendations are significantly more biased toward Amazon private label products compared to organic recommendations. While our findings are primarily interesting to producers and sellers on Amazon, our proposed bias measures are generally useful for measuring link formation bias in any social or content networks.
IRJan 29, 2021
Fairness for Whom? Understanding the Reader's Perception of Fairness in Text SummarizationAnurag Shandilya, Abhisek Dash, Abhijnan Chakraborty et al.
With the surge in user-generated textual information, there has been a recent increase in the use of summarization algorithms for providing an overview of the extensive content. Traditional metrics for evaluation of these algorithms (e.g. ROUGE scores) rely on matching algorithmic summaries to human-generated ones. However, it has been shown that when the textual contents are heterogeneous, e.g., when they come from different socially salient groups, most existing summarization algorithms represent the social groups very differently compared to their distribution in the original data. To mitigate such adverse impacts, some fairness-preserving summarization algorithms have also been proposed. All of these studies have considered normative notions of fairness from the perspective of writers of the contents, neglecting the readers' perceptions of the underlying fairness notions. To bridge this gap, in this work, we study the interplay between the fairness notions and how readers perceive them in textual summaries. Through our experiments, we show that reader's perception of fairness is often context-sensitive. Moreover, standard ROUGE evaluation metrics are unable to quantify the perceived (un)fairness of the summaries. To this end, we propose a human-in-the-loop metric and an automated graph-based methodology to quantify the perceived bias in textual summaries. We demonstrate their utility by quantifying the (un)fairness of several summaries of heterogeneous socio-political microblog datasets.
IRFeb 7, 2019
A Network-centric Framework for Auditing Recommendation SystemsAbhisek Dash, Animesh Mukherjee, Saptarshi Ghosh
To improve the experience of consumers, all social media, commerce and entertainment sites deploy Recommendation Systems (RSs) that aim to help users locate interesting content. These RSs are black-boxes - the way a chunk of information is filtered out and served to a user from a large information base is mostly opaque. No one except the parent company generally has access to the entire information required for auditing these systems - neither the details of the algorithm nor the user-item interactions are ever made publicly available for third-party auditors. Hence auditing RSs remains an important challenge, especially with the recent concerns about how RSs are affecting the views of the society at large with new technical jargons like "echo chambers", "confirmation biases", "filter bubbles" etc. in place. Many prior works have evaluated different properties of RSs such as diversity, novelty, etc. However, most of these have focused on evaluating static snapshots of RSs. Today, auditors are not only interested in these static evaluations on a snapshot of the system, but also interested in how these systems are affecting the society in course of time. In this work, we propose a novel network-centric framework which is not only able to quantify various static properties of RSs, but also is able to quantify dynamic properties such as how likely RSs are to lead to polarization or segregation of information among their users. We apply the framework to several popular movie RSs to demonstrate its utility.
IROct 22, 2018
Summarizing User-generated Textual Content: Motivation and Methods for Fairness in Algorithmic SummariesAbhisek Dash, Anurag Shandilya, Arindam Biswas et al.
As the amount of user-generated textual content grows rapidly, text summarization algorithms are increasingly being used to provide users a quick overview of the information content. Traditionally, summarization algorithms have been evaluated only based on how well they match human-written summaries (e.g. as measured by ROUGE scores). In this work, we propose to evaluate summarization algorithms from a completely new perspective that is important when the user-generated data to be summarized comes from different socially salient user groups, e.g. men or women, Caucasians or African-Americans, or different political groups (Republicans or Democrats). In such cases, we check whether the generated summaries fairly represent these different social groups. Specifically, considering that an extractive summarization algorithm selects a subset of the textual units (e.g. microblogs) in the original data for inclusion in the summary, we investigate whether this selection is fair or not. Our experiments over real-world microblog datasets show that existing summarization algorithms often represent the socially salient user-groups very differently compared to their distributions in the original data. More importantly, some groups are frequently under-represented in the generated summaries, and hence get far less exposure than what they would have obtained in the original data. To reduce such adverse impacts, we propose novel fairness-preserving summarization algorithms which produce high-quality summaries while ensuring fairness among various groups. To our knowledge, this is the first attempt to produce fair text summarization, and is likely to open up an interesting research direction.
HCOct 25, 2016
Image Clustering without Ground TruthAbhisek Dash, Sujoy Chatterjee, Tripti Prasad et al.
Cluster analysis has become one of the most exercised research areas over the past few decades in computer science. As a consequence, numerous clustering algorithms have already been developed to find appropriate partitions of a set of objects. Given multiple such clustering solutions, it is a challenging task to obtain an ensemble of these solutions. This becomes more challenging when the ground truth about the number of clusters is unavailable. In this paper, we introduce a crowd-powered model to collect solutions of image clustering from the general crowd and pose it as a clustering ensemble problem with variable number of clusters. The varying number of clusters basically reflects the crowd workers' perspective toward a particular set of objects. We allow a set of crowd workers to independently cluster the images as per their perceptions. We address the problem by finding out centroid of the clusters using an appropriate distance measure and prioritize the likelihood of similarity of the individual cluster sets. The effectiveness of the proposed method is demonstrated by applying it on multiple artificial datasets obtained from crowd.