Shion Guha

HC
h-index: 37
16 papers
504 citations
Novelty: 35%
AI Score: 51

16 Papers

LG · Nov 26, 2022
The Principles of Data-Centric AI (DCAI)

Mohammad Hossein Jarrahi, Ali Memariani, Shion Guha

Data is a crucial infrastructure to how artificial intelligence (AI) systems learn. However, these systems to date have been largely model-centric, putting a premium on the model at the expense of the data quality. Data quality issues beset the performance of AI systems, particularly in downstream deployments and in real-world applications. Data-centric AI (DCAI) as an emerging concept brings data, its quality and its dynamism to the forefront in considerations of AI systems through an iterative and systematic approach. As one of the first overviews, this article brings together data-centric perspectives and concepts to outline the foundations of DCAI. It specifically formulates six guiding principles for researchers and practitioners and gives direction for future advancement of DCAI.

CL · May 7
How do datasets, developers, and models affect biases in a low-resourced language?: The Case of the Bengali Language

Dipto Das, Shion Guha, Bryan Semaan

Sociotechnical systems, such as language technologies, frequently exhibit identity-based biases. These biases exacerbate the experiences of historically marginalized communities and remain understudied in low-resource contexts. While models and datasets specific to a language or with multilingual support are commonly recommended to address these biases, this paper empirically tests the effectiveness of such approaches in the context of gender, religion, and nationality-based identities in Bengali, a widely spoken but low-resourced language. We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT, which were fine-tuned using all Bengali sentiment analysis (BSA) datasets from Google Dataset Search. Our analyses showed that BSA models exhibit biases across different identity categories despite having similar semantic content and structure. We also examined the inconsistencies and uncertainties arising from combining pre-trained models and datasets created by individuals from diverse demographic backgrounds. We connected these findings to the broader discussions on epistemic injustice, AI alignment, and methodological decisions in algorithmic audits.

HC · Feb 12, 2023
A Human-Centered Review of Algorithms in Decision-Making in Higher Education

Kelly McConvey, Shion Guha, Anastasia Kuzminykh

The use of algorithms for decision-making in higher education is steadily growing, promising cost-savings to institutions and personalized service for students but also raising ethical challenges around surveillance, fairness, and interpretation of data. To address the lack of systematic understanding of how these algorithms are currently designed, we reviewed an extensive corpus of papers proposing algorithms for decision-making in higher education. We categorized them based on input data, computational method, and target outcome, and then investigated the interrelations of these factors with the application of human-centered lenses: theoretical, participatory, or speculative design. We found that the models are trending towards deep learning, and increased use of student personal data and protected attributes, with the target scope expanding towards automated decisions. However, despite the associated decrease in interpretability and explainability, current development predominantly fails to incorporate human-centered lenses. We discuss the challenges with these trends and advocate for a human-centered approach.

HC · Apr 3
The Paradox of Prioritization in Public Sector Algorithms

Erina Seh-Young Moon, Matthew Tamura, Shion Guha

Public sector agencies perform the critical task of implementing the redistributive role of the State by acting as the leading provider of critical public services that many rely on. In recent years, public agencies have been increasingly adopting algorithmic prioritization tools to determine which individuals should be allocated scarce public resources. Prior work on these tools has largely focused on assessing and improving their fairness, accuracy, and validity. However, what remains understudied is how the structural design of prioritization itself shapes both the effectiveness of these tools and the experiences of those subject to them under realistic public sector conditions. In this study, we demonstrate the fallibility of adopting a prioritization approach in the public sector by showing how the underlying mechanisms of prioritization generate significant relative disparities between groups of intersectional identities as resources become increasingly scarce. We argue that despite prevailing arguments that prioritization of resources can lead to efficient allocation outcomes, prioritization can intensify perceptions of inequality for impacted individuals. We contend that efficiencies generated by algorithmic tools should not be conflated with the dominant rhetoric that efficiency necessarily entails "doing more with less" and we highlight the risks of overlooking resource constraints present in real-world implementation contexts.
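
As a toy illustration of the scarcity effect described above (not the paper's data or analysis), the sketch below ranks a pooled population by score and serves only the top `capacity` individuals. The two groups, their scores, and the function names `selection_rates` and `rate_ratio` are all hypothetical. Under these assumptions, a fixed score gap between groups yields a larger ratio of selection rates as capacity shrinks.

```python
def selection_rates(scores_by_group, capacity):
    """Serve the top-`capacity` individuals overall; return each group's selection rate."""
    pool = [(s, g) for g, scores in scores_by_group.items() for s in scores]
    pool.sort(key=lambda t: t[0], reverse=True)  # highest priority score first
    chosen = pool[:capacity]
    return {
        g: sum(1 for _, cg in chosen if cg == g) / len(scores)
        for g, scores in scores_by_group.items()
    }


def rate_ratio(rates, a, b):
    """Relative disparity: selection rate of group a divided by that of group b."""
    return rates[a] / rates[b] if rates[b] > 0 else float("inf")


# Hypothetical priority scores: group B is shifted down by a constant 2.5.
scores = {
    "A": [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "B": [7.5, 6.5, 5.5, 4.5, 3.5, 2.5, 1.5, 0.5, -0.5, -1.5],
}

for capacity in (10, 4, 2):
    rates = selection_rates(scores, capacity)
    print(capacity, rates, rate_ratio(rates, "A", "B"))
```

In this toy setup the absolute score gap never changes; only the number of available slots does, which is why the ratio is reported separately for each capacity.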

AI · Apr 16
Bureaucratic Silences: What the Canadian AI Register Reveals, Omits, and Obscures

Dipto Das, Christelle Tessono, Syed Ishtiaque Ahmed et al.

In November 2025, the Government of Canada operationalized its commitment to transparency by releasing its first Federal AI Register. In this paper, we argue that such registers are not neutral mirrors of government activity, but active instruments of ontological design that configure the boundaries of accountability. We analyzed the Register's complete dataset of 409 systems using the Algorithmic Decision-Making Adapted for the Public Sector (ADMAPS) framework, combining quantitative mapping with deductive qualitative coding. Our findings reveal a sharp divergence between the rhetoric of "sovereign AI" and the reality of bureaucratic practice: while 86% of systems are deployed internally for efficiency, the Register systematically obscures the human discretion, training, and uncertainty management required to operate them. By privileging technical descriptions over sociotechnical context, the Register constructs an ontology of AI as "reliable tooling" rather than "contestable decision-making." We conclude that without a shift in design, such transparency artifacts risk automating accountability into a performative compliance exercise, offering visibility without contestability.

CL · Jun 23, 2025 · Code
Human-Aligned Faithfulness in Toxicity Explanations of LLMs

Ramaravind K. Mothilal, Joanna Roy, Syed Ishtiaque Ahmed et al. · utoronto

The discourse around toxicity and LLMs in NLP largely revolves around detection tasks. This work shifts the focus to evaluating LLMs' reasoning about toxicity -- from their explanations that justify a stance -- to enhance their trustworthiness in downstream tasks. Despite extensive research on explainability, it is not straightforward to adopt existing methods to evaluate free-form toxicity explanation due to their over-reliance on input text perturbations, among other challenges. To account for these, we propose a novel, theoretically-grounded multi-dimensional criterion, Human-Aligned Faithfulness (HAF), that measures the extent to which LLMs' free-form toxicity explanations align with those of a rational human under ideal conditions. We develop six metrics, based on uncertainty quantification, to comprehensively evaluate HAF of LLMs' toxicity explanations with no human involvement, and highlight how "non-ideal" the explanations are. We conduct several experiments on three Llama models (of size up to 70B) and an 8B Ministral model on five diverse toxicity datasets. Our results show that while LLMs generate plausible explanations to simple prompts, their reasoning about toxicity breaks down when prompted about the nuanced relations between the complete set of reasons, the individual reasons, and their toxicity stances, resulting in inconsistent and irrelevant responses. We open-source our code at https://github.com/uofthcdslab/HAF and LLM-generated explanations at https://huggingface.co/collections/uofthcdslab/haf.

HC · Mar 20, 2024
"This is not a data problem": Algorithms and Power in Public Higher Education in Canada

Kelly McConvey, Shion Guha

Algorithmic decision-making is increasingly being adopted across public higher education. The expansion of data-driven practices by post-secondary institutions has occurred in parallel with the adoption of New Public Management approaches by neoliberal administrations. In this study, we conduct a qualitative analysis of an in-depth ethnographic case study of data and algorithms in use at a public college in Ontario, Canada. We identify the data, algorithms, and outcomes in use at the college. We assess how the college's processes and relationships support those outcomes and the different stakeholders' perceptions of the college's data-driven systems. In addition, we find that the growing reliance on algorithmic decisions leads to increased student surveillance, exacerbation of existing inequities, and the automation of the faculty-student relationship. Finally, we identify a cycle of increased institutional power perpetuated by algorithmic decision-making, and driven by a push towards financial sustainability.

CY · Apr 21
Fairness Audits of Institutional Risk Models in Deployed ML Pipelines

Kelly McConvey, Dipto Das, Maya Ghai et al.

Fairness audits of institutional risk models are critical for understanding how deployed machine learning pipelines allocate resources. Drawing on multi-year collaboration with Centennial College, where our prior ethnographic work introduced the ASP-HEI Cycle, we present a replica-based audit of a deployed Early Warning System (EWS), replicating its model using institutional training data and design specifications. We evaluate disparities by gender, age, and residency status across the full pipeline (training data, model predictions, and post-processing) using standard fairness metrics. Our audit reveals systematic misallocation: younger, male, and international students are disproportionately flagged for support, even when many ultimately succeed, while older and female students with comparable dropout risk are under-identified. Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers. This work provides a replicable methodology for auditing institutional ML systems and shows how disparities emerge and compound across stages, highlighting the importance of evaluating construct validity alongside statistical fairness. It contributes one empirical thread to a broader program investigating algorithms, student data, and power in higher education.
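
A minimal sketch of the kind of check such an audit involves, assuming made-up scores, groups, and outcomes (not the institution's data, model, or tier cutoffs). The helper names `percentile_tiers` and `group_rates` and the 50th/80th percentile cutoffs are hypothetical. The first function collapses predicted probabilities into percentile-based tiers; the second compares flag rates and false positive rates across groups.

```python
from collections import defaultdict


def percentile_tiers(probs, cutoffs=(0.5, 0.8)):
    """Assign a tier from each score's percentile rank, discarding the raw probability."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    tiers = [None] * n
    for rank, i in enumerate(order):
        pct = rank / n  # share of scores ranked below this one
        if pct >= cutoffs[1]:
            tiers[i] = "high"
        elif pct >= cutoffs[0]:
            tiers[i] = "medium"
        else:
            tiers[i] = "low"
    return tiers


def group_rates(groups, flagged, dropped_out):
    """Per-group flag rate and false positive rate (flagged but did not drop out)."""
    stats = defaultdict(lambda: {"n": 0, "flagged": 0, "neg": 0, "fp": 0})
    for g, f, y in zip(groups, flagged, dropped_out):
        s = stats[g]
        s["n"] += 1
        s["flagged"] += int(f)
        if not y:
            s["neg"] += 1
            s["fp"] += int(f)
    return {
        g: {
            "flag_rate": s["flagged"] / s["n"],
            "fpr": s["fp"] / s["neg"] if s["neg"] else float("nan"),
        }
        for g, s in stats.items()
    }


# Hypothetical toy data: ten students, two age groups.
probs = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
groups = ["older"] * 5 + ["younger"] * 5
dropped_out = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]

tiers = percentile_tiers(probs)
flagged = [t == "high" for t in tiers]  # only the top tier is flagged for support
rates = group_rates(groups, flagged, dropped_out)
print(tiers)
print(rates)
```

Because tiers depend only on rank, scores of 0.90 and 0.95 land in the same tier, which is one way heterogeneous probabilities get collapsed in post-processing.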

LG · Jul 7, 2025
Bridging Prediction and Intervention Problems in Social Systems

Lydia T. Liu, Inioluwa Deborah Raji, Angela Zhou et al.

Many automated decision systems (ADS) are designed to solve prediction problems -- where the goal is to learn patterns from a sample of the population and apply them to individuals from the same population. In reality, these prediction systems operationalize holistic policy interventions in deployment. Once deployed, ADS can shape impacted population outcomes through an effective policy change in how decision-makers operate, while also being defined by past and present interactions between stakeholders and the limitations of existing organizational, as well as societal, infrastructure and context. In this work, we consider the ways in which we must shift from a prediction-focused paradigm to an interventionist paradigm when considering the impact of ADS within social systems. We argue this requires a new default problem setup for ADS beyond prediction, to instead consider predictions as decision support, final decisions, and outcomes. We highlight how this perspective unifies modern statistical frameworks and other tools to study the design, implementation, and evaluation of ADS systems, and point to the research directions necessary to operationalize this paradigm shift. Using these tools, we characterize the limitations of focusing on isolated prediction tasks, and lay the foundation for a more intervention-oriented approach to developing and deploying ADS.

HC · Feb 18, 2025
Talking About the Assumption in the Room

Ramaravind Kommiya Mothilal, Faisal M. Lalani, Syed Ishtiaque Ahmed et al. · utoronto

The reference to assumptions in how practitioners use or interact with machine learning (ML) systems is ubiquitous in HCI and responsible ML discourse. However, what remains unclear from prior works is the conceptualization of assumptions and how practitioners identify and handle assumptions throughout their workflows. This leads to confusion about what assumptions are and what needs to be done with them. We use the concept of an argument from Informal Logic, a branch of Philosophy, to offer a new perspective to understand and explicate the confusions surrounding assumptions. Through semi-structured interviews with 22 ML practitioners, we find what contributes most to these confusions is how independently assumptions are constructed, how reactively and reflectively they are handled, and how nebulously they are recorded. Our study brings the peripheral discussion of assumptions in ML to the center and presents recommendations for practitioners to better think about and work with assumptions.

CY · Feb 26, 2024
Beyond Predictive Algorithms in Child Welfare

Erina Seh-Young Moon, Devansh Saxena, Tegan Maharaj et al. · mila

Caseworkers in the child welfare (CW) sector use predictive decision-making algorithms built on risk assessment (RA) data to guide and support CW decisions. Researchers have highlighted that RAs can contain biased signals which flatten CW case complexities and that the algorithms may benefit from incorporating contextually rich case narratives, i.e., casenotes written by caseworkers. To investigate this hypothesized improvement, we quantitatively deconstructed two commonly used RAs from a United States CW agency. We trained classifier models to compare the predictive validity of RAs with and without casenote narratives and applied computational text analysis on casenotes to highlight topics uncovered in the casenotes. Our study finds that common risk metrics used to assess families and build CWS predictive risk models (PRMs) are unable to predict discharge outcomes for children who are not reunified with their birth parent(s). We also find that although casenotes cannot predict discharge outcomes, they contain contextual case signals. Given the lack of predictive validity of RA scores and casenotes, we propose moving beyond quantitative risk assessments for public sector algorithms and towards using contextual sources of information such as narratives to study public sociotechnical systems.

CL · Jan 19, 2024
The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases

Dipto Das, Shion Guha, Jed Brubaker et al.

While colonization has sociohistorically impacted people's identities across various dimensions, those colonial values and biases continue to be perpetuated by sociotechnical systems. One category of sociotechnical systems--sentiment analysis tools--can also perpetuate colonial values and bias, yet less attention has been paid to how such tools may be complicit in perpetuating coloniality, although they are often used to guide various practices (e.g., content moderation). In this paper, we explore potential bias in sentiment analysis tools in the context of Bengali communities that have experienced and continue to experience the impacts of colonialism. Drawing on identity categories most impacted by colonialism amongst local Bengali communities, we focused our analytic attention on gender, religion, and nationality. We conducted an algorithmic audit of all sentiment analysis tools for Bengali available on the Python Package Index (PyPI) and GitHub. Despite similar semantic content and structure, our analyses showed that in addition to inconsistencies in output from different tools, Bengali sentiment analysis tools exhibit bias between different identity categories and respond differently to different ways of identity expression. Connecting our findings with colonially shaped sociocultural structures of Bengali communities, we discuss the implications of downstream bias of sentiment analysis tools.

HC · Jul 7, 2021
A Framework of High-Stakes Algorithmic Decision-Making for the Public Sector Developed through a Case Study of Child-Welfare

Devansh Saxena, Karla Badillo-Urquiola, Pamela Wisniewski et al.

Algorithms have permeated throughout civil government and society, where they are being used to make high-stakes decisions about human lives. In this paper, we first develop a cohesive framework of algorithmic decision-making adapted for the public sector (ADMAPS) that reflects the complex socio-technical interactions between human discretion, bureaucratic processes, and algorithmic decision-making by synthesizing disparate bodies of work in the fields of Human-Computer Interaction (HCI), Science and Technology Studies (STS), and Public Administration (PA). We then applied the ADMAPS framework to conduct a qualitative analysis of an in-depth, eight-month ethnographic case study of the algorithms in daily use within a child-welfare agency that serves approximately 900 families and 1300 children in the mid-western United States. Overall, we found there is a need to focus on strength-based algorithmic outcomes centered in social ecological frameworks. In addition, algorithmic systems need to support existing bureaucratic processes and augment human discretion, rather than replace it. Finally, collective buy-in in algorithmic systems requires trust in the target outcomes at both the practitioner and bureaucratic levels. As a result of our study, we propose guidelines for the design of high-stakes algorithmic decision-making tools in the child-welfare system, and more generally, in the public sector. We empirically validate the theoretically derived ADMAPS framework to demonstrate how it can be useful for systematically making pragmatic decisions about the design of algorithms for the public sector.

HC · Feb 4, 2021
"Facebook Promotes More Harassment": Social Media Ecosystem, Skill and Marginalized Hijra Identity in Bangladesh

Fayika Farhat Nova, Michael Ann Devito, Pratyasha Saha et al.

Social interaction across multiple online platforms is a challenge for gender and sexual minorities (GSM) due to the stigmatization they face, which increases the complexity of their self-presentation decisions. These online interactions and identity disclosures can be more complicated for GSM in non-Western contexts due to consequentially different audiences and perceived affordances by the users, and limited baseline understanding of the conflation of these two with local norms and the opportunities they practically represent. Using focus group discussions and semi-structured interviews, we engaged with 61 Hijra individuals from Bangladesh, a severely stigmatized GSM from South Asia, to understand their overall online participation and disclosure behaviors through the lens of personal social media ecosystems. We find that along with platform audiences, affordances, and norms, participant skill/knowledge and cultural influences also impact navigation through multiple platforms, resulting in differential benefits from privacy features. This impacts how Hijra perceive online spaces, and shapes their self-presentation and disclosure behaviors over time. Content Warning: This paper discusses graphic content (e.g., rape and sexual harassment) related to Hijra.

HC · Apr 9, 2020
Methods for Generating Typologies of Non/use

Devansh Saxena, Patrick Skeba, Shion Guha et al.

Prior studies of technology non-use demonstrate the need for approaches that go beyond a simple binary distinction between users and non-users. This paper proposes a set of two different methods by which researchers can identify types of non/use relevant to the particular sociotechnical settings they are studying. These methods are demonstrated by applying them to survey data about Facebook non/use. The results demonstrate that the different methods proposed here identify fairly comparable types of non/use. They also illustrate how the two methods make different trade-offs between the granularity of the resulting typology and the total sample size. The paper also demonstrates how the different typologies resulting from these methods can be used in predictive modeling, allowing for the two methods to corroborate or disconfirm results from one another. The discussion considers implications and applications of these methods, both for research on technology non/use and for studying social computing more broadly.

CY · Mar 7, 2020
A Human-Centered Review of the Algorithms used within the U.S. Child Welfare System

Devansh Saxena, Karla Badillo-Urquiola, Pamela J. Wisniewski et al.

The U.S. Child Welfare System (CWS) is charged with improving outcomes for foster youth; yet it is overburdened and underfunded. To overcome this limitation, several states have turned towards algorithmic decision-making systems to reduce costs and determine better processes for improving CWS outcomes. Using a human-centered algorithmic design approach, we synthesize 50 peer-reviewed publications on computational systems used in CWS to assess how they were being developed, common characteristics of predictors used, as well as the target outcomes. We found that most of the literature has focused on risk assessment models but does not consider theoretical approaches (e.g., child-foster parent matching) nor the perspectives of caseworkers (e.g., case notes). Therefore, future algorithms should strive to be context-aware and theoretically robust by incorporating salient factors identified by past research. We provide the HCI community with research avenues for developing human-centered algorithms that redirect attention towards more equitable outcomes for CWS.