Dipto Das

CY
6papers
37citations
Novelty29%
AI Score46

6 Papers

41.3CLMay 7
How do datasets, developers, and models affect biases in a low-resourced language?: The Case of the Bengali Language

Dipto Das, Shion Guha, Bryan Semaan

Sociotechnical systems, such as language technologies, frequently exhibit identity-based biases. These biases exacerbate the experiences of historically marginalized communities and remain understudied in low-resource contexts. While models and datasets specific to a language or with multilingual support are commonly recommended to address these biases, this paper empirically tests the effectiveness of such approaches in the context of gender, religion, and nationality-based identities in Bengali, a widely spoken but low-resourced language. We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT, which were fine-tuned using all Bengali sentiment analysis (BSA) datasets from Google Dataset Search. Our analyses showed that BSA models exhibit biases across different identity categories despite having similar semantic content and structure. We also examined the inconsistencies and uncertainties arising from combining pre-trained models and datasets created by individuals from diverse demographic backgrounds. We connected these findings to the broader discussions on epistemic injustice, AI alignment, and methodological decisions in algorithmic audits.

48.7CYMar 22
Tracing Users' Privacy Concerns Across the Lifecycle of a Romantic AI Companion

Kazi Ababil Azam, Imtiaz Karim, Dipto Das

Romantic AI chatbots have quickly attracted users, but their emotional use raises concerns about privacy and safety. As people turn to these systems for intimacy, comfort, and emotionally significant interaction, they often disclose highly sensitive information. Yet the privacy implications of such disclosure remain poorly understood in platforms shaped by persistence, intimacy, and opaque data practices. In this paper, we examine public Reddit discussions about privacy in romantic AI chatbot ecosystems through a lifecycle lens. Analyzing 2,909 posts from 79 subreddits collected over one year, we identify four recurring patterns: disproportionate entry requirements, intensified sensitivity in intimate use, interpretive uncertainty and perceived surveillance, and irreversibility, persistence, and user burden. We show that privacy in romantic AI is best understood as an evolving socio-technical governance problem spanning access, disclosure, interpretation, retention, and exit. These findings highlight the need for privacy and safety governance in romantic AI that is staged across the lifecycle of use, supports meaningful reversibility, and accounts for the emotional vulnerability of intimate human-AI interaction.

65.6CYApr 28
Beyond Categories of Caste: Examining Caste Bias and Morality in Text-to-Image AI Models

Divyanshu Kumar Singh, Dipto Das, Deepika Rama Subramanian et al.

Text-to-Image (T2I) models have shown promising utility across various domains. However, such models are also amplifying harmful societal biases in their outputs. In the context of South Asia, recent work has shown caste biases and stereotypes are being perpetuated through Generative AI (GenAI) systems. While this research offers extremely relevant insight into invisibilized narratives of caste discrimination through the GenAI system, they often treat caste as an identity category. Therefore, in this work we shift our ontology to focus on the relational aspect of caste. This enables us to develop a more nuanced understanding of the mechanics of caste discrimination by and through T2I models. Combining an algorithmic audit with critical discourse analysis, we draw on a conceptual frame challenging Brahminical Normativity to show how caste biases are perpetuated beyond the simple binaries of upper vs lower-caste categories. Our contributions are two-fold. Beyond challenging the categorical understanding of caste as a category, we propose an anti-caste approach to tackle the issue of caste bias and fairness in AI systems.

20.9AIApr 16
Bureaucratic Silences: What the Canadian AI Register Reveals, Omits, and Obscures

Dipto Das, Christelle Tessono, Syed Ishtiaque Ahmed et al.

In November 2025, the Government of Canada operationalized its commitment to transparency by releasing its first Federal AI Register. In this paper, we argue that such registers are not neutral mirrors of government activity, but active instruments of ontological design that configure the boundaries of accountability. We analyzed the Register's complete dataset of 409 systems using the Algorithmic Decision-Making Adapted for the Public Sector (ADMAPS) framework, combining quantitative mapping with deductive qualitative coding. Our findings reveal a sharp divergence between the rhetoric of "sovereign AI" and the reality of bureaucratic practice: while 86\% of systems are deployed internally for efficiency, the Register systematically obscures the human discretion, training, and uncertainty management required to operate them. By privileging technical descriptions over sociotechnical context, the Register constructs an ontology of AI as "reliable tooling" rather than "contestable decision-making." We conclude that without a shift in design, such transparency artifacts risk automating accountability into a performative compliance exercise, offering visibility without contestability.

48.1CYApr 21
Fairness Audits of Institutional Risk Models in Deployed ML Pipelines

Kelly McConvey, Dipto Das, Maya Ghai et al.

Fairness audits of institutional risk models are critical for understanding how deployed machine learning pipelines allocate resources. Drawing on multi-year collaboration with Centennial College, where our prior ethnographic work introduced the ASP-HEI Cycle, we present a replica-based audit of a deployed Early Warning System (EWS), replicating its model using institutional training data and design specifications. We evaluate disparities by gender, age, and residency status across the full pipeline (training data, model predictions, and post-processing) using standard fairness metrics. Our audit reveals systematic misallocation: younger, male, and international students are disproportionately flagged for support, even when many ultimately succeed, while older and female students with comparable dropout risk are under-identified. Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers. This work provides a replicable methodology for auditing institutional ML systems and shows how disparities emerge and compound across stages, highlighting the importance of evaluating construct validity alongside statistical fairness. It contributes one empirical thread to a broader program investigating algorithms, student data, and power in higher education.

CLJan 19, 2024
The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases

Dipto Das, Shion Guha, Jed Brubaker et al.

While colonization has sociohistorically impacted people's identities across various dimensions, those colonial values and biases continue to be perpetuated by sociotechnical systems. One category of sociotechnical systems--sentiment analysis tools--can also perpetuate colonial values and bias, yet less attention has been paid to how such tools may be complicit in perpetuating coloniality, although they are often used to guide various practices (e.g., content moderation). In this paper, we explore potential bias in sentiment analysis tools in the context of Bengali communities that have experienced and continue to experience the impacts of colonialism. Drawing on identity categories most impacted by colonialism amongst local Bengali communities, we focused our analytic attention on gender, religion, and nationality. We conducted an algorithmic audit of all sentiment analysis tools for Bengali, available on the Python package index (PyPI) and GitHub. Despite similar semantic content and structure, our analyses showed that in addition to inconsistencies in output from different tools, Bengali sentiment analysis tools exhibit bias between different identity categories and respond differently to different ways of identity expression. Connecting our findings with colonially shaped sociocultural structures of Bengali communities, we discuss the implications of downstream bias of sentiment analysis tools.