Nupoor Gandhi

h-index3

6papers

700citations

Novelty41%

AI Score42

Ranked #58,857 of 194,257 authors (top 30%)#11,503 in CL (top 37%)

6 Papers

1.9CLOct 14, 2022Code

Mention Annotations Alone Enable Efficient Domain Adaptation for Coreference Resolution

Nupoor Gandhi, Anjalie Field, Emma Strubell

Although recent neural models for coreference resolution have led to substantial improvements on benchmark datasets, transferring these models to new target domains containing out-of-vocabulary spans and requiring differing annotation schemes remains challenging. Typical approaches involve continued training on annotated target-domain data, but obtaining annotations is costly and time-consuming. We show that annotating mentions alone is nearly twice as fast as annotating full coreference chains. Accordingly, we propose a method for efficiently adapting coreference models, which includes a high-precision mention detection objective and requires annotating only mentions in the target domain. Extensive evaluation across three English coreference datasets: CoNLL-2012 (news/conversation), i2b2/VA (medical notes), and previously unstudied child welfare notes, reveals that our approach facilitates annotation-efficient transfer and results in a 7-14% improvement in average F1 without increasing annotator time.

9.6CLJul 9, 2025

SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains

Krithika Ramesh, Daniel Smolyak, Zihao Zhao et al.

We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerous applications, such as reducing the risks of privacy violations in the development and deployment of AI systems in high-stakes domains. Realizing this potential, however, requires principled consistent evaluations of synthetic data across multiple dimensions: its utility in downstream systems, the fairness of these systems, the risk of privacy leakage, general distributional differences from the source text, and qualitative feedback from domain experts. SynthTextEval allows users to conduct evaluations along all of these dimensions over synthetic data that they upload or generate using the toolkit's generation module. While our toolkit can be run over any data, we highlight its functionality and effectiveness over datasets from two high-stakes domains: healthcare and law. By consolidating and standardizing evaluation metrics, we aim to improve the viability of synthetic text, and in-turn, privacy-preservation in AI development.

8.3CLApr 16, 2025

Beyond Text: Characterizing Domain Expert Needs in Document Research

Sireesh Gururaja, Nupoor Gandhi, Jeremiah Milbauer et al. · cmu

Working with documents is a key part of almost any knowledge work, from contextualizing research in a literature review to reviewing legal precedent. Recently, as their capabilities have expanded, primarily text-based NLP systems have often been billed as able to assist or even automate this kind of work. But to what extent are these systems able to model these tasks as experts conceptualize and perform them now? In this study, we interview sixteen domain experts across two domains to understand their processes of document research, and compare it to the current state of NLP systems. We find that our participants processes are idiosyncratic, iterative, and rely extensively on the social context of a document in addition its content; existing approaches in NLP and adjacent fields that explicitly center the document as an object, rather than as merely a container for text, tend to better reflect our participants' priorities, though they are often less accessible outside their research communities. We call on the NLP community to more carefully consider the role of the document in building useful tools that are accessible, personalizable, iterative, and socially aware.

2.9CLMay 30, 2023

Examining risks of racial biases in NLP tools for child protective services

Anjalie Field, Amanda Coston, Nupoor Gandhi et al.

Although much literature has established the presence of demographic bias in natural language processing (NLP) models, most work relies on curated bias metrics that may not be reflective of real-world applications. At the same time, practitioners are increasingly using algorithmic tools in high-stakes settings, with particular recent interest in NLP. In this work, we focus on one such setting: child protective services (CPS). CPS workers often write copious free-form text notes about families they are working with, and CPS agencies are actively seeking to deploy NLP models to leverage these data. Given well-established racial bias in this setting, we investigate possible ways deployed NLP is liable to increase racial disparities. We specifically examine word statistics within notes and algorithmic fairness in risk prediction, coreference resolution, and named entity recognition (NER). We document consistent algorithmic unfairness in NER models, possible algorithmic unfairness in coreference resolution models, and little evidence of exacerbated racial bias in risk prediction. While there is existing pronounced criticism of risk prediction, our results expose previously undocumented risks of racial bias in realistic information extraction systems, highlighting potential concerns in deploying them, even though they may appear more benign. Our work serves as a rare realistic examination of NLP algorithmic fairness in a potential deployed setting and a timely investigation of a specific risk associated with deploying NLP in CPS settings.

52.0LGSep 20, 2021Code

Improving Span Representation for Domain-adapted Coreference Resolution

Nupoor Gandhi, Anjalie Field, Yulia Tsvetkov

Recent work has shown fine-tuning neural coreference models can produce strong performance when adapting to different domains. However, at the same time, this can require a large amount of annotated target examples. In this work, we focus on supervised domain adaptation for clinical notes, proposing the use of concept knowledge to more efficiently adapt coreference models to a new domain. We develop methods to improve the span representations via (1) a retrofitting loss to incentivize span representations to satisfy a knowledge-based distance function and (2) a scaffolding loss to guide the recovery of knowledge from the span representation. By integrating these losses, our model is able to improve our baseline precision and F-1 score. In particular, we show that incorporating knowledge with end-to-end coreference models results in better performance on the most challenging, domain-specific spans.

1.7IROct 15, 2019

Multi-dimensional Features for Prediction with Tweets

Nupoor Gandhi, Alex Morales, Dolores Albarracin

With the rise of opioid abuse in the US, there has been a growth of overlapping hotspots for overdose-related and HIV-related deaths in Springfield, Boston, Fall River, New Bedford, and parts of Cape Cod. With a large part of population, including rural communities, active on social media, it is crucial that we leverage the predictive power of social media as a preventive measure. We explore the predictive power of micro-blogging social media website Twitter with respect to HIV new diagnosis rates per county. While trending work in Twitter NLP has focused on primarily text-based features, we show that multi-dimensional feature construction can significantly improve the predictive power of topic features alone with respect STI's (sexually transmitted infections). By multi-dimensional features, we mean leveraging not only the topical features (text) of a corpus, but also location-based information (counties) about the tweets in feature-construction. We develop novel text-location-based smoothing features to predict new diagnoses of HIV.