Sukanya Krishna

CL
h-index15
5papers
35citations
Novelty47%
AI Score50

5 Papers

LGDec 25, 2025Code
Statistical vs. Deep Learning Models for Estimating Substance Overdose Excess Mortality in the US

Sukanya Krishna, Marie-Laure Charpignon, Maimuna Majumder

Substance overdose mortality in the United States claimed over 80,000 lives in 2023, with the COVID-19 pandemic exacerbating existing trends through healthcare disruptions and behavioral changes. Estimating excess mortality, defined as deaths beyond expected levels based on pre-pandemic patterns, is essential for understanding pandemic impacts and informing intervention strategies. However, traditional statistical methods like SARIMA assume linearity, stationarity, and fixed seasonality, which may not hold under structural disruptions. We present a systematic comparison of SARIMA against three deep learning (DL) architectures (LSTM, Seq2Seq, and Transformer) for counterfactual mortality estimation using national CDC data (2015-2019 for training/validation, 2020-2023 for projection). We contribute empirical evidence that LSTM achieves superior point estimation (17.08% MAPE vs. 23.88% for SARIMA) and better-calibrated uncertainty (68.8% vs. 47.9% prediction interval coverage) when projecting under regime change. We also demonstrate that attention-based models (Seq2Seq, Transformer) underperform due to overfitting to historical means rather than capturing emergent trends. Ourreproducible pipeline incorporates conformal prediction intervals and convergence analysis across 60+ trials per configuration, and we provide an open-source framework deployable with 15 state health departments. Our findings establish that carefully validated DL models can provide more reliable counterfactual estimates than traditional methods for public health planning, while highlighting the need for calibration techniques when deploying neural forecasting in high-stakes domains.

87.7HCApr 13
Toward Human-AI Complementarity Across Diverse Tasks

Yuzheng Xu, Annya Dahmani, Matthew D. Blanchard et al.

Human-AI complementarity, the idea that combining human and AI judgments can outperform either alone, offers a promising pathway toward robust oversight of advanced AI systems. However, whether human-AI complementarity can be achieved on realistic tasks remains an open question. We investigate this through two approaches: hybridization and two AI assistance methods (top-2 assistance and subtask delegation), evaluated on a multi-domain dataset of 1,886 samples spanning knowledge, factuality, long-context reasoning, and deception detection. We find only modest complementarity gains. Baseline hybridization yields just +0.4 percentage points (pp) over AI alone (69.3\% vs 68.9\%), limited both by a small complementarity region (only 8.9\% of items where AI errs but humans do not) and the inability of confidence-based routing to identify it, since the model's confidence is similarly distributed across correct and incorrect predictions. Applied when AI has low confidence, top-2 assistance increases human accuracy from 28.4\% to 38.3\%, surpassing AI alone (37.7\%) -- but primarily because humans adopt correct AI suggestions, not because they successfully override AI errors. These findings suggest that the primary bottleneck is not human task accuracy per se, but the ability to route decisions to humans when it matters and to design assistance methods that enable humans to catch AI mistakes. Our quantitative and qualitative analyses pinpoint where and why each method succeeds or fails, offering concrete targets for future work. We will release our dataset and code upon request to support progress toward more effective human-AI collaboration for AI oversight.

CLJun 18, 2025Code
Veracity: An Open-Source AI Fact-Checking System

Taylor Lynn Curtis, Maximilian Puelma Touzel, William Garneau et al.

The proliferation of misinformation poses a significant threat to society, exacerbated by the capabilities of generative AI. This demo paper introduces Veracity, an open-source AI system designed to empower individuals to combat misinformation through transparent and accessible fact-checking. Veracity leverages the synergy between Large Language Models (LLMs) and web retrieval agents to analyze user-submitted claims and provide grounded veracity assessments with intuitive explanations. Key features include multilingual support, numerical scoring of claim veracity, and an interactive interface inspired by familiar messaging applications. This paper will showcase Veracity's ability to not only detect misinformation but also explain its reasoning, fostering media literacy and promoting a more informed society.

CLNov 10, 2024
Epistemic Integrity in Large Language Models

Bijean Ghafouri, Shahrad Mohammadzadeh, James Zhou et al.

Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration $\unicode{x2013}$ where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models (LLMs) which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk of the overstated certainty LLMs hold which may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing this miscalibration, offering a path towards correcting it and more trustworthy AI across domains.

DATA-ANNov 24, 2021
Particle Graph Autoencoders and Differentiable, Learned Energy Mover's Distance

Steven Tsan, Raghav Kansal, Anthony Aportela et al.

Autoencoders have useful applications in high energy physics in anomaly detection, particularly for jets - collimated showers of particles produced in collisions such as those at the CERN Large Hadron Collider. We explore the use of graph-based autoencoders, which operate on jets in their "particle cloud" representations and can leverage the interdependencies among the particles within a jet, for such tasks. Additionally, we develop a differentiable approximation to the energy mover's distance via a graph neural network, which may subsequently be used as a reconstruction loss function for autoencoders.