Saketh Vishnubhatla

LG
h-index14
7papers
8citations
Novelty36%
AI Score46

7 Papers

64.9AIMar 26
DAGverse: Building Document-Grounded Semantic DAGs from Scientific Papers

Shu Wan, Saketh Vishnubhatla, Iskander Kushbay et al.

Directed Acyclic Graphs (DAGs) are widely used to represent structured knowledge in scientific and technical domains. However, datasets for real-world DAGs remain scarce because constructing them typically requires expert interpretation of domain documents. We study Doc2SemDAG construction: recovering a preferred semantic DAG from a document together with the cited evidence and context that explain it. This problem is challenging because a document may admit multiple plausible abstractions, the intended structure is often implicit, and the supporting evidence is scattered across prose, equations, captions, and figures. To address these challenges, we leverage scientific papers containing explicit DAG figures as a natural source of supervision. In this setting, the DAG figure provides the DAG structure, while the accompanying text provides context and explanation. We introduce DAGverse, a framework for constructing document-grounded semantic DAGs from online scientific papers. Its core component, DAGverse-Pipeline, is a semi-automatic system designed to produce high-precision semantic DAG examples through figure classification, graph reconstruction, semantic grounding, and validation. As a case study, we test the framework for causal DAGs and release DAGverse-1, a dataset of 108 expert-validated semantic DAGs with graph-level, node-level, and edge-level evidence. Experiments show that DAGverse-Pipeline outperforms existing Vision-Language Models on DAG classification and annotation. DAGverse provides a foundation for document-grounded DAG benchmarks and opens new directions for studying structured reasoning grounded in real-world evidence.

LGDec 8, 2025
CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification

Pingchuan Ma, Chengshuai Zhao, Bohan Jiang et al.

Crisis classification in social media aims to extract actionable disaster-related information from multimodal posts, which is a crucial task for enhancing situational awareness and facilitating timely emergency responses. However, the wide variation in crisis types makes achieving generalizable performance across unseen disasters a persistent challenge. Existing approaches primarily leverage deep learning to fuse textual and visual cues for crisis classification, achieving numerically plausible results under in-domain settings. However, they exhibit poor generalization across unseen crisis types because they 1. do not disentangle spurious and causal features, resulting in performance degradation under domain shift, and 2. fail to align heterogeneous modality representations within a shared space, which hinders the direct adaptation of established single-modality domain generalization (DG) techniques to the multimodal setting. To address these issues, we introduce a causality-guided multimodal domain generalization (MMDG) framework that combines adversarial disentanglement with unified representation learning for crisis classification. The adversarial objective encourages the model to disentangle and focus on domain-invariant causal features, leading to more generalizable classifications grounded in stable causal mechanisms. The unified representation aligns features from different modalities within a shared latent space, enabling single-modality DG strategies to be seamlessly extended to multimodal learning. Experiments on the different datasets demonstrate that our approach achieves the best performance in unseen disaster scenarios.

33.5CLMay 12
Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence

Ujun Jeong, Saketh Vishnubhatla, Bohan Jiang et al.

During disasters, extracting causal relations from social media can strengthen situational awareness by identifying factors linked to casualties, physical damage, infrastructure disruption, and cascading impacts. However, disaster-related posts are often informal, fragmented, and context-dependent, and they may describe personal experiences rather than explicit causal relations. In this work, we examine whether Large Language Models (LLMs) can effectively extract causal relations from disaster-related social media posts. To this end, we (1) propose an expert-grounded evaluation framework that compares LLM-generated causal graphs with reference graphs derived from disaster-specific reports and (2) assess whether the extracted relations are supported by post-event evidence or instead reflect model priors. Our findings highlight both the potential and risks of using LLMs for causal relation extraction in disaster decision-support systems.

53.1LGMar 10
Proxy-Guided Measurement Calibration

Saketh Vishnubhatla, Shu Wan, Andre Harrison et al.

Aggregate outcome variables collected through surveys and administrative records are often subject to systematic measurement error. For instance, in disaster loss databases, county-level losses reported may differ from the true damages due to variations in on-the-ground data collection capacity, reporting practices, and event characteristics. Such miscalibration complicates downstream analysis and decision-making. We study the problem of outcome miscalibration and propose a framework guided by proxy variables for estimating and correcting the systematic errors. We model the data-generating process using a causal graph that separates latent content variables driving the true outcome from the latent bias variables that induce systematic errors. The key insight is that proxy variables that depend on the true outcome but are independent of the bias mechanism provide identifying information for quantifying the bias. Leveraging this structure, we introduce a two-stage approach that utilizes variational autoencoders to disentangle content and bias latents, enabling us to estimate the effect of bias on the outcome of interest. We analyze the assumptions underlying our approach and evaluate it on synthetic data, semi-synthetic datasets derived from randomized trials, and a real-world case study of disaster loss reporting.

LGFeb 5, 2024
Causal Feature Selection for Responsible Machine Learning

Raha Moraffah, Paras Sheth, Saketh Vishnubhatla et al.

Machine Learning (ML) has become an integral aspect of many real-world applications. As a result, the need for responsible machine learning has emerged, focusing on aligning ML models to ethical and social values, while enhancing their reliability and trustworthiness. Responsible ML involves many issues. This survey addresses four main issues: interpretability, fairness, adversarial robustness, and domain generalization. Feature selection plays a pivotal role in the responsible ML tasks. However, building upon statistical correlations between variables can lead to spurious patterns with biases and compromised performance. This survey focuses on the current study of causal feature selection: what it is and how it can reinforce the four aspects of responsible ML. By identifying features with causal impacts on outcomes and distinguishing causality from correlation, causal feature selection is posited as a unique approach to ensuring ML models to be ethically and socially responsible in high-stakes applications.

LGSep 15, 2025
An Interventional Approach to Real-Time Disaster Assessment via Causal Attribution

Saketh Vishnubhatla, Alimohammad Beigi, Rui Heng Foo et al.

Traditional disaster analysis and modelling tools for assessing the severity of a disaster are predictive in nature. Based on the past observational data, these tools prescribe how the current input state (e.g., environmental conditions, situation reports) results in a severity assessment. However, these systems are not meant to be interventional in the causal sense, where the user can modify the current input state to simulate counterfactual "what-if" scenarios. In this work, we provide an alternative interventional tool that complements traditional disaster modelling tools by leveraging real-time data sources like satellite imagery, news, and social media. Our tool also helps understand the causal attribution of different factors on the estimated severity, over any given region of interest. In addition, we provide actionable recourses that would enable easier mitigation planning. Our source code is publicly available.

LGSep 15, 2025
Assessing On-the-Ground Disaster Impact Using Online Data Sources

Saketh Vishnubhatla, Ujun Jeong, Bohan Jiang et al.

Assessing the impact of a disaster in terms of asset losses and human casualties is essential for preparing effective response plans. Traditional methods include offline assessments conducted on the ground, where volunteers and first responders work together to collect the estimate of losses through windshield surveys or on-ground inspection. However, these methods have a time delay and are prone to different biases. Recently, various online data sources, including social media, news reports, aerial imagery, and satellite data, have been utilized to evaluate the impact of disasters. Online data sources provide real-time data streams for estimating the offline impact. Limited research exists on how different online sources help estimate disaster impact at a given administrative unit. In our work, we curate a comprehensive dataset by collecting data from multiple online sources for a few billion-dollar disasters at the county level. We also analyze how online estimates compare with traditional offline-based impact estimates for the disaster. Our findings provide insight into how different sources can provide complementary information to assess the disaster.