Shashwat Sourav

LG
h-index: 28
7 papers
21 citations
Novelty: 44%
AI Score: 47

7 Papers

87.9 CE Jun 1
Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research

Shashwat Sourav, Tanjin He, Maria K. Y. Chan et al.

AI co-scientists are increasingly used for scientific discovery, but current evaluations still do not test them on a key task: moving from a concrete scientific or technological problem to a plausible, mechanism-grounded solution hypothesis. This gap is especially important in materials science and, in particular, battery research, where a useful proposal must identify the relevant failure mode, propose a credible intervention, and explain why that intervention should improve the target property. We introduce Matter to Mechanism, a benchmark for evaluating AI co-scientists on problem-to-hypothesis reasoning in materials science, with a focus on battery materials research. The benchmark contains 2,645 instances derived from scientific publications. Each instance includes a structured problem statement, a candidate solution hypothesis, an explicit reasoning trace, and domain-grounded annotations such as material system, component, failure mode, intervention, mechanism, target property, and claimed outcome. We also introduce a metric suite that measures reasoning fidelity, problem alignment, mechanistic specificity, novelty, plausibility, and problem decomposition quality, and combine them into a composite score. Using this framework, we evaluate several AI co-scientist systems and show that Matter to Mechanism reveals interpretable system differences that are only partially recovered by standard text-similarity metrics. We further show through adversarial stress tests that the aggregate score is more stable than individual metric dimensions under superficial gaming attacks.

47.6 AI May 26
The Compressive Knowledge Graph Hypothesis: Which Graph Facts Matter for Scientific Hypothesis Generation?

Shashwat Sourav, Viktoriia Baibakova, Sanjay Das et al.

Knowledge graphs (KGs) can provide structured scientific context to language models, but it remains unclear which graph facts actually shape the generated hypotheses. We study KG-guided hypothesis generation for battery materials across Mistral-7B, Llama-3.1-70B, and Gemini 2.5 Flash. We perturb local KGs by varying density, ontology richness, topology, and control structure, and evaluate outputs with both provided-graph and fixed-reference metrics. Across models, KG utility is selective and model-dependent: graph context changes outputs, but no-KG outputs also recover substantial graph content from model priors. Compact top-k subgraphs often approximate full-KG behavior, including when claimed-outcome triples are held out. At the same time, compression is not unique to one semantic ranking rule: random and topology-based subsets can also recover much of the signal. These results support a redundancy-aware Compressive KG hypothesis: useful KG signal is often recoverable from compact, scientifically structured subgraphs rather than requiring the full local graph.
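
As a rough illustration of the subgraph comparison described in this abstract, the sketch below selects a compact top-k subgraph with a topology-based ranking and a random subset of the same size. The function names, the degree-based score, and the toy triples are all hypothetical; the abstract does not specify the paper's ranking rules or data.

```python
import random
from collections import Counter

# A triple is (head, relation, tail). All names here are illustrative.

def top_k_by_score(triples, score, k):
    """Keep the k highest-scoring triples (ties keep input order)."""
    return sorted(triples, key=score, reverse=True)[:k]

def random_subset(triples, k, seed=0):
    """Random baseline: sample k triples without replacement."""
    rng = random.Random(seed)
    return rng.sample(triples, min(k, len(triples)))

def degree_score(triples):
    """Topology-based score: sum of the degrees of a triple's two entities."""
    degree = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return lambda t: degree[t[0]] + degree[t[2]]

# Toy local KG, invented for this example only.
kg = [
    ("Ni-rich cathode", "has_failure_mode", "surface cracking"),
    ("surface cracking", "degrades", "capacity retention"),
    ("Al2O3 coating", "mitigates", "surface cracking"),
    ("Al2O3 coating", "applied_to", "Ni-rich cathode"),
    ("electrolyte additive", "forms", "stable interphase"),
]

compact = top_k_by_score(kg, degree_score(kg), k=3)
baseline = random_subset(kg, k=3)
```

Either subset could then be serialized into the model prompt in place of the full local graph, which is the kind of comparison the abstract reports.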

DL Nov 1, 2022
A Bayesian Learning, Greedy agglomerative clustering approach and evaluation techniques for Author Name Disambiguation Problem

Shashwat Sourav

Author names are often ambiguous: the same author may appear under different names, and multiple authors may share similar names. This ambiguity makes it difficult to associate a scholarly work with the person who wrote it, thereby introducing inaccuracy in credit attribution, bibliometric analysis, search-by-author in a digital library, and expert discovery. A plethora of author name disambiguation (AND) techniques has been proposed in the literature. I focus on the research efforts targeted at disambiguating author names. I first go through the conventional methods, then discuss evaluation techniques and the clustering model, which finally leads to the Bayesian learning and greedy agglomerative approach. I believe this concentrated review will be useful for the research community because it discusses techniques applied to a very large real database that is actively used worldwide. The Bayesian and greedy agglomerative approach will help to tackle AND problems more effectively. Finally, I outline a few directions for future work.
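
To make the greedy agglomerative step concrete, here is a minimal sketch that clusters publication records sharing an ambiguous author name. It is not the method from the paper: the Jaccard similarity over co-author sets, the threshold, and the toy records are assumptions chosen for illustration, and the Bayesian learning component is omitted entirely.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def greedy_agglomerative(records, threshold):
    """Repeatedly merge the most similar pair of clusters.

    records: list of feature sets (here, co-author names), one per paper.
    Stops when the best available pair falls below `threshold`.
    Returns a list of clusters, each a sorted list of record indices.
    """
    clusters = [({i}, set(feats)) for i, feats in enumerate(records)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = jaccard(clusters[i][1], clusters[j][1])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        if s < threshold:
            break
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]

# Toy co-author sets for four papers under one ambiguous name (invented).
papers = [
    {"A. Kumar", "B. Lee"},
    {"B. Lee", "C. Park"},
    {"X. Zhao"},
    {"X. Zhao", "Y. Li"},
]
groups = greedy_agglomerative(papers, threshold=0.3)
```

Each resulting group is a hypothesized distinct author. In practice the similarity would combine more evidence (affiliations, venues, topics) than co-author overlap alone.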

LG Mar 24, 2024
Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective

Subhodip Panda, Shashwat Sourav, Prathosh A. P

To adhere to regulatory standards governing individual data privacy and safety, machine learning models must systematically eliminate information derived from specific subsets of a user's training data that can no longer be utilized. Machine Unlearning has emerged as a pivotal area of research, facilitating the selective discarding of information tied to specific sets or classes of data from a pre-trained model, thereby eliminating the need for extensive retraining from scratch. The principal aim of this study is to formulate a methodology for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network. This intentional removal is designed to degrade the model's performance on the unlearned data class while minimizing any detrimental impact on its performance in other classes. To achieve this goal, we frame the class unlearning problem from a Bayesian perspective, which yields a loss function that minimizes the log-likelihood associated with the unlearned data with a stability regularization in parameter space. This stability regularization incorporates the Mahalanobis distance with respect to the Fisher Information matrix and the $\ell_2$ distance from the pre-trained model parameters. Our novel approach, termed Partially-Blinded Unlearning (PBU), surpasses existing state-of-the-art class unlearning methods, demonstrating superior effectiveness. Notably, PBU achieves this efficacy while requiring access only to the unlearned data points rather than the entire training dataset, a distinctive feature of the method.
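
One way to write down a loss of the shape this abstract describes is sketched below. The notation is assumed rather than taken from the paper: $\theta^{*}$ denotes the pre-trained parameters, $D_u$ the unlearned-class data, $F$ the Fisher Information matrix, and $\lambda_1, \lambda_2 \ge 0$ hypothetical regularization weights.

```latex
\mathcal{L}(\theta)
  = \underbrace{\sum_{(x, y) \in D_u} \log p(y \mid x, \theta)}_{\text{log-likelihood on unlearned data}}
  + \lambda_1 \underbrace{(\theta - \theta^{*})^{\top} F \, (\theta - \theta^{*})}_{\text{Mahalanobis term w.r.t. Fisher information}}
  + \lambda_2 \underbrace{\lVert \theta - \theta^{*} \rVert_2^2}_{\ell_2 \text{ term}}
```

Minimizing the first term pushes the model away from fitting the unlearned class, while the two regularizers keep $\theta$ close to $\theta^{*}$, which is what limits damage to the remaining classes. The exact form and weighting used in the paper may differ.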

CL Apr 7, 2025
A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models

Atilla Kaan Alkan, Shashwat Sourav, Maja Jablonska et al.

Hypothesis generation is a fundamental step in scientific discovery, yet it is increasingly challenged by information overload and disciplinary fragmentation. Recent advances in Large Language Models (LLMs) have sparked growing interest in their potential to enhance and automate this process. This paper presents a comprehensive survey of hypothesis generation with LLMs by (i) reviewing existing methods, from simple prompting techniques to more complex frameworks, and proposing a taxonomy that categorizes these approaches; (ii) analyzing techniques for improving hypothesis quality, such as novelty boosting and structured reasoning; (iii) providing an overview of evaluation strategies; and (iv) discussing key challenges and future directions, including multimodal integration and human-AI collaboration. Our survey aims to serve as a reference for researchers exploring LLMs for hypothesis generation.

LG Oct 12, 2025
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness

Subhodip Panda, Dhruv Tarsadiya, Shashwat Sourav et al.

Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet, existing methods collapse under training randomness: the same example may appear critical in one run and irrelevant in the next. Such instability undermines their use in data curation or cleanup since it is unclear if we indeed deleted/kept the correct datapoints. To overcome this, we introduce *f-influence* -- a new influence estimation framework grounded in hypothesis testing that explicitly accounts for training randomness, and establish desirable properties that make it suitable for reliable influence estimation. We also design a highly efficient algorithm **f**-**IN**fluence **E**stimation (**f-INE**) that computes f-influence **in a single training run**. Finally, we scale up f-INE to estimate influence of instruction tuning data on Llama-3.1-8B and show it can reliably detect poisoned samples that steer model opinions, demonstrating its utility for data cleanup and attributing model behavior.

IM Sep 23, 2025
The Platonic Universe: Do Foundation Models See the Same Sky?

UniverseTBD, Kshitij Duraphe, Michael J. Smith et al.

We test the Platonic Representation Hypothesis (PRH) in astronomy by measuring representational convergence across a range of foundation models trained on different data types. Using spectroscopic and imaging observations from JWST, HSC, Legacy Survey, and DESI, we compare representations from vision transformers, self-supervised models, and astronomy-specific architectures via mutual $k$-nearest neighbour analysis. We observe consistent scaling: representational alignment generally increases with model capacity across our tested architectures, supporting convergence toward a shared representation of galaxy astrophysics. Our results suggest that astronomical foundation models can use pre-trained general-purpose architectures, allowing us to capitalise on the broader machine learning community's already-spent computational investment.
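
The mutual $k$-nearest neighbour analysis mentioned above can be sketched as follows. This is a minimal version assuming cosine similarity and a simple mean-overlap score; the function names and these choices are illustrative, not the paper's exact implementation.

```python
import numpy as np

def knn_indices(emb, k):
    """Indices of each row's k nearest neighbours by cosine similarity."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # a point is not its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn_alignment(emb_a, emb_b, k):
    """Mean fraction of shared k-nearest neighbours across two models.

    emb_a, emb_b: arrays of shape (n_samples, dim_a) and (n_samples, dim_b)
    holding each model's embedding of the same n_samples objects.
    Returns a value in [0, 1]; higher means more similar neighbourhoods.
    """
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))

# Synthetic embeddings standing in for two models' outputs on 50 galaxies.
rng = np.random.default_rng(0)
model_a = rng.normal(size=(50, 16))
model_b = rng.normal(size=(50, 32))  # embedding widths may differ

same = mutual_knn_alignment(model_a, model_a, k=5)
unrelated = mutual_knn_alignment(model_a, model_b, k=5)
```

Because the score depends only on neighbourhood structure, it can compare models with different embedding dimensions, as long as both embed the same set of objects.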