Niharika Bhattacharjee

52.2LGJun 4Code

MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry

Joey Chan, Wonbin Kweon, Ashley Shin et al.

Large language models (LLMs) have shown promise for molecular property prediction, but their ability to reason over chemical structures remains limited, as molecular representations such as SMILES differ substantially from the natural language on which LLMs are primarily trained. To bridge this semantic and chemical knowledge gap, we propose MolE-RAG, a training-free, molecule-centric retrieval-augmented generation framework for LLM-based molecular property prediction. MolE-RAG augments each prediction with three complementary sources of inference-time context: retrieved chemistry literature, molecule-specific information including compound synonyms, identifiers, functional group annotations, and physicochemical descriptors, and structurally similar molecules retrieved from the training set. We evaluate MolE-RAG across nine molecular property prediction tasks using proprietary, chemistry-specialized, and open-source LLMs. Across general-purpose LLMs, MolE-RAG improves ROC-AUC by up to 28 percentage points on classification tasks and reduces regression RMSE by up to 67% relative to a SMILES-only baseline. We further find that the utility of each context source varies across models and tasks, with different models benefiting most from textual retrieval, molecular context, or structural retrieval. These results suggest that molecule-centric retrieval can improve LLM-based molecular property prediction without model fine-tuning while providing a flexible framework for integrating heterogeneous chemical knowledge at inference time.

22.1HCApr 20

Are Gains Quiet and Losses Loud? Emotional Responses to Financial Booms and Crashes Online

Aryan Ramchandra Kapadia, Niharika Bhattacharjee, Mung Yao Jia et al.

Financial events negatively affect emotional well-being, but large-scale studies examining their impact on online emotional expression using real-time social media data remain limited. To address this gap, we propose analyzing Reddit communities (financial and non-financial) across two case studies: a financial crash and a boom. We investigate how emotional and psycholinguistic responses differ between financial and non-financial communities, and the extent to which the type of financial event affects user behavior during the two case study periods. To examine the effect of these events on expressed language, we analyze daily sentiment, emotion, and LIWC counts using quasi-experimental methods: Difference-in-Differences (DiD) and Causal Impact analyses during a financial boom and a financial crash. Overall, we find coherent, negative shifts in emotional responses during financial crashes, but weaker, mixed responses during booms. By exploring emotional and psycholinguistic expressions during financial events, we identify future implications for understanding online users' mental health and building connected, healthy communities.

Niharika Bhattacharjee

2 Papers