Ridam Pal

h-index7

6papers

2citations

Novelty22%

AI Score31

Ranked #141,307 of 201,326 authors (top 70%)#24,789 in CL (top 76%)

6 Papers

AIDec 23, 2025

Benchmarking LLMs for Predictive Applications in the Intensive Care Units

Chehak Malhotra, Mehak Gopal, Akshaya Devadiga et al.

With the advent of LLMs, various tasks across the natural language processing domain have been transformed. However, their application in predictive tasks remains less researched. This study compares large language models, including GatorTron-Base (trained on clinical data), Llama 8B, and Mistral 7B, against models like BioBERT, DocBERT, BioClinicalBERT, Word2Vec, and Doc2Vec, setting benchmarks for predicting Shock in critically ill patients. Timely prediction of shock can enable early interventions, thus improving patient outcomes. Text data from 17,294 ICU stays of patients in the MIMIC III database were scored for length of stay > 24 hours and shock index (SI) > 0.7 to yield 355 and 87 patients with normal and abnormal SI-index, respectively. Both focal and cross-entropy losses were used during finetuning to address class imbalances. Our findings indicate that while GatorTron Base achieved the highest weighted recall of 80.5%, the overall performance metrics were comparable between SLMs and LLMs. This suggests that LLMs are not inherently superior to SLMs in predicting future clinical events despite their strong performance on text-based tasks. To achieve meaningful clinical outcomes, future efforts in training LLMs should prioritize developing models capable of predicting clinical trajectories rather than focusing on simpler tasks such as named entity recognition or phenotyping.

CLJun 24, 2023

Characterizing the Emotion Carriers of COVID-19 Misinformation and Their Impact on Vaccination Outcomes in India and the United States

Ridam Pal, Sanjana S, Deepak Mahto et al.

The COVID-19 Infodemic had an unprecedented impact on health behaviors and outcomes at a global scale. While many studies have focused on a qualitative and quantitative understanding of misinformation, including sentiment analysis, there is a gap in understanding the emotion-carriers of misinformation and their differences across geographies. In this study, we characterized emotion carriers and their impact on vaccination rates in India and the United States. A manually labelled dataset was created from 2.3 million tweets and collated with three publicly available datasets (CoAID, AntiVax, CMU) to train deep learning models for misinformation classification. Misinformation labelled tweets were further analyzed for behavioral aspects by leveraging Plutchik Transformers to determine the emotion for each tweet. Time series analysis was conducted to study the impact of misinformation on spatial and temporal characteristics. Further, categorical classification was performed using transformer models to assign categories for the misinformation tweets. Word2Vec+BiLSTM was the best model for misinformation classification, with an F1-score of 0.92. The US had the highest proportion of misinformation tweets (58.02%), followed by the UK (10.38%) and India (7.33%). Disgust, anticipation, and anger were associated with an increased prevalence of misinformation tweets. Disgust was the predominant emotion associated with misinformation tweets in the US, while anticipation was the predominant emotion in India. For India, the misinformation rate exhibited a lead relationship with vaccination, while in the US it lagged behind vaccination. Our study deciphered that emotions acted as differential carriers of misinformation across geography and time. These carriers can be monitored to develop strategic interventions for countering misinformation, leading to improved public health.

CLOct 30, 2020Code

A Cross-lingual Natural Language Processing Framework for Infodemic Management

Ridam Pal, Rohan Pandey, Vaibhav Gautam et al.

The COVID-19 pandemic has put immense pressure on health systems which are further strained due to the misinformation surrounding it. Under such a situation, providing the right information at the right time is crucial. There is a growing demand for the management of information spread using Artificial Intelligence. Hence, we have exploited the potential of Natural Language Processing for identifying relevant information that needs to be disseminated amongst the masses. In this work, we present a novel Cross-lingual Natural Language Processing framework to provide relevant information by matching daily news with trusted guidelines from the World Health Organization. The proposed pipeline deploys various techniques of NLP such as summarizers, word embeddings, and similarity metrics to provide users with news articles along with a corresponding healthcare guideline. A total of 36 models were evaluated and a combination of LexRank based summarizer on Word2Vec embedding with Word Mover distance metric outperformed all other models. This novel open-source approach can be used as a template for proactive dissemination of relevant healthcare information in the midst of misinformation spread associated with epidemics.

MLNov 25, 2025

When Features Beat Noise: A Feature Selection Technique Through Noise-Based Hypothesis Testing

Mousam Sinha, Tirtha Sarathi Ghosh, Ridam Pal

Feature selection has remained a daunting challenge in machine learning and artificial intelligence, where increasingly complex, high-dimensional datasets demand principled strategies for isolating the most informative predictors. Despite widespread adoption, many established techniques suffer from notable limitations; some incur substantial computational cost, while others offer no definite statistical driven stopping criteria or assesses the significance of their importance scores. A common heuristic approach introduces multiple random noise features and retains all predictors ranked above the strongest noise feature. Although intuitive, this strategy lacks theoretical justification and depends heavily on heuristics. This paper proposes a novel feature selection method that addresses these limitations. Our approach introduces multiple random noise features and evaluates each feature's importance against the maximum importance value among these noise features incorporating a non-parametric bootstrap-based hypothesis testing framework to establish a solid theoretical foundation. We establish the conceptual soundness of our approach through statistical derivations that articulate the principles guiding the design of our algorithm. To evaluate its reliability, we generated simulated datasets under controlled statistical settings and benchmarked performance against Boruta and Knockoff-based methods, observing consistently stronger recovery of meaningful signal. As a demonstration of practical utility, we applied the technique across diverse real-world datasets, where it surpassed feature selection techniques including Boruta, RFE, and Extra Trees. Hence, the method emerges as a robust algorithm for principled feature selection, enabling the distillation of informative predictors that support reliable inference, enhanced predictive performance, and efficient computation.

CLApr 2, 2021

Mining Trends of COVID-19 Vaccine Beliefs on Twitter with Lexical Embeddings

Harshita Chopra, Aniket Vashishtha, Ridam Pal et al.

Social media plays a pivotal role in disseminating news globally and acts as a platform for people to express their opinions on various topics. A wide variety of views accompanies COVID-19 vaccination drives across the globe, often colored by emotions, which change along with rising cases, approval of vaccines, and multiple factors discussed online. This study aims at analyzing the temporal evolution of different Emotion categories: Hesitation, Rage, Sorrow, Anticipation, Faith, and Contentment with Influencing Factors: Vaccine Rollout, Misinformation, Health Effects, and Inequities as lexical categories created from Tweets belonging to five countries with vital vaccine roll-out programs, namely, India, United States of America, Brazil, United Kingdom, and Australia. We extracted a corpus of nearly 1.8 million Twitter posts related to COVID-19 vaccination. Using cosine distance from selected seed words, we expanded the vocabulary of each category and tracked the longitudinal change in their strength from June 2020 to April 2021. We used community detection algorithms to find modules in positive correlation networks. Our findings suggest that tweets expressing hesitancy towards vaccines contain the highest mentions of health-related effects in all countries. Our results indicated that the patterns of hesitancy were variable across geographies and can help us learn targeted interventions. We also observed a significant change in the linear trends of categories like hesitation and contentment before and after approval of vaccines. Negative emotions like rage and sorrow gained the highest importance in the alluvial diagram. They formed a significant module with all the influencing factors in April 2021, when India observed the second wave of COVID-19 cases. The relationship between Emotions and Influencing Factors was found to be variable across the countries.

CYMar 16, 2020

A Machine Learning Application for Raising WASH Awareness in the Times of COVID-19 Pandemic

Rohan Pandey, Vaibhav Gautam, Ridam Pal et al.

Background: The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this Infodemic requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation. Objective: We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages. Methods: We theorize (i) an NLP based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community. Results: A total of 5026 people who downloaded the app during the study window, among those 1545 were active users. Our study shows that 3.4 times more females engaged with the App in Hindi as compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of integrated AI chatbot Satya increased thus proving the usefulness of an mHealth platform to mitigate health misinformation. Conclusion: We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation.