47.0 DL Mar 15
Rising Prevalence of Detected AI-Generated Text in Medical Literature: Longitudinal Analysis in Open Access Articles
Nathan Wolfrath, Simrin Patel, Madelyn Flitcroft et al.
Generative artificial intelligence (AI) tools are increasingly used for writing tasks. However, the extent of their use in peer-reviewed medical literature remains unclear. We conducted a longitudinal analysis of all Original Investigations, Research Letters, and Invited Commentaries published in JAMA Network Open from January 2022 through March 2025. The main body text of 7,251 articles was analyzed using a commercial AI-detection tool (Originality.AI) to estimate the probability that manuscripts contained a significant amount of AI-generated content. Articles were analyzed in aggregate by month, publication type, and domain. Overall, 195 articles (2.7%) were classified as containing significant AI-generated text. The monthly proportion increased from 0.0% in January 2022 to 11.3% in March 2025, with a significant upward trend over time (P<0.001). Invited Commentaries had the highest proportion of detected AI-generated content (6.7%), followed by Original Investigations (2.2%) and Research Letters (1.4%). There was also significant variation across publication domains (P=0.04). Only 15 articles (0.2%) disclosed large language model use, of which 40.0% were classified as containing AI-generated text. While the findings suggest increasing detectable AI-generated content in medical literature, the limitations of current detection tools necessitate cautious interpretation.
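As a quick consistency check on the headline figures in this abstract, the reported percentages can be recomputed from the stated counts. The sketch below uses only numbers quoted above; the helper name `percent` and the variable names are illustrative and do not come from the paper.

```python
def percent(count, total, digits=1):
    """Return count/total as a percentage rounded to `digits` decimals."""
    return round(100.0 * count / total, digits)

# Counts quoted in the abstract.
total_articles = 7251   # articles analyzed
flagged = 195           # classified as containing significant AI-generated text
disclosed = 15          # articles that disclosed large language model use

flagged_pct = percent(flagged, total_articles)      # share of all articles flagged
disclosed_pct = percent(disclosed, total_articles)  # share of all articles with disclosure

# 40.0% of the disclosing articles were flagged; convert that back to a count.
disclosed_and_flagged = round(0.40 * disclosed)

print(flagged_pct, disclosed_pct, disclosed_and_flagged)
```

The recomputed values agree with the abstract: 2.7% flagged, 0.2% disclosing, and 6 of the 15 disclosing articles flagged.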
LG Sep 18, 2024
Stronger Baseline Models -- A Key Requirement for Aligning Machine Learning Research with Clinical Utility
Nathan Wolfrath, Joel Wolfrath, Hengrui Hu et al.
Machine learning (ML) research has increased substantially in recent years, due to the success of predictive modeling across diverse application domains. However, well-known barriers exist when attempting to deploy ML models in high-stakes clinical settings, including a lack of model transparency (or the inability to audit the inference process), large training data requirements with siloed data sources, and complicated metrics for measuring model utility. In this work, we show empirically that including stronger baseline models in healthcare ML evaluations has important downstream effects that aid practitioners in addressing these challenges. Through a series of case studies, we find that the common practice of omitting baselines or comparing against a weak baseline model (e.g., a linear model with no optimization) obscures the value of ML methods proposed in the research literature. Using these insights, we propose some best practices that will enable practitioners to more effectively study and deploy ML models in clinical settings.
ML Apr 6, 2025
A Novel Algorithm for Personalized Federated Learning: Knowledge Distillation with Weighted Combination Loss
Hengrui Hu, Anai N. Kothari, Anjishnu Banerjee
Federated learning (FL) offers a privacy-preserving framework for distributed machine learning, enabling collaborative model training across diverse clients without centralizing sensitive data. However, statistical heterogeneity, characterized by non-independent and identically distributed (non-IID) client data, poses significant challenges, leading to model drift and poor generalization. This paper proposes a novel algorithm, pFedKD-WCL (Personalized Federated Knowledge Distillation with Weighted Combination Loss), which integrates knowledge distillation with bi-level optimization to address non-IID challenges. pFedKD-WCL leverages the current global model as a teacher to guide local models, optimizing both global convergence and local personalization efficiently. We evaluate pFedKD-WCL on the MNIST dataset and a synthetic dataset with non-IID partitioning, using multinomial logistic regression and multilayer perceptron models. Experimental results demonstrate that pFedKD-WCL outperforms state-of-the-art algorithms, including FedAvg, FedProx, Per-FedAvg, and pFedMe, in terms of accuracy and convergence speed.
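The abstract does not state the exact form of the weighted combination loss, so the following is only a generic illustration of the idea: a local (student) model is trained on a weighted sum of its own cross-entropy loss and a distillation term that pulls it toward the global (teacher) model. The weight `alpha`, the `temperature`, the use of KL divergence, and all function names are assumptions made for this sketch, not details of pFedKD-WCL.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    """Cross-entropy of the student's prediction against the true label."""
    return -math.log(softmax(student_logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def weighted_combination_loss(student_logits, teacher_logits, label,
                              alpha=0.5, temperature=2.0):
    """Assumed form: alpha * CE(student, label) + (1 - alpha) * KL(teacher || student).

    alpha and temperature are hypothetical hyperparameters for this sketch.
    """
    ce = cross_entropy(student_logits, label)
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kd = kl_divergence(teacher_probs, student_probs)
    return alpha * ce + (1.0 - alpha) * kd

# Example: a local model that agrees with the global model pays no distillation penalty.
local_logits = [2.0, 0.5, -1.0]
agree = weighted_combination_loss(local_logits, local_logits, label=0)
disagree = weighted_combination_loss(local_logits, [-1.0, 0.5, 2.0], label=0)
print(agree, disagree)
```

In this form, `alpha` close to 1 favors local personalization (fitting the client's own labels), while `alpha` close to 0 favors agreement with the global model.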