IVAug 23, 2023
InverseSR: 3D Brain MRI Super-Resolution Using a Latent Diffusion ModelJueqi Wang, Jacob Levman, Walter Hugo Lopez Pinaya et al.
High-resolution (HR) MRI scans obtained from research-grade medical centers provide precise information about imaged tissues. However, routine clinical MRI scans are typically in low-resolution (LR) and vary greatly in contrast and spatial resolution due to the adjustments of the scanning parameters to the local needs of the medical center. End-to-end deep learning methods for MRI super-resolution (SR) have been proposed, but they require re-training each time there is a shift in the input distribution. To address this issue, we propose a novel approach that leverages a state-of-the-art 3D brain generative model, the latent diffusion model (LDM) trained on UK BioBank, to increase the resolution of clinical MRI scans. The LDM acts as a generative prior, which has the ability to capture the prior distribution of 3D T1-weighted brain MRI. Based on the architecture of the brain LDM, we find that different methods are suitable for different settings of MRI SR, and thus propose two novel strategies: 1) for SR with more sparsity, we invert through both the decoder of the LDM and also through a deterministic Denoising Diffusion Implicit Models (DDIM), an approach we will call InverseSR(LDM); 2) for SR with less sparsity, we invert only through the LDM decoder, an approach we will call InverseSR(Decoder). These two approaches search different latent spaces in the LDM model to find the optimal latent code to map the given LR MRI into HR. The training process of the generative model is independent of the MRI under-sampling process, ensuring the generalization of our method to many MRI SR problems with different input measurements. We validate our method on over 100 brain T1w MRIs from the IXI dataset. Our method can demonstrate that powerful priors given by LDM can be used for MRI reconstruction.
CLDec 24, 2025Code
Automatic Replication of LLM Mistakes in Medical ConversationsOleksii Proniakin, Diego Fajardo, Ruslan Nazarenko et al.
Large language models (LLMs) are increasingly evaluated in clinical settings using multi-dimensional rubrics which quantify reasoning quality, safety, and patient-centeredness. Yet, replicating specific mistakes in other LLM models is not straightforward and often requires manual effort. We introduce MedMistake, an automatic pipeline that extracts mistakes LLMs make in patient-doctor conversations and converts them into a benchmark of single-shot QA pairs. Our pipeline (1) creates complex, conversational data between an LLM patient and LLM doctor, (2) runs an evaluation with a committee of 2 LLM judges across a variety of dimensions and (3) creates simplified single-shot QA scenarios from those mistakes. We release MedMistake-All, a dataset of 3,390 single-shot QA pairs where GPT-5 and Gemini 2.5 Pro are currently failing to answer correctly, as judged by two LLM judges. We used medical experts to validate a subset of 211/3390 questions (MedMistake-Bench), which we used to run a final evaluation of 12 frontier LLMs: Claude Opus 4.5, Claude Sonnet 4.5, DeepSeek-Chat, Gemini 2.5 Pro, Gemini 3 Pro, GPT-4o, GPT-5, GPT-5.1, GPT-5.2, Grok 4, Grok 4.1, Mistral Large. We found that GPT models, Claude and Grok obtained the best performance on MedMistake-Bench. We release both the doctor-validated benchmark (MedMistake-Bench), as well as the full dataset (MedMistake-All) at https://huggingface.co/datasets/TheLumos/MedicalMistakeBenchmark.
NEApr 15, 2024Code
SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural NetworksSreyes Venkatesh, Razvan Marinescu, Jason K. Eshraghian
Weight quantization is used to deploy high-performance deep learning models on resource-limited hardware, enabling the use of low-precision integers for storage and computation. Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference. While extensive research has focused on weight quantization, quantization-aware training (QAT), and their application to SNNs, the precision reduction of state variables during training has been largely overlooked, potentially diminishing inference performance. This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization, which allocates exponentially more quantization levels near the firing threshold. Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets. We provide an ablation analysis of the effects of weight and state quantization, both individually and combined, and how they impact models. Our comprehensive empirical evaluation includes full precision, 8-bit, 4-bit, and 2-bit quantized SNNs, using QAT, stateful QAT (SQUAT), and post-training quantization methods. The findings indicate that the combination of QAT and SQUAT enhance performance the most, but given the choice of one or the other, QAT improves performance by the larger degree. These trends are consistent all datasets. Our methods have been made available in our Python library snnTorch: https://github.com/jeshraghian/snntorch.
LGJan 29
Holographic generative flows with AdS/CFTEhsan Mirafzali, Sanjit Shashi, Sanya Murdeshwar et al.
We present a framework for generative machine learning that leverages the holographic principle of quantum gravity, or to be more precise its manifestation as the anti-de Sitter/conformal field theory (AdS/CFT) correspondence, with techniques for deep learning and transport theory. Our proposal is to represent the flow of data from a base distribution to some learned distribution using the bulk-to-boundary mapping of scalar fields in AdS. In the language of machine learning, we are representing and augmenting the flow-matching algorithm with AdS physics. Using a checkerboard toy dataset and MNIST, we find that our model achieves faster and higher quality convergence than comparable physics-free flow-matching models. Our method provides a physically interpretable version of flow matching. More broadly, it establishes the utility of AdS physics and geometry in the development of novel paradigms in generative modeling.
CLDec 18, 2025
A Women's Health Benchmark for Large Language ModelsVictoria-Elisabeth Gruber, Razvan Marinescu, Diego Fajardo et al.
As large language models (LLMs) become primary sources of health information for millions, their accuracy in women's health remains critically unexamined. We introduce the Women's Health Benchmark (WHB), the first benchmark evaluating LLM performance specifically in women's health. Our benchmark comprises 96 rigorously validated model stumps covering five medical specialties (obstetrics and gynecology, emergency medicine, primary care, oncology, and neurology), three query types (patient query, clinician query, and evidence/policy query), and eight error types (dosage/medication errors, missing critical information, outdated guidelines/treatment recommendations, incorrect treatment advice, incorrect factual information, missing/incorrect differential diagnosis, missed urgency, and inappropriate recommendations). We evaluated 13 state-of-the-art LLMs and revealed alarming gaps: current models show approximately 60\% failure rates on the women's health benchmark, with performance varying dramatically across specialties and error types. Notably, models universally struggle with "missed urgency" indicators, while newer models like GPT-5 show significant improvements in avoiding inappropriate recommendations. Our findings underscore that AI chatbots are not yet fully able of providing reliable advice in women's health.
40.0LGMay 12
Hessian Matching for Machine-Learned Coarse-Grained Molecular DynamicsSanya Murdeshwar, Sanjit Shashi, Kevin Bachelor et al.
Coarse-grained (CG) molecular dynamics enables simulations of atomic systems such as biomolecules at timescales inaccessible to all-atom (AA) methods, but existing CG neural potentials trained via force matching capture only the gradient of the free-energy surface, leaving its curvature unconstrained. We introduce a framework that augments force matching with stochastic Hessian-vector product (HVP) matching, instilling second-order curvature information into CG potentials without constructing the full Hessian. We derive a decomposition of the target CG Hessian into a model-independent projected AA Hessian, precomputed once before training, and a model-dependent covariance correction computed online at negligible cost. We construct an unbiased stochastic estimator of the Hessian-matching objective by using random probe vectors. We evaluate our method by comparing against force matching on a benchmark of nine fast-folding proteins unseen during training. HVP matching outperforms plain force matching on 8 of 9 proteins on slow-mode metrics, with reductions of up to 85% in the Kullback--Leibler divergence between the CG and reference distributions along the slowest collective mode of the largest protein. Our results demonstrate that higher-order physical supervision is a practical path to more accurate and transferable CG potentials for biomolecular simulation.
LGOct 20, 2025Code
A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble SamplingAlexander Aghili, Andy Bruce, Daniel Sabo et al.
The rapid evolution of molecular dynamics (MD) methods, including machine-learned dynamics, has outpaced the development of standardized tools for method validation. Objective comparison between simulation approaches is often hindered by inconsistent evaluation metrics, insufficient sampling of rare conformational states, and the absence of reproducible benchmarks. To address these challenges, we introduce a modular benchmarking framework that systematically evaluates protein MD methods using enhanced sampling analysis. Our approach uses weighted ensemble (WE) sampling via The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis (WESTPA), based on progress coordinates derived from Time-lagged Independent Component Analysis (TICA), enabling fast and efficient exploration of protein conformational space. The framework includes a flexible, lightweight propagator interface that supports arbitrary simulation engines, allowing both classical force fields and machine learning-based models. Additionally, the framework offers a comprehensive evaluation suite capable of computing more than 19 different metrics and visualizations across a variety of domains. We further contribute a dataset of nine diverse proteins, ranging from 10 to 224 residues, that span a variety of folding complexities and topologies. Each protein has been extensively simulated at 300K for one million MD steps per starting point (4 ns). To demonstrate the utility of our framework, we perform validation tests using classic MD simulations with implicit solvent and compare protein conformational sampling using a fully trained versus under-trained CGSchNet model. By standardizing evaluation protocols and enabling direct, reproducible comparisons across MD approaches, our open-source platform lays the groundwork for consistent, rigorous benchmarking across the molecular simulation community.
CVSep 1, 2025Code
RibPull: Implicit Occupancy Fields and Medial Axis Extraction for CT Ribcage ScansEmmanouil Nikolakakis, Amine Ouasfi, Julie Digne et al.
We present RibPull, a methodology that utilizes implicit occupancy fields to bridge computational geometry and medical imaging. Implicit 3D representations use continuous functions that handle sparse and noisy data more effectively than discrete methods. While voxel grids are standard for medical imaging, they suffer from resolution limitations, topological information loss, and inefficient handling of sparsity. Coordinate functions preserve complex geometrical information and represent a better solution for sparse data representation, while allowing for further morphological operations. Implicit scene representations enable neural networks to encode entire 3D scenes within their weights. The result is a continuous function that can implicitly compesate for sparse signals and infer further information about the 3D scene by passing any combination of 3D coordinates as input to the model. In this work, we use neural occupancy fields that predict whether a 3D point lies inside or outside an object to represent CT-scanned ribcages. We also apply a Laplacian-based contraction to extract the medial axis of the ribcage, thus demonstrating a geometrical operation that benefits greatly from continuous coordinate-based 3D scene representations versus voxel-based representations. We evaluate our methodology on 20 medical scans from the RibSeg dataset, which is itself an extension of the RibFrac dataset. We will release our code upon publication.
IVFeb 4, 2025Code
ReMiDi: Reconstruction of Microstructure Using a Differentiable Diffusion MRI SimulatorPrathamesh Pradeep Khole, Zahra Kais Petiwala, Shri Prathaa Magesh et al.
We propose ReMiDi, a novel method for inferring neuronal microstructure as arbitrary 3D meshes using a differentiable diffusion Magnetic Resonance Imaging (dMRI) simulator. We first implemented in PyTorch a differentiable dMRI simulator that simulates the forward diffusion process using a finite-element method on an input 3D microstructure mesh. To achieve significantly faster simulations, we solve the differential equation semi-analytically using a matrix formalism approach. Given a reference dMRI signal $S_{ref}$, we use the differentiable simulator to iteratively update the input mesh such that it matches $S_{ref}$ using gradient-based learning. Since directly optimizing the 3D coordinates of the vertices is challenging, particularly due to ill-posedness of the inverse problem, we instead optimize a lower-dimensional latent space representation of the mesh. The mesh is first encoded into spectral coefficients, which are further encoded into a latent $\textbf{z}$ using an auto-encoder, and are then decoded back into the true mesh. We present an end-to-end differentiable pipeline that simulates signals that can be tuned to match a reference signal by iteratively updating the latent representation $\textbf{z}$. We demonstrate the ability to reconstruct microstructures of arbitrary shapes represented by finite-element meshes, with a focus on axonal geometries found in the brain white matter, including bending, fanning and beading fibers. Our source code is available online.
CVApr 23, 2024Code
BMapEst: Estimation of Brain Tissue Probability Maps using a Differentiable MRI SimulatorUtkarsh Gupta, Emmanouil Nikolakakis, Moritz Zaiss et al.
Reconstructing digital brain phantoms in the form of voxel-based, multi-channeled tissue probability maps for individual subjects is essential for capturing brain anatomical variability, understanding neurological diseases, as well as for testing image processing methods. We demonstrate the first framework that estimates brain tissue probability maps (Grey Matter - GM, White Matter - WM, and Cerebrospinal fluid - CSF) with the help of a Physics-based differentiable MRI simulator that models the magnetization signal at each voxel in the volume. Given an observed $T_1$/$T_2$-weighted MRI scan, the corresponding clinical MRI sequence, and the MRI differentiable simulator, we estimate the simulator's input probability maps by back-propagating the L2 loss between the simulator's output and the $T_1$/$T_2$-weighted scan. This approach has the significant advantage of not relying on any training data and instead uses the strong inductive bias of the MRI simulator. We tested the model on 20 scans from the BrainWeb database and demonstrated a highly accurate reconstruction of GM, WM, and CSF. Our source code is available online: https://github.com/BioMedAI-UCSC/BMapEst.
IVJul 20, 2021Code
3D-StyleGAN: A Style-Based Generative Adversarial Network for Generative Modeling of Three-Dimensional Medical ImagesSungmin Hong, Razvan Marinescu, Adrian V. Dalca et al.
Image synthesis via Generative Adversarial Networks (GANs) of three-dimensional (3D) medical images has great potential that can be extended to many medical applications, such as, image enhancement and disease progression modeling. However, current GAN technologies for 3D medical image synthesis need to be significantly improved to be readily adapted to real-world medical problems. In this paper, we extend the state-of-the-art StyleGAN2 model, which natively works with two-dimensional images, to enable 3D image synthesis. In addition to the image synthesis, we investigate the controllability and interpretability of the 3D-StyleGAN via style vectors inherited form the original StyleGAN2 that are highly suitable for medical applications: (i) the latent space projection and reconstruction of unseen real images, and (ii) style mixing. We demonstrate the 3D-StyleGAN's performance and feasibility with ~12,000 three-dimensional full brain MR T1 images, although it can be applied to any 3D volumetric images. Furthermore, we explore different configurations of hyperparameters to investigate potential improvement of the image synthesis with larger networks. The codes and pre-trained networks are available online: https://github.com/sh4174/3DStyleGAN.
IVApr 4, 2024
GaSpCT: Gaussian Splatting for Novel CT Projection View SynthesisEmmanouil Nikolakakis, Utkarsh Gupta, Jonathan Vengosh et al.
We present GaSpCT, a novel view synthesis and 3D scene representation method used to generate novel projection views for Computer Tomography (CT) scans. We adapt the Gaussian Splatting framework to enable novel view synthesis in CT based on limited sets of 2D image projections and without the need for Structure from Motion (SfM) methodologies. Therefore, we reduce the total scanning duration and the amount of radiation dose the patient receives during the scan. We adapted the loss function to our use-case by encouraging a stronger background and foreground distinction using two sparsity promoting regularizers: a beta loss and a total variation (TV) loss. Finally, we initialize the Gaussian locations across the 3D space using a uniform prior distribution of where the brain's positioning would be expected to be within the field of view. We evaluate the performance of our model using brain CT scans from the Parkinson's Progression Markers Initiative (PPMI) dataset and demonstrate that the rendered novel views closely match the original projection views of the simulated scan, and have better performance than other implicit 3D scene representations methodologies. Furthermore, we empirically observe reduced training time compared to neural network based image synthesis for sparse-view CT image reconstruction. Finally, the memory requirements of the Gaussian Splatting representations are reduced by 17% compared to the equivalent voxel grid image representations.
LGMar 21, 2025
Malliavin Calculus for Score-based Diffusion ModelsEhsan Mirafzali, Utkarsh Gupta, Patrick Wyrod et al.
We introduce a new framework based on Malliavin calculus to derive exact analytical expressions for the score function $\nabla \log p_t(x)$, i.e., the gradient of the log-density associated with the solution to stochastic differential equations (SDEs). Our approach combines classical integration-by-parts techniques with modern stochastic analysis tools, such as Bismut's formula and Malliavin calculus, and it works for both linear and nonlinear SDEs. In doing so, we establish a rigorous connection between the Malliavin derivative, its adjoint, the Malliavin divergence (Skorokhod integral), and diffusion generative models, thereby providing a systematic method for computing $\nabla \log p_t(x)$. In the linear case, we present a detailed analysis showing that our formula coincides with the analytical score function derived from the solution of the Fokker--Planck equation. For nonlinear SDEs with state-independent diffusion coefficients, we derive a closed-form expression for $\nabla \log p_t(x)$. We evaluate the proposed framework across multiple generative tasks and find that its performance is comparable to state-of-the-art methods. These results can be generalised to broader classes of SDEs, paving the way for new score-based diffusion generative models.
MLJul 8, 2025
A Malliavin calculus approach to score functions in diffusion generative modelsEhsan Mirafzali, Frank Proske, Utkarsh Gupta et al.
Score-based diffusion generative models have recently emerged as a powerful tool for modelling complex data distributions. These models aim at learning the score function, which defines a map from a known probability distribution to the target data distribution via deterministic or stochastic differential equations (SDEs). The score function is typically estimated from data using a variety of approximation techniques, such as denoising or sliced score matching, Hyvärien's method, or Schrödinger bridges. In this paper, we derive an exact, closed-form, expression for the score function for a broad class of nonlinear diffusion generative models. Our approach combines modern stochastic analysis tools such as Malliavin derivatives and their adjoint operators (Skorokhod integrals or Malliavin Divergence) with a new Bismut-type formula. The resulting expression for the score function can be written entirely in terms of the first and second variation processes, with all Malliavin derivatives systematically eliminated, thereby enhancing its practical applicability. The theoretical framework presented in this work offers a principled foundation for advancing score estimation methods in generative modelling, enabling the design of new sampling algorithms for complex probability distributions. Our results can be extended to broader classes of stochastic differential equations, opening new directions for the development of score-based diffusion generative models.
LGSep 18, 2025
TICA-Based Free Energy Matching for Machine-Learned Molecular DynamicsAlexander Aghili, Andy Bruce, Daniel Sabo et al.
Molecular dynamics (MD) simulations provide atomistic insight into biomolecular systems but are often limited by high computational costs required to access long timescales. Coarse-grained machine learning models offer a promising avenue for accelerating sampling, yet conventional force matching approaches often fail to capture the full thermodynamic landscape as fitting a model on the gradient may not fit the absolute differences between low-energy conformational states. In this work, we incorporate a complementary energy matching term into the loss function. We evaluate our framework on the Chignolin protein using the CGSchNet model, systematically varying the weight of the energy loss term. While energy matching did not yield statistically significant improvements in accuracy, it revealed distinct tendencies in how models generalize the free energy surface. Our results suggest future opportunities to enhance coarse-grained modeling through improved energy estimation techniques and multi-modal loss formulations.
SDMay 8, 2024
AFEN: Respiratory Disease Classification using Ensemble LearningRahul Nadkarni, Emmanouil Nikolakakis, Razvan Marinescu
We present AFEN (Audio Feature Ensemble Learning), a model that leverages Convolutional Neural Networks (CNN) and XGBoost in an ensemble learning fashion to perform state-of-the-art audio classification for a range of respiratory diseases. We use a meticulously selected mix of audio features which provide the salient attributes of the data and allow for accurate classification. The extracted features are then used as an input to two separate model classifiers 1) a multi-feature CNN classifier and 2) an XGBoost Classifier. The outputs of the two models are then fused with the use of soft voting. Thus, by exploiting ensemble learning, we achieve increased robustness and accuracy. We evaluate the performance of the model on a database of 920 respiratory sounds, which undergoes data augmentation techniques to increase the diversity of the data and generalizability of the model. We empirically verify that AFEN sets a new state-of-the-art using Precision and Recall as metrics, while decreasing training time by 60%.
CVMar 4
Spinverse: Differentiable Physics for Permeability-Aware Microstructure Reconstruction from Diffusion MRIPrathamesh Pradeep Khole, Mario M. Brenes, Zahra Kais Petiwala et al.
Diffusion MRI (dMRI) is sensitive to microstructural barriers, yet most existing methods either assume impermeable boundaries or estimate voxel-level parameters without recovering explicit interfaces. We present Spinverse, a permeability-aware reconstruction method that inverts dMRI measurements through a fully differentiable Bloch-Torrey simulator. Spinverse represents tissue on a fixed tetrahedral grid and treats each interior face permeability as a learnable parameter; low-permeability faces act as diffusion barriers, so microstructural boundaries whose topology is not fixed a priori (up to the resolution of the ambient mesh) emerge without changing mesh connectivity or vertex positions. Given a target signal, we optimize face permeabilities by backpropagating a signal-matching loss through the PDE forward model, and recover an interface by thresholding the learned permeability field. To mitigate the ill-posedness of permeability inversion, we use mesh-based geometric priors; to avoid local minima, we use a staged multi-sequence optimization curriculum. Across a collection of synthetic voxel meshes, Spinverse reconstructs diverse geometries and demonstrates that sequence scheduling and regularization are critical to avoid outline-only solutions while improving both boundary accuracy and structural validity.
LGOct 13, 2025
Medical Interpretability and Knowledge Maps of Large Language ModelsRazvan Marinescu, Victoria-Elisabeth Gruber, Diego Fajardo
We present a systematic study of medical-domain interpretability in Large Language Models (LLMs). We study how the LLMs both represent and process medical knowledge through four different interpretability techniques: (1) UMAP projections of intermediate activations, (2) gradient-based saliency with respect to the model weights, (3) layer lesioning/removal and (4) activation patching. We present knowledge maps of five LLMs which show, at a coarse-resolution, where knowledge about patient's ages, medical symptoms, diseases and drugs is stored in the models. In particular for Llama3.3-70B, we find that most medical knowledge is processed in the first half of the model's layers. In addition, we find several interesting phenomena: (i) age is often encoded in a non-linear and sometimes discontinuous manner at intermediate layers in the models, (ii) the disease progression representation is non-monotonic and circular at certain layers of the model, (iii) in Llama3.3-70B, drugs cluster better by medical specialty rather than mechanism of action, especially for Llama3.3-70B and (iv) Gemma3-27B and MedGemma-27B have activations that collapse at intermediate layers but recover by the final layers. These results can guide future research on fine-tuning, un-learning or de-biasing LLMs for medical tasks by suggesting at which layers in the model these techniques should be applied.
LGSep 21, 2025
Active Learning for Machine Learning Driven Molecular DynamicsKevin Bachelor, Sanya Murdeshwar, Daniel Sabo et al.
Machine-learned coarse-grained (CG) potentials are fast, but degrade over time when simulations reach under-sampled bio-molecular conformations, and generating widespread all-atom (AA) data to combat this is computationally infeasible. We propose a novel active learning (AL) framework for CG neural network potentials in molecular dynamics (MD). Building on the CGSchNet model, our method employs root mean squared deviation (RMSD)-based frame selection from MD simulations in order to generate data on-the-fly by querying an oracle during the training of a neural network potential. This framework preserves CG-level efficiency while correcting the model at precise, RMSD-identified coverage gaps. By training CGSchNet, a coarse-grained neural network potential, we empirically show that our framework explores previously unseen configurations and trains the model on unexplored regions of conformational space. Our active learning framework enables a CGSchNet model trained on the Chignolin protein to achieve a 33.05\% improvement in the Wasserstein-1 (W1) metric in Time-lagged Independent Component Analysis (TICA) space on an in-house benchmark suite.