CVApr 12, 2023Code
Scale-Equivariant Deep Learning for 3D DataThomas Wimmer, Vladimir Golkov, Hoai Nam Dang et al. · eth-zurich
The ability of convolutional neural networks (CNNs) to recognize objects regardless of their position in the image is due to the translation-equivariance of the convolutional operation. Group-equivariant CNNs transfer this equivariance to other transformations of the input. Dealing appropriately with objects and object parts of different scale is challenging, and scale can vary for multiple reasons such as the underlying object size or the resolution of the imaging modality. In this paper, we propose a scale-equivariant convolutional network layer for three-dimensional data that guarantees scale-equivariance in 3D CNNs. Scale-equivariance lifts the burden of having to learn each possible scale separately, allowing the neural network to focus on higher-level learning goals, which leads to better results and better data-efficiency. We provide an overview of the theoretical foundations and scientific work on scale-equivariant neural networks in the two-dimensional domain. We then transfer the concepts from 2D to the three-dimensional space and create a scale-equivariant convolutional layer for 3D data. Using the proposed scale-equivariant layer, we create a scale-equivariant U-Net for medical image segmentation and compare it with a non-scale-equivariant baseline method. Our experiments demonstrate the effectiveness of the proposed method in achieving scale-equivariance for 3D medical image analysis. We publish our code at https://github.com/wimmerth/scale-equivariant-3d-convnet for further research and application.
MED-PHSep 30, 2024Code
Controlling sharpness, SNR and SAR for 3D FSE at 7T by end-to-end learningPeter Dawood, Martin Blaimer, Jürgen Herrler et al.
Purpose: To non-heuristically identify dedicated variable flip angle (VFA) schemes optimized for the point-spread function (PSF) and signal-to-noise ratio (SNR) of multiple tissues in 3D FSE sequences with very long echo trains at 7T. Methods: The proposed optimization considers predefined SAR constraints and target contrast using an end-to-end learning framework. The cost function integrates components for contrast fidelity (SNR) and a penalty term to minimize image blurring (PSF) for multiple tissues. By adjusting the weights of PSF/SNR cost-function components, PSF- and SNR-optimized VFAs were derived and tested in vivo using both the open-source Pulseq standard on two volunteers as well as vendor protocols on a 7T MRI system with parallel transmit extension on three volunteers. Results: PSF-optimized VFAs resulted in significantly reduced image blurring compared to standard VFAs for T2w while maintaining contrast fidelity. Small white and gray matter structures, as well as blood vessels, are more visible with PSF-optimized VFAs. Quantitative analysis shows that the optimized VFA yields 50% less deviation from a sinc-like reference PSF than the standard VFA. The SNR-optimized VFAs yielded images with significantly improved SNR in a white and gray matter region relative to standard (81.2\pm18.4 vs. 41.2\pm11.5, respectively) as trade-off for elevated image blurring. Conclusion: This study demonstrates the potential of end-to-end learning frameworks to optimize VFA schemes in very long echo trains for 3D FSE acquisition at 7T in terms of PSF and SNR. It paves the way for fast and flexible adjustment of the trade-off between PSF and SNR for 3D FSE.
MED-PHJul 22, 2022
Accelerated and Quantitative 3D Semisolid MT/CEST Imaging using a Generative Adversarial Network (GAN-CEST)Jonah Weigand-Whittier, Maria Sedykh, Kai Herz et al.
Purpose: To substantially shorten the acquisition time required for quantitative 3D chemical exchange saturation transfer (CEST) and semisolid magnetization transfer (MT) imaging and allow for rapid chemical exchange parameter map reconstruction. Methods: Three-dimensional CEST and MT magnetic resonance fingerprinting (MRF) datasets of L-arginine phantoms, whole-brains, and calf muscles from healthy volunteers, cancer patients, and cardiac patients were acquired using 3T clinical scanners at 3 different sites, using 3 different scanner models and coils. A generative adversarial network supervised framework (GAN-CEST) was then designed and trained to learn the mapping from a reduced input data space to the quantitative exchange parameter space, while preserving perceptual and quantitative content. Results: The GAN-CEST 3D acquisition time was 42-52 seconds, 70% shorter than CEST-MRF. The quantitative reconstruction of the entire brain took 0.8 seconds. An excellent agreement was observed between the ground truth and GAN-based L-arginine concentration and pH values (Pearson's r > 0.97, NRMSE < 1.5%). GAN-CEST images from a brain-tumor subject yielded a semi-solid volume fraction and exchange rate NRMSE of 3.8$\pm$1.3% and 4.6$\pm$1.3%, respectively, and SSIM of 96.3$\pm$1.6% and 95.0$\pm$2.4%, respectively. The mapping of the calf-muscle exchange parameters in a cardiac patient, yielded NRMSE < 7% and SSIM > 94% for the semi-solid exchange parameters. In regions with large susceptibility artifacts, GAN-CEST has demonstrated improved performance and reduced noise compared to MRF. Conclusion: GAN-CEST can substantially reduce the acquisition time for quantitative semisolid MT/CEST mapping, while retaining performance even when facing pathologies and scanner models that were not available during training.
35.0SDMay 18Code
SIREM: Speech-Informed MRI Reconstruction with Learned SamplingMd Hasan, Nyvenn Castro, Daiqi Liu et al.
Real-time magnetic resonance imaging (rtMRI) of speech production enables non-invasive visualization of dynamic vocal-tract motion and is valuable for speech science and clinical assessment. However, rtMRI is fundamentally constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, often leading to undersampled k-space measurements and degraded reconstructions. We propose SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. The central idea is that vocal-tract configurations during speech are correlated with the produced acoustics, making part of the image content predictable from audio. SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map. The audio branch predicts articulator-related structure from speech, while the MRI branch reconstructs complementary content from measured k-space data. We further introduce a learnable soft weighting profile over spiral arms, enabling a differentiable study of how k-space arm usage interacts with speech-informed fusion. This yields a unified multimodal formulation that combines audio-driven prediction, MRI reconstruction, and sampling adaptation. We evaluate SIREM on the USC speech rtMRI benchmark against standard baselines, including gridding, wavelet-based compressed sensing, and total variation. SIREM introduces a speech-informed reconstruction paradigm that operates in a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure. These results establish an initial benchmark for multimodal speech-informed rtMRI reconstruction and highlight the potential of synchronized speech as an auxiliary prior for fast reconstruction. The source code is available at https://github.com/mdhasanai/SIREM
MED-PHAug 15, 2024
Decoding the human brain tissue response to radiofrequency excitation using a biophysical-model-free deep MRI on a chip frameworkDinor Nagar, Moritz Zaiss, Or Perlman
Magnetic resonance imaging (MRI) relies on radiofrequency (RF) excitation of proton spin. Clinical diagnosis requires a comprehensive collation of biophysical data via multiple MRI contrasts, acquired using a series of RF sequences that lead to lengthy examinations. Here, we developed a vision transformer-based framework that captures the spatiotemporal magnetic signal evolution and decodes the brain tissue response to RF excitation, constituting an MRI on a chip. Following a per-subject rapid calibration scan (28.2 s), a wide variety of image contrasts including fully quantitative molecular, water relaxation, and magnetic field maps can be generated automatically. The method was validated across healthy subjects and a cancer patient in two different imaging sites, and proved to be 94% faster than alternative protocols. The deep MRI on a chip (DeepMonC) framework may reveal the molecular composition of the human brain tissue in a wide range of pathologies, while offering clinically attractive scan times.
MLFeb 3
Multiparameter Uncertainty Mapping in Quantitative Molecular MRI using a Physics-Structured Variational Autoencoder (PS-VAE)Alex Finkelstein, Ron Moneta, Or Zohar et al.
Quantitative imaging methods, such as magnetic resonance fingerprinting (MRF), aim to extract interpretable pathology biomarkers by estimating biophysical tissue parameters from signal evolutions. However, the pattern-matching algorithms or neural networks used in such inverse problems often lack principled uncertainty quantification, which limits the trustworthiness and transparency, required for clinical acceptance. Here, we describe a physics-structured variational autoencoder (PS-VAE) designed for rapid extraction of voxelwise multi-parameter posterior distributions. Our approach integrates a differentiable spin physics simulator with self-supervised learning, and provides a full covariance that captures the inter-parameter correlations of the latent biophysical space. The method was validated in a multi-proton pool chemical exchange saturation transfer (CEST) and semisolid magnetization transfer (MT) molecular MRF study, across in-vitro phantoms, tumor-bearing mice, healthy human volunteers, and a subject with glioblastoma. The resulting multi-parametric posteriors are in good agreement with those calculated using a brute-force Bayesian analysis, while providing an orders-of-magnitude acceleration in whole brain quantification. In addition, we demonstrate how monitoring the multi-parameter posterior dynamics across progressively acquired signals provides practical insights for protocol optimization and may facilitate real-time adaptive acquisition.
CVApr 23, 2024Code
BMapEst: Estimation of Brain Tissue Probability Maps using a Differentiable MRI SimulatorUtkarsh Gupta, Emmanouil Nikolakakis, Moritz Zaiss et al.
Reconstructing digital brain phantoms in the form of voxel-based, multi-channeled tissue probability maps for individual subjects is essential for capturing brain anatomical variability, understanding neurological diseases, as well as for testing image processing methods. We demonstrate the first framework that estimates brain tissue probability maps (Grey Matter - GM, White Matter - WM, and Cerebrospinal fluid - CSF) with the help of a Physics-based differentiable MRI simulator that models the magnetization signal at each voxel in the volume. Given an observed $T_1$/$T_2$-weighted MRI scan, the corresponding clinical MRI sequence, and the MRI differentiable simulator, we estimate the simulator's input probability maps by back-propagating the L2 loss between the simulator's output and the $T_1$/$T_2$-weighted scan. This approach has the significant advantage of not relying on any training data and instead uses the strong inductive bias of the MRI simulator. We tested the model on 20 scans from the BrainWeb database and demonstrated a highly accurate reconstruction of GM, WM, and CSF. Our source code is available online: https://github.com/BioMedAI-UCSC/BMapEst.
17.0CLMay 4
Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms RaceAndreas Maier, Moritz Zaiss, Siming Bayer
Reproducing an empirical NLP study used to take weeks. Given the released data and a modern agentic-research harness, we redo every experiment of a recent ACL\,2026 study on personal-style post-editing of LLM drafts -- and add three new ones -- with the human investigator acting only as a reviewer-in-the-loop. We reproduce all seven preregistered hypotheses and recover the paper's headline correlation between perceived self-similarity and embedding-measured self-similarity to three decimal places ($r{=}{+}0.244$, $p{<}10^{-8}$, $n{=}648$). Under a leakage-free held-out protocol, GPT-5.5 and Claude\,Opus\,4.7 close $71$--$75\,\%$ of the style gap to the same-author ceiling on $324$ paired tasks, against $24\,\%$ for the human post-edit, and beat the human post-edit on $\sim$$80\,\%$ of tasks. We then frame the same data as an AI-text detection arms race. A leave-authors-out linear SVM on LUAR-MUD embeddings reaches AUC $0.93$--$1.00$ across approaches; six diagnostics show that GPT-5.5 detection is mostly a length confound while Opus detection is a genuine stylistic signature. Given $T{=}20$ feedback iterations against the frozen detector, an Opus agent flips two of five held-out test mimics to the human half-space and shrinks every margin by an order of magnitude. With moderate effort against a known detector, a frontier LLM can already efficiently lower its own AI-detection probability. All code, $648$ mimic drafts, trained detectors, diagnostics, and adversarial trajectories are released.
MED-PHNov 10, 2024
Multi-Parameter Molecular MRI Quantification using Physics-Informed Self-Supervised LearningAlex Finkelstein, Nikita Vladimirov, Moritz Zaiss et al.
Biophysical model fitting plays a key role in obtaining quantitative parameters from physiological signals and images. However, the model complexity for molecular magnetic resonance imaging (MRI) often translates into excessive computation time, which makes clinical use impractical. Here, we present a generic computational approach for solving the parameter extraction inverse problem posed by ordinary differential equation (ODE) modeling coupled with experimental measurement of the system dynamics. This is achieved by formulating a numerical ODE solver to function as a step-wise analytical one, thereby making it compatible with automatic differentiation-based optimization. This enables efficient gradient-based model fitting, and provides a new approach to parameter quantification based on self-supervised learning from a single data observation. The neural-network-based train-by-fit pipeline was used to quantify semisolid magnetization transfer (MT) and chemical exchange saturation transfer (CEST) amide proton exchange parameters in the human brain, in an in-vivo molecular MRI study (n = 4). The entire pipeline of the first whole brain quantification was completed in 18.3 $\pm$ 8.3 minutes. Reusing the single-subject-trained network for inference in new subjects took 1.0 $\pm$ 0.2 s, to provide results in agreement with literature values and scan-specific fit results.
CVFeb 27, 2024
Image space formalism of convolutional neural networks for k-space interpolationPeter Dawood, Felix Breuer, Istvan Homolya et al.
Purpose: Noise resilience in image reconstructions by scan-specific robust artificial neural networks for k-space interpolation (RAKI) is linked to nonlinear activations in k-space. To gain a deeper understanding of this relationship, an image space formalism of RAKI is introduced for analyzing noise propagation analytically, identifying and characterizing image reconstruction features and to describe the role of nonlinear activations in a human readable manner. Methods: The image space formalism for RAKI inference is employed by expressing nonlinear activations in k-space as element-wise multiplications with activation masks, which transform into convolutions in image space. Jacobians of the de-aliased, coil-combined image relative to the aliased coil images can be expressed algebraically, and thus, the noise amplification is quantified analytically (g-factor maps). We analyze the role of nonlinearity for noise resilience by controlling the degree of nonlinearity in the reconstruction model via the negative slope parameter in leaky ReLU. Results: The analytical g-factor maps correspond with those obtained from Monte Carlo simulations and from an auto differentiation approach for in vivo brain images. Apparent blurring and contrast loss artifacts are identified as implications of enhanced noise resilience. These residual artifacts can be traded against noise resilience by adjusting the degree of nonlinearity in the model (Tikhonov-like regularization) in case of limited training data. The inspection of image space activations reveals an autocorrelation pattern leading to a potential center artifact. Conclusion: The image space formalism of RAKI provides the means for analytical quantitative noisepropagation analysis and human-readable visualization of the effects of the nonlinear activation functions in k-space.
MED-PHMay 12, 2023
Joint MR sequence optimization beats pure neural network approaches for spin-echo MRI super-resolutionHoai Nam Dang, Vladimir Golkov, Thomas Wimmer et al.
Current MRI super-resolution (SR) methods only use existing contrasts acquired from typical clinical sequences as input for the neural network (NN). In turbo spin echo sequences (TSE) the sequence parameters can have a strong influence on the actual resolution of the acquired image and have consequently a considera-ble impact on the performance of the NN. We propose a known-operator learning approach to perform an end-to-end optimization of MR sequence and neural net-work parameters for SR-TSE. This MR-physics-informed training procedure jointly optimizes the radiofrequency pulse train of a proton density- (PD-) and T2-weighted TSE and a subsequently applied convolutional neural network to predict the corresponding PDw and T2w super-resolution TSE images. The found radiofrequency pulse train designs generate an optimal signal for the NN to perform the SR task. Our method generalizes from the simulation-based optimi-zation to in vivo measurements and the acquired physics-informed SR images show higher correlation with a time-consuming segmented high-resolution TSE sequence compared to a pure network training approach.