CVJul 12, 2023
SwiFT: Swin 4D fMRI TransformerPeter Yongho Kim, Junbeom Kwon, Sunghwan Joo et al.
Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches for fMRI analysis utilize hand-crafted features, but the process of feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Transformer architecture that can learn brain dynamics directly from fMRI volumes in a memory and computation-efficient manner. SwiFT achieves this by implementing a 4D window multi-head self-attention mechanism and absolute positional embeddings. We evaluate SwiFT using multiple large-scale resting-state fMRI datasets, including the Human Connectome Project (HCP), Adolescent Brain Cognitive Development (ABCD), and UK Biobank (UKB) datasets, to predict sex, age, and cognitive intelligence. Our experimental outcomes reveal that SwiFT consistently outperforms recent state-of-the-art models. Furthermore, by leveraging its end-to-end learning capability, we show that contrastive loss-based self-supervised pre-training of SwiFT can enhance performance on downstream tasks. Additionally, we employ an explainable AI method to identify the brain regions associated with sex classification. To our knowledge, SwiFT is the first Swin Transformer architecture to process dimensional spatiotemporal brain functional data in an end-to-end fashion. Our work holds substantial potential in facilitating scalable learning of functional brain imaging in neuroscience research by reducing the hurdles associated with applying Transformer models to high-dimensional fMRI.
NCJan 30
Recovering Whole-Brain Causal Connectivity under Indirect Observation with Applications to Human EEG and fMRISangyoon Bae, Miruna Oprescu, David Keetae Park et al.
Inferring directed connectivity from neuroimaging is an ill-posed inverse problem: recorded signals are distorted by hemodynamic filtering and volume conduction, which can mask true neural interactions. Many existing methods conflate these observation artifacts with genuine neural influence, risking spurious causal graphs driven by the measurement process. We introduce INCAMA (INdirect CAusal MAmba), a latent-space causal discovery framework that explicitly accounts for measurement physics to separate neural dynamics from indirect observations. INCAMA integrates a physics-aware inversion module with a nonstationarity-driven, delay-sensitive causal discovery model based on selective state-space sequences. Leveraging nonstationary mechanism shifts as soft interventions, we establish identifiability of delayed causal structure from indirect measurements and a stability bound that quantifies how inversion error affects graph recovery. We validate INCAMA on large-scale biophysical simulations across EEG and fMRI, where it significantly outperforms standard pipelines. We further demonstrate zero-shot generalization to real-world fMRI from the Human Connectome Project: without domain-specific fine-tuning, INCAMA recovers canonical visuo-motor pathways (e.g., $V1 \to V2$ and $M1 \leftrightarrow S1$) consistent with established neuroanatomy, supporting its use for whole-brain causal inference.
LGMay 8
PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution ShiftSangyoon Bae, Shinjae Yoo, Jiook Cha
Scientific foundation models are expected to reuse representations under changes in dataset, acquisition protocol, and deployment domain, yet many sequence backbones treat scientific temporal structure as an unconstrained pattern to be fitted. We argue that this misses a central property of natural dynamical systems: neural and atmospheric time series are organized by interacting processes across multiple physical timescales, and failure to preserve this multiscale structure contributes to brittleness under distribution shift. We formalize this failure mode as temporal kernel mismatch, where a model fits in-distribution dynamics with an effective memory policy that is not anchored to the signal's physical timescales, leading to representation drift and degraded transfer. We propose Physics-Informed Multi-Scale Mamba (PIMSM), a state-space architecture that maps spectrum-estimated transition points between frequency regimes (knee frequencies) to scale-specific discretization parameters and anchors them to acquisition time units. On Human Connectome Project fMRI, PIMSM improves robustness and representation stability under severe temporal-context truncation, extreme low-resource transfer, and resting-state-to-task-state generalization. Without modality-specific adaptation, the same architecture also attains the lowest variable-wise MAE across all reported horizons and variables on Weather-5K held-out-station spatial out-of-distribution forecasting. These results support temporal-scale alignment as a practical inductive bias for scientific foundation models that must preserve structure, not only fit correlations, under deployment shift.
CVJan 20
Attention-space Contrastive Guidance for Efficient Hallucination Mitigation in LVLMsYujin Jo, Sangyoon Bae, Taesup Kim
Hallucinations in large vision-language models (LVLMs) often arise when language priors dominate over visual evidence, causing object misidentification and visually inconsistent descriptions. We address this issue by framing hallucination mitigation as contrastive guidance, steering generation toward visually grounded and semantically faithful text. This approach regulates the model's internal behavior by reducing over-dependence on language priors and contrasting visually grounded with language-only representations. We propose Attention-space Contrastive Guidance (ACG), a single-pass mechanism that operates within self-attention layers to construct both vision-language and language-only attention paths in a single forward computation. This integration enables computationally efficient guidance directly embedded in the model's representation contextualization. To correct approximation bias introduced by the single-pass formulation, we further apply an orthogonalized correction that removes components aligned with the language-only path, selectively amplifying visual contributions. Experiments on the CHAIR and POPE benchmarks show that ACG achieves state-of-the-art faithfulness and caption quality while significantly reducing computational cost. Our method establishes a principled and efficient alternative, reducing latency by up to 2x compared to prior contrastive decoding methods that require multiple forward passes.
QUANT-PHJan 15, 2024
Quantum Privacy Aggregation of Teacher Ensembles (QPATE) for Privacy-preserving Quantum Machine LearningWilliam Watkins, Heehwan Wang, Sangyoon Bae et al.
The utility of machine learning has rapidly expanded in the last two decades and presents an ethical challenge. Papernot et. al. developed a technique, known as Private Aggregation of Teacher Ensembles (PATE) to enable federated learning in which multiple teacher models are trained on disjoint datasets. This study is the first to apply PATE to an ensemble of quantum neural networks (QNN) to pave a new way of ensuring privacy in quantum machine learning (QML) models.
LGNov 12, 2025
Transformer-Based Sleep Stage Classification Enhanced by Clinical InformationWoosuk Chung, Seokwoo Hong, Wonhyeok Lee et al.
Manual sleep staging from polysomnography (PSG) is labor-intensive and prone to inter-scorer variability. While recent deep learning models have advanced automated staging, most rely solely on raw PSG signals and neglect contextual cues used by human experts. We propose a two-stage architecture that combines a Transformer-based per-epoch encoder with a 1D CNN aggregator, and systematically investigates the effect of incorporating explicit context: subject-level clinical metadata (age, sex, BMI) and per-epoch expert event annotations (apneas, desaturations, arousals, periodic breathing). Using the Sleep Heart Health Study (SHHS) cohort (n=8,357), we demonstrate that contextual fusion substantially improves staging accuracy. Compared to a PSG-only baseline (macro-F1 0.7745, micro-F1 0.8774), our final model achieves macro-F1 0.8031 and micro-F1 0.9051, with event annotations contributing the largest gains. Notably, feature fusion outperforms multi-task alternatives that predict the same auxiliary labels. These results highlight that augmenting learned representations with clinically meaningful features enhances both performance and interpretability, without modifying the PSG montage or requiring additional sensors. Our findings support a practical and scalable path toward context-aware, expert-aligned sleep staging systems.
NCOct 21, 2025
Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware SSLSangyoon Bae, Mehdi Azabou, Jiook Cha et al. · gatech
Self-supervised learning (SSL) holds a great deal of promise for applications in neuroscience, due to the lack of large-scale, consistently labeled neural datasets. However, most neural datasets contain heterogeneous populations that mix stable, predictable cells with highly stochastic, stimulus-contingent ones, which has made it hard to identify consistent activity patterns during SSL. As a result, self-supervised pretraining has yet to show clear signs of benefits from scale on neural data. Here, we present a novel approach to self-supervised pretraining, POYO-SSL that exploits the heterogeneity of neural data to improve pre-training and achieve benefits of scale. Specifically, in POYO-SSL we pretrain only on predictable (statistically regular) neurons-identified on the pretraining split via simple higher-order statistics (skewness and kurtosis)-then we fine-tune on the unpredictable population for downstream tasks. On the Allen Brain Observatory dataset, this strategy yields approximately 12-13% relative gains over from-scratch training and exhibits smooth, monotonic scaling with model size. In contrast, existing state-of-the-art baselines plateau or destabilize as model size increases. By making predictability an explicit metric for crafting the data diet, POYO-SSL turns heterogeneity from a liability into an asset, providing a robust, biologically grounded recipe for scalable neural decoding and a path toward foundation models of neural dynamics.
CVOct 20, 2025
CausalMamba: Scalable Conditional State Space Models for Neural Causal InferenceSangyoon Bae, Jiook Cha
We introduce CausalMamba, a scalable framework that addresses fundamental limitations in fMRI-based causal inference: the ill-posed nature of inferring neural causality from hemodynamically distorted BOLD signals and the computational intractability of existing methods like Dynamic Causal Modeling (DCM). Our approach decomposes this complex inverse problem into two tractable stages: BOLD deconvolution to recover latent neural activity, followed by causal graph inference using a novel Conditional Mamba architecture. On simulated data, CausalMamba achieves 37% higher accuracy than DCM. Critically, when applied to real task fMRI data, our method recovers well-established neural pathways with 88% fidelity, whereas conventional approaches fail to identify these canonical circuits in over 99% of subjects. Furthermore, our network analysis of working memory data reveals that the brain strategically shifts its primary causal hub-recruiting executive or salience networks depending on the stimulus-a sophisticated reconfiguration that remains undetected by traditional methods. This work provides neuroscientists with a practical tool for large-scale causal inference that captures both fundamental circuit motifs and flexible network dynamics underlying cognitive function.
IVAug 31, 2025
Resting-state fMRI Analysis using Quantum Time-series TransformerJunghoon Justin Park, Jungwoo Seo, Sangyoon Bae et al.
Resting-state functional magnetic resonance imaging (fMRI) has emerged as a pivotal tool for revealing intrinsic brain network connectivity and identifying neural biomarkers of neuropsychiatric conditions. However, classical self-attention transformer models--despite their formidable representational power--struggle with quadratic complexity, large parameter counts, and substantial data requirements. To address these barriers, we introduce a Quantum Time-series Transformer, a novel quantum-enhanced transformer architecture leveraging Linear Combination of Unitaries and Quantum Singular Value Transformation. Unlike classical transformers, Quantum Time-series Transformer operates with polylogarithmic computational complexity, markedly reducing training overhead and enabling robust performance even with fewer parameters and limited sample sizes. Empirical evaluation on the largest-scale fMRI datasets from the Adolescent Brain Cognitive Development Study and the UK Biobank demonstrates that Quantum Time-series Transformer achieves comparable or superior predictive performance compared to state-of-the-art classical transformer models, with especially pronounced gains in small-sample scenarios. Interpretability analyses using SHapley Additive exPlanations further reveal that Quantum Time-series Transformer reliably identifies clinically meaningful neural biomarkers of attention-deficit/hyperactivity disorder (ADHD). These findings underscore the promise of quantum-enhanced transformers in advancing computational neuroscience by more efficiently modeling complex spatio-temporal dynamics and improving clinical interpretability.
NCMar 30, 2025
Spatiotemporal Learning of Brain Dynamics from fMRI Using Frequency-Specific Multi-Band Attention for Cognitive and Psychiatric ApplicationsSangyoon Bae, Junbeom Kwon, Shinjae Yoo et al.
Understanding how the brain's complex nonlinear dynamics give rise to cognitive function remains a central challenge in neuroscience. While brain functional dynamics exhibits scale-free and multifractal properties across temporal scales, conventional neuroimaging analytics assume linearity and stationarity, failing to capture frequency-specific neural computations. Here, we introduce Multi-Band Brain Net (MBBN), the first transformer-based framework to explicitly model frequency-specific spatiotemporal brain dynamics from fMRI. MBBN integrates biologically-grounded frequency decomposition with multi-band self-attention mechanisms, enabling discovery of previously undetectable frequency-dependent network interactions. Trained on 49,673 individuals across three large-scale cohorts (UK Biobank, ABCD, ABIDE), MBBN sets a new state-of-the-art in predicting psychiatric and cognitive outcomes (depression, ADHD, ASD), showing particular strength in classification tasks with up to 52.5\% higher AUROC and provides a novel framework for predicting cognitive intelligence scores. Frequency-resolved analyses uncover disorder-specific signatures: in ADHD, high-frequency fronto-sensorimotor connectivity is attenuated and opercular somatosensory nodes emerge as dynamic hubs; in ASD, orbitofrontal-somatosensory circuits show focal high-frequency disruption together with enhanced ultra-low-frequency coupling between the temporo-parietal junction and prefrontal cortex. By integrating scale-aware neural dynamics with deep learning, MBBN delivers more accurate and interpretable biomarkers, opening avenues for precision psychiatry and developmental neuroscience.
IVDec 15, 2024
Macro2Micro: A Rapid and Precise Cross-modal Magnetic Resonance Imaging Synthesis using Multi-scale Structural Brain SimilaritySooyoung Kim, Joonwoo Kwon, Junbeom Kwon et al.
The human brain is a complex system requiring both macroscopic and microscopic components for comprehensive understanding. However, mapping nonlinear relationships between these scales remains challenging due to technical limitations and the high cost of multimodal Magnetic Resonance Imaging (MRI) acquisition. To address this, we introduce Macro2Micro, a deep learning framework that predicts brain microstructure from macrostructure using a Generative Adversarial Network (GAN). Based on the hypothesis that microscale structural information can be inferred from macroscale structures, Macro2Micro explicitly encodes multiscale brain information into distinct processing branches. To enhance artifact elimination and output quality, we propose a simple yet effective auxiliary discriminator and learning objective. Extensive experiments demonstrated that Macro2Micro faithfully translates T1-weighted MRIs into corresponding Fractional Anisotropy (FA) images, achieving a 6.8\% improvement in the Structural Similarity Index Measure (SSIM) compared to previous methods, while retaining the individual biological characteristics of the brain. With an inference time of less than 0.01 seconds per MR modality translation, Macro2Micro introduces the potential for real-time multimodal rendering in medical and research applications. The code will be made available upon acceptance.