LG Mar 14, 2023
Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI
Prasun C. Tripathi, Mohammod N. I. Suvon, Lawrence Schobs et al.
Heart failure is a serious and life-threatening condition that can lead to elevated pressure in the left ventricle. Pulmonary Arterial Wedge Pressure (PAWP) is an important surrogate marker indicating high pressure in the left ventricle. PAWP is measured by Right Heart Catheterization (RHC), which is an invasive procedure. A non-invasive method is useful for quickly identifying high-risk patients in a large population. In this work, we develop a tensor learning-based pipeline for identifying PAWP from multimodal cardiac Magnetic Resonance Imaging (MRI). This pipeline extracts spatial and temporal features from high-dimensional scans. For quality control, we incorporate an epistemic uncertainty-based binning strategy to identify poor-quality training samples. To improve the performance, we learn complementary information by integrating features from multimodal data: cardiac MRI with short-axis and four-chamber views, and Electronic Health Records. The experimental analysis on a large cohort of $1346$ subjects who underwent the RHC procedure for PAWP estimation indicates that the proposed pipeline has diagnostic value and can produce promising performance with significant improvement over the baseline in clinical practice (i.e., $Δ$AUC $=0.10$, $Δ$Accuracy $=0.06$, and $Δ$MCC $=0.39$). The decision curve analysis further confirms the clinical utility of our method.
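The abstract above reports an improvement in MCC (Matthews correlation coefficient) without defining it. As a minimal sketch, the standard binary MCC can be computed from confusion-matrix counts as follows. The function name and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def matthews_corrcoef_from_counts(tp, fp, tn, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]; 0.0 is returned when the denominator is zero
    (a common convention, assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only
print(matthews_corrcoef_from_counts(tp=40, fp=10, tn=35, fn=15))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.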
IV Aug 8, 2024
Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network
Ling Lin, Yihang Zhou, Zhanqi Hu et al.
Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model that integrates conventional convolutional neural networks with quantum neural networks. The model incorporates a two-layer quantum layer (QL), comprising ZZFeatureMap and Ansatz layers, designed for processing classical data within a quantum framework. A comprehensive evaluation demonstrates the superior performance of QResNet in TSC MRI image classification compared to conventional 3D-ResNet models. These findings underscore the potential of quantum computing to revolutionize medical imaging and diagnostics. Remarkably, this method surpasses conventional CNNs in accuracy and Area Under the Curve (AUC) metrics with the current dataset. Future research may focus on exploring the scalability and practical implementation of quantum algorithms in real-world medical imaging scenarios.
LG Feb 17, 2025 Code
Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning
Jiayang Zhang, Xianyuan Liu, Wei Wu et al.
Virus-like particles (VLPs) are valuable for vaccine development due to their immune-triggering properties. Understanding their stoichiometry, the number of protein subunits to form a VLP, is critical for vaccine optimisation. However, current experimental methods to determine stoichiometry are time-consuming and require highly purified proteins. To efficiently classify stoichiometry classes in proteins, we curate a new dataset and propose an interpretable, data-driven pipeline leveraging linear machine learning models. We also explore the impact of feature encoding on model performance and interpretability, as well as methods to identify key protein sequence features influencing classification. The evaluation of our pipeline demonstrates that it can classify stoichiometry while revealing protein features that possibly influence VLP assembly. The data and code used in this work are publicly available at https://github.com/Shef-AIRE/StoicIML.
CV Apr 6, 2024 Code
Interpretable Multimodal Learning for Cardiovascular Hemodynamics Assessment
Prasun C Tripathi, Sina Tabakhi, Mohammod N I Suvon et al.
Pulmonary Arterial Wedge Pressure (PAWP) is an essential cardiovascular hemodynamics marker to detect heart failure. In clinical practice, Right Heart Catheterization is considered a gold standard for assessing cardiac hemodynamics, while non-invasive methods are often needed to screen high-risk patients from a large population. In this paper, we propose a multimodal learning pipeline to predict the PAWP marker. We utilize complementary information from Cardiac Magnetic Resonance Imaging (CMR) scans (short-axis and four-chamber) and Electronic Health Records (EHRs). We extract spatio-temporal features from CMR scans using tensor-based learning. We propose a graph attention network to select important EHR features for prediction, where we model subjects as graph nodes and feature relationships as graph edges using the attention mechanism. We design four feature fusion strategies: early, intermediate, late, and hybrid fusion. With a linear classifier and linear fusion strategies, our pipeline is interpretable. We validate our pipeline on a large dataset of $2,641$ subjects from our ASPIRE registry. The comparative study against state-of-the-art methods confirms the superiority of our pipeline. The decision curve analysis further validates that our pipeline can be applied to screen a large population. The code is available at https://github.com/prasunc/hemodynamics.
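Decision curve analysis, mentioned in the abstract above, compares models by their net benefit across decision thresholds. The sketch below uses the standard net-benefit formula (true-positive rate minus false-positive rate weighted by the threshold odds). The function name and toy data are illustrative assumptions and are not drawn from the paper or its repository.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every subject whose predicted risk >= threshold.

    y_true: iterable of 0/1 labels; y_prob: predicted probabilities;
    threshold: decision threshold strictly between 0 and 1.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalised by the odds implied by the threshold
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy data (hypothetical): trace the curve over a few thresholds
labels = [1, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.1]
for t in (0.2, 0.5, 0.75):
    print(t, net_benefit(labels, probs, t))
```

A decision curve plots this quantity against the threshold, usually alongside the "treat all" and "treat none" strategies for reference.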
CV Mar 15, 2024 Code
MeDSLIP: Medical Dual-Stream Language-Image Pre-training with Pathology-Anatomy Semantic Alignment
Wenrui Fan, Mohammod N. I. Suvon, Shuo Zhou et al.
Pathology and anatomy are two essential groups of semantics in medical data. Pathology describes what the diseases are, while anatomy explains where the diseases occur. They describe diseases from different perspectives, providing complementary insights into diseases. Thus, properly understanding these semantics and their relationships can enhance medical vision-language models (VLMs). However, pathology and anatomy semantics are usually entangled in medical data, hindering VLMs from explicitly modeling these semantics and their relationships. To address this challenge, we propose MeDSLIP, a novel Medical Dual-Stream Language-Image Pre-training pipeline, to disentangle pathology and anatomy semantics and model the relationships between them. We introduce a dual-stream mechanism in MeDSLIP to explicitly disentangle medical semantics into pathology-relevant and anatomy-relevant streams and align visual and textual information within each stream. Furthermore, we propose an interaction modeling module with prototypical contrastive learning loss and intra-image contrastive learning loss to regularize the relationships between pathology and anatomy semantics. We apply MeDSLIP to chest X-ray analysis and conduct comprehensive evaluations with four benchmark datasets: NIH CXR14, RSNA Pneumonia, SIIM-ACR Pneumothorax, and COVIDx CXR-4. The results demonstrate MeDSLIP's superior generalizability and transferability across different scenarios. The code is available at https://github.com/Shef-AIRE/MeDSLIP, and the pre-trained model is released at https://huggingface.co/pykale/MeDSLIP.
QUANT-PH Mar 30
Lindbladian Simulation with Commutator Bounds
Xinzhao Wang, Shuo Zhou, Xiaoyang Wang et al.
Trotter decomposition provides a simple approach to simulating open quantum systems by decomposing the Lindbladian into a sum of individual terms. While it is established that Trotter errors in Hamiltonian simulation depend on nested commutators of the summands, such a relationship remains poorly understood for Lindbladian dynamics. In this Letter, we derive commutator-based Trotter error bounds for Lindbladian simulation, yielding an $O(\sqrt{N})$ scaling in the number of Trotter steps for locally interacting systems on $N$ sites. When estimating observable averages, we apply Richardson extrapolation to achieve polylogarithmic precision while maintaining the commutator scaling. To bound the extrapolation remainder, we develop a general truncation bound for the Baker-Campbell-Hausdorff expansion that bypasses common convergence issues in physically relevant systems. For local Lindbladians, our results demonstrate that the Trotter-based methods outperform prior simulation techniques in system-size scaling while requiring only $O(1)$ ancillas. Numerical simulations further validate the predicted system-size and precision scaling.
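For context, the Hamiltonian-simulation result that the abstract above refers to can be illustrated with the well-known first-order commutator bound for a two-term Hamiltonian $H = A + B$ simulated for time $t$ with $r$ Trotter steps. This is the established closed-system bound, shown only as background. It is not the Lindbladian bound derived in the paper, which is not reproduced here.

```latex
% First-order product formula for H = A + B, with r steps of size t/r
\left\| e^{-i(A+B)t} - \left( e^{-iBt/r}\, e^{-iAt/r} \right)^{r} \right\|
  \le \frac{t^{2}}{2r}\, \bigl\| [A, B] \bigr\|
```

The error vanishes when $A$ and $B$ commute, which is why bounds of this type can scale better with system size than bounds based on operator norms alone.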
LG Oct 8, 2025 Code
Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation
Aryan Golbaghi, Shuo Zhou
We propose a workflow for speech emotion recognition (SER) that combines pre-trained representations with automated hyperparameter optimisation (HPO). Using the SpeechBrain wav2vec2-base model fine-tuned on IEMOCAP as the encoder, we compare two HPO strategies, Gaussian Process Bayesian Optimisation (GP-BO) and Tree-structured Parzen Estimators (TPE), under an identical four-dimensional search space and 15-trial budget, with balanced class accuracy (BCA) on the German EmoDB corpus as the objective. All experiments run on 8 CPU cores with 32 GB RAM. GP-BO achieves 0.96 BCA in 11 minutes, and TPE (Hyperopt implementation) attains 0.97 in 15 minutes. In contrast, grid search requires 143 trials and 1,680 minutes to exceed 0.9 BCA, and the best AutoSpeech 2020 baseline reports only 0.85 in 30 minutes on GPU. For cross-lingual generalisation, an EmoDB-trained HPO-tuned model improves zero-shot accuracy by 0.25 on CREMA-D and 0.26 on RAVDESS. Results show that efficient HPO with pre-trained encoders delivers competitive SER on commodity CPUs. Source code for this work is available at: https://github.com/youngaryan/speechbrain-emotion-hpo.
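The abstract above uses balanced class accuracy (BCA) as the optimisation objective but does not define it. A minimal sketch follows, assuming BCA means the unweighted mean of per-class recall (the usual definition of balanced accuracy). The function name and the toy labels are hypothetical and not taken from the linked repository.

```python
from collections import defaultdict


def balanced_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition of BCA)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Every class contributes equally, regardless of how many samples it has
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example with hypothetical emotion labels
truth = ["anger", "anger", "anger", "sadness"]
preds = ["anger", "anger", "anger", "anger"]
print(balanced_class_accuracy(truth, preds))
```

In this toy case plain accuracy would be 0.75, while the balanced score is lower because the minority class is never predicted correctly.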
SP Mar 3, 2025 Code
Multimodal Latent Fusion of ECG Leads for Early Assessment of Pulmonary Hypertension
Mohammod N. I. Suvon, Shuo Zhou, Prasun C. Tripathi et al.
Recent advancements in early assessment of pulmonary hypertension (PH) primarily focus on applying machine learning methods to centralized diagnostic modalities, such as 12-lead electrocardiogram (12L-ECG). Despite their potential, these approaches fall short in decentralized clinical settings, e.g., point-of-care and general practice, where handheld 6-lead ECG (6L-ECG) can offer an alternative but is limited by the scarcity of labeled data for developing reliable models. To address this, we propose a lead-specific electrocardiogram multimodal variational autoencoder (LS-EMVAE), which incorporates a hierarchical modality expert (HiME) fusion mechanism and a latent representation alignment loss. HiME combines mixture-of-experts and product-of-experts to enable flexible, adaptive latent fusion, while the alignment loss improves coherence among lead-specific and shared representations. To alleviate data scarcity and enhance representation learning, we adopt a transfer learning strategy: the model is first pre-trained on a large unlabeled 12L-ECG dataset and then fine-tuned on smaller task-specific labeled 6L-ECG datasets. We validate LS-EMVAE across two retrospective cohorts in a 6L-ECG setting: 892 subjects from the ASPIRE registry for (1) PH detection and (2) phenotyping pre-/post-capillary PH, and 16,416 subjects from UK Biobank for (3) predicting elevated pulmonary arterial wedge pressure, where it consistently outperforms unimodal and multimodal baseline methods and demonstrates strong generalizability and interpretability. The code is available at https://github.com/Shef-AIRE/LS-EMVAE.
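The product-of-experts and mixture-of-experts ideas named in the abstract above can be illustrated for one-dimensional Gaussian posteriors. This is only a generic sketch of the two textbook fusion rules. It is not the HiME mechanism itself, and every function and variable name here is a hypothetical choice for illustration.

```python
def product_of_experts(means, variances):
    """Fuse 1-D Gaussian experts by multiplying their densities.

    The product of Gaussians is Gaussian: precisions add, and the fused mean
    is the precision-weighted average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    fused_mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return fused_mean, 1.0 / total_precision


def mixture_of_experts_mean(means, weights):
    """Mean of a mixture of experts: a convex combination of expert means."""
    total = sum(weights)
    return sum(m * w for m, w in zip(means, weights)) / total


# Two hypothetical lead-specific posteriors over one latent dimension
print(product_of_experts([0.0, 2.0], [1.0, 1.0]))
print(mixture_of_experts_mean([0.0, 2.0], [0.25, 0.75]))
```

The product rule sharpens the fused estimate when experts agree, while the mixture rule stays usable when some experts are missing or unreliable. That contrast is one common motivation for combining the two.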
NC Jun 5, 2020 Code
Neuropsychiatric Disease Classification Using Functional Connectomics -- Results of the Connectomics in NeuroImaging Transfer Learning Challenge
Markus D. Schirmer, Archana Venkataraman, Islem Rekik et al.
Large, open-source consortium datasets have spurred the development of new and increasingly powerful machine learning approaches in brain connectomics. However, one key question remains: are we capturing biologically relevant and generalizable information about the brain, or are we simply overfitting to the data? To answer this, we organized a scientific challenge, the Connectomics in NeuroImaging Transfer Learning Challenge (CNI-TLC), held in conjunction with MICCAI 2019. CNI-TLC included two classification tasks: (1) diagnosis of Attention-Deficit/Hyperactivity Disorder (ADHD) within a pre-adolescent cohort; and (2) transference of the ADHD model to a related cohort of Autism Spectrum Disorder (ASD) patients with an ADHD comorbidity. In total, 240 resting-state fMRI time series averaged according to three standard parcellation atlases, along with clinical diagnosis, were released for training and validation (120 neurotypical controls and 120 ADHD). We also provided demographic information of age, sex, IQ, and handedness. A second set of 100 subjects (50 neurotypical controls, 25 ADHD, and 25 ASD with ADHD comorbidity) was used for testing. Models were submitted in a standardized format as Docker images through ChRIS, an open-source image analysis platform. Utilizing an inclusive approach, we ranked the methods based on 16 different metrics. The final rank was calculated using the rank product for each participant across all measures. Furthermore, we assessed the calibration curves of each method. Five participants submitted their model for evaluation, with one outperforming all other methods in both ADHD and ASD classification. However, further improvements are needed to reach the clinical translation of functional connectomics. We are keeping the CNI-TLC open as a publicly available resource for developing and validating new classification methodologies in the field of connectomics.
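The final ranking step described above aggregates per-metric ranks with a rank product. A minimal sketch follows, assuming the common definition as the geometric mean of a participant's ranks across metrics (using the plain product instead gives the same ordering). The team names and ranks are made up for illustration.

```python
import math


def rank_product(ranks_by_participant):
    """Geometric mean of each participant's ranks across metrics (1 = best)."""
    return {
        name: math.prod(ranks) ** (1.0 / len(ranks))
        for name, ranks in ranks_by_participant.items()
    }


# Hypothetical ranks of three teams on three metrics
ranks = {"team_a": [1, 1, 4], "team_b": [2, 2, 2], "team_c": [3, 3, 3]}
scores = rank_product(ranks)
final_order = sorted(scores, key=scores.get)  # lowest rank product first
print(scores)
print(final_order)
```

Aggregating ranks rather than raw metric values avoids having to put 16 differently scaled metrics on a common scale.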
LG Mar 20, 2024
Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection
Mohammod N. I. Suvon, Prasun C. Tripathi, Wenrui Fan et al.
Recent advancements in non-invasive detection of cardiac hemodynamic instability (CHDI) primarily focus on applying machine learning techniques to a single data modality, e.g. cardiac magnetic resonance imaging (MRI). Despite their potential, these approaches often fall short especially when the size of labeled patient data is limited, a common challenge in the medical domain. Furthermore, only a few studies have explored multimodal methods to study CHDI, which mostly rely on costly modalities such as cardiac MRI and echocardiogram. In response to these limitations, we propose a novel multimodal variational autoencoder ($\text{CardioVAE}_\text{X,G}$) to integrate low-cost chest X-ray (CXR) and electrocardiogram (ECG) modalities with pre-training on a large unlabeled dataset. Specifically, $\text{CardioVAE}_\text{X,G}$ introduces a novel tri-stream pre-training strategy to learn both shared and modality-specific features, thus enabling fine-tuning with both unimodal and multimodal datasets. We pre-train $\text{CardioVAE}_\text{X,G}$ on a large, unlabeled dataset of $50,982$ subjects from a subset of MIMIC database and then fine-tune the pre-trained model on a labeled dataset of $795$ subjects from the ASPIRE registry. Comprehensive evaluations against existing methods show that $\text{CardioVAE}_\text{X,G}$ offers promising performance (AUROC $=0.79$ and Accuracy $=0.77$), representing a significant step forward in non-invasive prediction of CHDI. Our model also excels in producing fine interpretations of predictions directly associated with clinical features, thereby supporting clinical decision-making.
AI Apr 4, 2025
Towards deployment-centric multimodal AI beyond vision and language
Xianyuan Liu, Jiayang Zhang, Shuo Zhou et al.
Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasise deeper integration across multiple levels of multimodality and multidisciplinary collaboration to significantly broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design, and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability, and finance. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
QM Apr 4, 2025
Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives
Xiaokun Liu, Sayedmohammadreza Rastegari, Yijun Huang et al.
In cancer therapeutics, protein-metal binding mechanisms critically govern the pharmacokinetics and targeting efficacy of drugs, thereby fundamentally shaping the rational design of anticancer metallodrugs. While conventional laboratory methods used to study such mechanisms are often costly, low throughput, and limited in capturing dynamic biological processes, machine learning (ML) has emerged as a promising alternative. Despite increasing efforts to develop protein-metal binding datasets and ML algorithms, the application of ML in tumor protein-metal binding remains limited. Key challenges include a shortage of high-quality, tumor-specific datasets, insufficient consideration of multiple data modalities, and the complexity of interpreting results due to the "black box" nature of complex ML models. This paper summarizes recent progress and ongoing challenges in using ML to predict tumor protein-metal binding, focusing on data, modeling, and interpretability. We present multimodal protein-metal binding datasets and outline strategies for acquiring, curating, and preprocessing them for training ML models. Moreover, we explore the complementary value provided by different data modalities and examine methods for their integration. We also review approaches for improving model interpretability to support more trustworthy decisions in cancer research. Finally, we offer our perspective on research opportunities and propose strategies to address the scarcity of tumor protein data and the limited number of predictive models for tumor protein-metal binding. We also highlight two promising directions for effective metal-based drug design: integrating protein-protein interaction data to provide structural insights into metal-binding events and predicting structural changes in tumor proteins after metal binding.
LG Feb 28, 2025
Foundation-Model-Boosted Multimodal Learning for fMRI-based Neuropathic Pain Drug Response Prediction
Wenrui Fan, L. M. Riza Rizky, Jiayang Zhang et al.
Neuropathic pain, affecting up to 10% of adults, remains difficult to treat due to limited therapeutic efficacy and tolerability. Although resting-state functional MRI (rs-fMRI) is a promising non-invasive measurement of brain biomarkers to predict drug response in therapeutic development, the complexity of fMRI demands machine learning models with substantial capacity. However, extreme data scarcity in neuropathic pain research limits the application of high-capacity models. To address the challenge of data scarcity, we propose FMM$_{TC}$, a Foundation-Model-boosted Multimodal learning framework for fMRI-based neuropathic pain drug response prediction, which leverages both internal multimodal information in pain-specific data and external knowledge from large pain-agnostic data. Specifically, to maximize the value of limited pain-specific data, FMM$_{TC}$ integrates complementary information from two rs-fMRI modalities: Time series and functional Connectivity. FMM$_{TC}$ is further boosted by an fMRI foundation model with its external knowledge from extensive pain-agnostic fMRI datasets enriching limited pain-specific information. Evaluations with an in-house dataset and a public dataset from OpenNeuro demonstrate FMM$_{TC}$'s superior representation ability, generalizability, and cross-dataset adaptability over existing unimodal fMRI models that only consider one of the rs-fMRI modalities. The ablation study validates the effectiveness of multimodal learning and foundation-model-powered external knowledge transfer in FMM$_{TC}$. An integrated gradient-based interpretation study explains how FMM$_{TC}$'s cross-dataset dynamic behaviors enhance its adaptability. In conclusion, FMM$_{TC}$ boosts clinical trials in neuropathic pain therapeutic development by accurately predicting drug responses to improve the participant stratification efficiency.
NC Apr 8, 2024
Group-specific discriminant analysis reveals statistically validated sex differences in lateralization of brain functional network
Shuo Zhou, Junhao Luo, Yaya Jiang et al.
Lateralization is a fundamental feature of the human brain, where sex differences have been observed. Conventional studies in neuroscience on sex-specific lateralization are typically conducted on univariate statistical comparisons between male and female groups. However, these analyses often lack effective validation of group specificity. Here, we formulate modeling sex differences in lateralization of functional networks as a dual-classification problem, consisting of first-order classification for left vs. right functional networks and second-order classification for male vs. female models. To capture sex-specific patterns, we develop the Group-Specific Discriminant Analysis (GSDA) for first-order classification. The evaluation on two public neuroimaging datasets demonstrates the efficacy of GSDA in learning sex-specific models from functional networks, achieving a significant improvement in group specificity over baseline methods. The major sex differences are in the strength of lateralization and the interactions within and between lobes. The GSDA-based method is generic in nature and can be adapted to other group-specific analyses such as handedness-specific or disease-specific analyses.
LG Feb 28, 2022
A Machine Learning Method for Material Property Prediction: Example Polymer Compatibility
Zhilong Liang, Zhiwei Li, Shuo Zhou et al.
Prediction of material properties is a key problem because of its significance to material design and screening. We present a new and general machine learning method for material property prediction. As a representative example, polymer compatibility is chosen to demonstrate the effectiveness of our method. Specifically, we mine data from related literature to build a specific database and give a prediction based on the basic molecular structures of blending polymers and, as auxiliary input, the blending composition. Our model obtains at least 75% accuracy on the dataset consisting of thousands of entries. We demonstrate that the relationship between structure and properties can be learned and simulated by a machine learning method.
CV Aug 17, 2021
Channel-Temporal Attention for First-Person Video Domain Adaptation
Xianyuan Liu, Shuo Zhou, Tao Lei et al.
Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person action recognition is an under-explored problem, with a lack of datasets and limited consideration of first-person video characteristics. This paper focuses on addressing this problem. Firstly, we propose two small-scale first-person video domain adaptation datasets: ADL$_{small}$ and GTEA-KITCHEN. Secondly, we introduce channel-temporal attention blocks to capture the channel-wise and temporal-wise relationships and model their inter-dependencies important to first-person vision. Finally, we propose a Channel-Temporal Attention Network (CTAN) to integrate these blocks into existing architectures. CTAN outperforms baselines on the two proposed datasets and one existing dataset, EPIC$_{cvpr20}$.
CV Jun 27, 2021
A Behavior-aware Graph Convolution Network Model for Video Recommendation
Wei Zhuo, Kunchi Liu, Taofeng Xue et al.
Interactions between users and videos are the major data source for video recommendation. Despite the many existing recommendation methods, user behaviors on videos, which imply the complex relations between users and videos, are still far from being fully explored. In this paper, we present a model named Sagittarius. Sagittarius adopts a graph convolutional neural network to capture the influence between users and videos. In particular, Sagittarius differentiates between user behaviors by weighting them and fuses the semantics of user behaviors into the embeddings of users and videos. Moreover, Sagittarius combines multiple optimization objectives to learn user and video embeddings and then performs video recommendation with the learned embeddings. The experimental results on multiple datasets show that Sagittarius outperforms several state-of-the-art models in terms of recall, unique recall and NDCG.
CV Jun 22, 2021
Team PyKale (xy9) Submission to the EPIC-Kitchens 2021 Unsupervised Domain Adaptation Challenge for Action Recognition
Xianyuan Liu, Raivo Koot, Shuo Zhou et al.
This report describes the technical details of our submission to the EPIC-Kitchens 2021 Unsupervised Domain Adaptation Challenge for Action Recognition. The EPIC-Kitchens dataset is more difficult than other video domain adaptation datasets because it involves multiple tasks and more modalities. Firstly, to participate in the challenge, we employ a transformer to capture the spatial information from each modality. Secondly, we employ a temporal attention module to model temporal-wise inter-dependency. Thirdly, we employ the adversarial domain adaptation network to learn the general features between the labeled source and unlabeled target domains. Finally, we incorporate multiple modalities to improve the performance by a three-stream network with late fusion. Our network achieves performance comparable to the state-of-the-art baseline TA$^3$N and outperforms the baseline on top-1 accuracy for the verb class and top-5 accuracies for all three tasks: verb, noun, and action. Under the team name xy9, our submission achieved 5th place in terms of top-1 accuracy for the verb class and all top-5 accuracies.
LG Jun 17, 2021
PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python
Haiping Lu, Xianyuan Liu, Robert Turner et al.
Machine learning is a general-purpose technology holding promise for many interdisciplinary research problems. However, significant barriers exist in crossing disciplinary boundaries when most machine learning tools are developed in different areas separately. We present PyKale, a Python library for knowledge-aware machine learning on graphs, images, texts, and videos to enable and accelerate interdisciplinary research. We formulate new green machine learning guidelines based on standard software engineering practices and propose a novel pipeline-based application programming interface (API). PyKale focuses on leveraging knowledge from multiple sources for accurate and interpretable prediction, thus supporting multimodal learning and transfer learning (particularly domain adaptation) with the latest deep learning and dimensionality reduction models. We build PyKale on PyTorch and leverage the rich PyTorch ecosystem. Our pipeline-based API design enforces standardization and minimalism, embracing green machine learning concepts via reducing repetitions and redundancy, reusing existing resources, and recycling learning models across areas. We demonstrate its interdisciplinary nature via examples in bioinformatics, knowledge graph, image/video recognition, and medical imaging.
LG Mar 26, 2019
Domain Independent SVM for Transfer Learning in Brain Decoding
Shuo Zhou, Wenwen Li, Christopher R. Cox et al.
Brain imaging data are important in brain sciences yet expensive to obtain, with big volume (i.e., large p) but small sample size (i.e., small n). To tackle this problem, transfer learning is a promising direction that leverages source data to improve performance on related, target data. Most transfer learning methods focus on minimizing data distribution mismatch. However, a big challenge in brain imaging is the large domain discrepancies in cognitive experiment designs and subject-specific structures and functions. A recent transfer learning approach minimizes domain dependence to learn common features across domains, via the Hilbert-Schmidt Independence Criterion (HSIC). Inspired by this method, we propose a new Domain Independent Support Vector Machine (DI-SVM) for transfer learning in brain condition decoding. Specifically, DI-SVM simultaneously minimizes the SVM empirical risk and the dependence on domain information via a simplified HSIC. We use public data to construct 13 transfer learning tasks in brain decoding, including three interesting multi-source transfer tasks. Experiments show DI-SVM's superior performance over eight competing methods on these tasks, particularly an improvement of more than 24% on multi-source transfer tasks.
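To make the dependence measure named above concrete, the following sketch computes the standard biased empirical HSIC estimate, $\mathrm{tr}(KHLH)/(n-1)^2$, between features and one-hot domain labels using linear kernels. It illustrates the general criterion only. The simplified HSIC used inside DI-SVM is not reproduced here, and all names and toy data are assumptions.

```python
def linear_kernel(rows):
    """Gram matrix of inner products between the rows of a list-of-lists."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]


def double_center(K):
    """Return HKH, where H = I - (1/n) * ones is the centering matrix."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 for symmetric K and L."""
    n = len(K)
    Kc = double_center(K)
    # For symmetric matrices, tr(HKH L) equals the elementwise sum of Kc * L
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


# Toy data: one-hot domain labels for two domains of two samples each
domains = [[1, 0], [1, 0], [0, 1], [0, 1]]
shifted = [[1.0], [2.0], [3.0], [4.0]]   # feature value depends on the domain
balanced = [[1.0], [2.0], [1.0], [2.0]]  # same feature values in both domains
L = linear_kernel(domains)
print(hsic(linear_kernel(shifted), L))
print(hsic(linear_kernel(balanced), L))
```

A larger value indicates stronger dependence between features and domain membership, so a learner that penalises this quantity is pushed toward features shared across domains.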
CV Dec 4, 2018
Sturm: Sparse Tubal-Regularized Multilinear Regression for fMRI
Wenwen Li, Jian Lou, Shuo Zhou et al.
While functional magnetic resonance imaging (fMRI) is important for healthcare/neuroscience applications, it is challenging to classify or interpret due to its multi-dimensional structure, high dimensionality, and small number of samples available. Recent sparse multilinear regression methods based on tensors are emerging as promising solutions for fMRI, yet existing works rely on unfolding/folding operations and a tensor rank relaxation with limited tightness. The newly proposed tensor singular value decomposition (t-SVD) sheds light on new directions. In this work, we study t-SVD for sparse multilinear regression and propose a Sparse tubal-regularized multilinear regression (Sturm) method for fMRI. Specifically, the Sturm model performs multilinear regression with two regularization terms: a tubal tensor nuclear norm based on t-SVD and a standard L1 norm. We further derive the algorithm under the alternating direction method of multipliers framework. We perform experiments on four classification problems, including both resting-state fMRI for disease diagnosis and task-based fMRI for neural decoding. The results show the superior performance of Sturm in classifying fMRI using just a small number of voxels.
CV Jun 7, 2018
Dimensionality-Driven Learning with Noisy Labels
Xingjun Ma, Yisen Wang, Michael E. Houle et al.
Datasets with significant proportions of noisy (incorrect) class labels present challenges for training accurate Deep Neural Networks (DNNs). We propose a new perspective for understanding DNN generalization for such datasets, by investigating the dimensionality of the deep representation subspace of training samples. We show that from a dimensionality perspective, DNNs exhibit quite distinctive learning styles when trained with clean labels versus when trained with a proportion of noisy labels. Based on this finding, we develop a new dimensionality-driven learning strategy, which monitors the dimensionality of subspaces during training and adapts the loss function accordingly. We empirically demonstrate that our approach is highly tolerant to significant proportions of noisy labels, and can effectively learn low-dimensional local subspaces that capture the data distribution.
AI Jun 30, 2017
Providing Effective Real-time Feedback in Simulation-based Surgical Training
Xingjun Ma, Sudanthi Wijewickrema, Yun Zhou et al.
Virtual reality simulation is becoming popular as a training platform in surgical education. However, one important aspect of simulation-based surgical training that has not received much attention is the provision of automated real-time performance feedback to support the learning process. Performance feedback is actionable advice that improves novice behaviour. In simulation, automated feedback is typically extracted from prediction models trained using data mining techniques. Existing techniques suffer from either low effectiveness or low efficiency resulting in their inability to be used in real-time. In this paper, we propose a random forest based method that finds a balance between effectiveness and efficiency. Experimental results in a temporal bone surgery simulation show that the proposed method is able to extract highly effective feedback at a high level of efficiency.
LG Mar 4, 2017
Adversarial Generation of Real-time Feedback with Neural Networks for Simulation-based Training
Xingjun Ma, Sudanthi Wijewickrema, Shuo Zhou et al.
Simulation-based training (SBT) is gaining popularity as a low-cost and convenient training technique in a vast range of applications. However, for an SBT platform to be fully utilized as an effective training tool, it is essential that feedback on performance is provided automatically in real-time during training. The aim of this paper is to develop an efficient and effective feedback generation method for the provision of real-time feedback in SBT. Existing methods either have low effectiveness in improving novice skills or suffer from low efficiency, resulting in their inability to be used in real-time. In this paper, we propose a neural network based method to generate feedback using the adversarial technique. The proposed method utilizes a bounded adversarial update to minimize an L1 regularized loss via back-propagation. We empirically show that the proposed method can be used to generate simple, yet effective feedback. Also, it was observed to have high effectiveness and efficiency when compared to existing methods, thus making it a promising option for real-time feedback generation in SBT.