CVAug 29, 2023
Multimodal Contrastive Learning and Tabular Attention for Automated Alzheimer's Disease PredictionWeichen Huang
Alongside neuroimaging such as MRI scans and PET, Alzheimer's disease (AD) datasets contain valuable tabular data including AD biomarkers and clinical assessments. Existing computer vision approaches struggle to utilize this additional information. To address these needs, we propose a generalizable framework for multimodal contrastive learning of image data and tabular data, a novel tabular attention module for amplifying and ranking salient features in tables, and the application of these techniques onto Alzheimer's disease prediction. Experimental evaulations demonstrate the strength of our framework by detecting Alzheimer's disease (AD) from over 882 MR image slices from the ADNI database. We take advantage of the high interpretability of tabular data and our novel tabular attention approach and through attribution of the attention scores for each row of the table, we note and rank the most predominant features. Results show that the model is capable of an accuracy of over 83.8%, almost a 10% increase from previous state of the art.
NCDec 12, 2024
Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representationsYudi Xie, Weichen Huang, Esther Alter et al.
Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring -- despite much prior evidence -- its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also derived under such an objective. Here, we explore an alternative hypothesis: Might the ventral stream be optimized for estimating spatial latents? And a closely related question: How different -- if at all -- are representations learned from spatial latent estimation compared to categorization? To ask these questions, we leveraged synthetic image datasets generated by a 3D graphic engine and trained convolutional neural networks (CNNs) to estimate different combinations of spatial and category latents. We found that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those trained on hundreds of categories, and the spatial latent performance of models strongly correlates with their neural alignment. Spatial latent and category-trained models have very similar -- but not identical -- internal representations, especially in their early and middle layers. We provide evidence that this convergence is partly driven by non-target latent variability in the training data, which facilitates the implicit learning of representations of those non-target latents. Taken together, these results suggest that many training objectives, such as spatial latents, can lead to similar models aligned neurally with the ventral stream. Thus, one should not assume that the ventral stream is optimized for object categorization only. As a field, we need to continue to sharpen our measures of comparing models to brains to better understand the functional roles of the ventral stream.
LGJan 31, 2022
Extreme precipitation forecasting using attention augmented convolutionsWeichen Huang
Extreme precipitation wreaks havoc throughout the world, causing billions of dollars in damage and uprooting communities, ecosystems, and economies. Accurate extreme precipitation prediction allows more time for preparation and disaster risk management for such extreme events. In this paper, we focus on short-term extreme precipitation forecasting (up to a 12-hour ahead-of-time prediction) from a sequence of sea level pressure and zonal wind anomalies. Although existing machine learning approaches have shown promising results, the associated model and climate uncertainties may reduce their reliability. To address this issue, we propose a self-attention augmented convolution mechanism for extreme precipitation forecasting, systematically combining attention scores with traditional convolutions to enrich feature data and reduce the expected errors of the results. The proposed network architecture is further fused with a highway neural network layer to gain the benefits of unimpeded information flow across several layers. Our experimental results show that the framework outperforms classical convolutional models by 12%. The proposed method increases machine learning as a tool for gaining insights into the physical causes of changing extremes, lowering uncertainty in future forecasts.