QM · Mar 29, 2023
Machine Learning for Uncovering Biological Insights in Spatial Transcriptomics Data
Alex J. Lee, Robert Cahill, Reza Abbasi-Asl
Development and homeostasis in multicellular systems both require exquisite control over spatial molecular pattern formation and maintenance. Advances in spatially-resolved and high-throughput molecular imaging methods such as multiplexed immunofluorescence and spatial transcriptomics (ST) provide exciting new opportunities to augment our fundamental understanding of these processes in health and disease. The large and complex datasets resulting from these techniques, particularly ST, have led to rapid development of innovative machine learning (ML) tools primarily based on deep learning techniques. These ML tools are now increasingly featured in integrated experimental and computational workflows to disentangle signals from noise in complex biological systems. However, it can be difficult to understand and balance the different implicit assumptions and methodologies of a rapidly expanding toolbox of analytical tools in ST. To address this, we summarize major ST analysis goals that ML can help address and current analysis trends. We also describe four major data science concepts and related heuristics that can help guide practitioners in their choices of the right tools for the right biological questions.
CV · Mar 2
ORGAN: Object-Centric Representation Learning using Cycle Consistent Generative Adversarial Networks
Joël Küchler, Ellen van Maren, Vaiva Vasiliauskaitė et al.
Although data generation is often straightforward, extracting information from data is more difficult. Object-centric representation learning can extract information from images in an unsupervised manner. It does so by segmenting an image into its subcomponents: the objects. Each object is then represented in a low-dimensional latent space that can be used for downstream processing. Object-centric representation learning is dominated by autoencoder architectures (AEs). Here, we present ORGAN, a novel approach for object-centric representation learning, which is based on cycle-consistent Generative Adversarial Networks instead. We show that it performs similarly to other state-of-the-art approaches on synthetic datasets, while at the same time being the only approach tested here capable of handling more challenging real-world datasets with many objects and low visual contrast. Complementing these results, ORGAN creates expressive latent space representations that allow for object manipulation. Finally, we show that ORGAN scales well both with respect to the number of objects and the size of the images, giving it a unique edge over current state-of-the-art approaches.
IV · Mar 13, 2024
7T MRI Synthesization from 3T Acquisitions
Qiming Cui, Duygu Tosun, Pratik Mukherjee et al.
Supervised deep learning techniques can be used to generate synthetic 7T MRIs from 3T MRI inputs. This image enhancement process leverages the advantages of ultra-high-field MRI to improve the signal-to-noise and contrast-to-noise ratios of 3T acquisitions. In this paper, we introduce multiple novel 7T synthesization algorithms based on custom-designed variants of the V-Net convolutional neural network. We demonstrate that the V-Net based model has superior performance in enhancing both single-site and multi-site MRI datasets compared to the existing benchmark model. When trained on 3T-7T MRI pairs from 8 subjects with mild Traumatic Brain Injury (TBI), our model achieves state-of-the-art 7T synthesization performance. Compared to previous works, synthetic 7T images generated from our pipeline also display superior enhancement of pathological tissue. Additionally, we implement and test a data augmentation scheme for training models that are robust to variations in the input distribution. This allows synthetic 7T models to accommodate intra-scanner and inter-scanner variability in multisite datasets. On a harmonized dataset consisting of 18 3T-7T MRI pairs from two institutions, including both healthy subjects and those with mild TBI, our model maintains its performance and can generalize to 3T MRI inputs with lower resolution. Our findings demonstrate the promise of V-Net based models for MRI enhancement and offer a preliminary probe into improving the generalizability of synthetic 7T models with data augmentation.
CL · Feb 16, 2024
Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks
R. Patrick Xian, Alex J. Lee, Satvik Lolla et al. · deepmind, openai
The increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. Understanding model vulnerabilities in high-stakes and knowledge-intensive tasks is essential for quantifying the trustworthiness of model predictions and regulating their use. The recent discovery of named entities as adversarial examples (i.e. adversarial entities) in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs in high-stakes and specialized domains. We examined the use of type-consistent entity substitution as a template for collecting adversarial entities for billion-parameter LLMs with biomedical knowledge. To this end, we developed an embedding-space attack based on powerscaled distance-weighted sampling to assess the robustness of their biomedical knowledge with a low query budget and controllable coverage. Our method has favorable query efficiency and scaling over alternative approaches based on random sampling and blackbox gradient-guided search, which we demonstrated for adversarial distractor generation in biomedical question answering. Subsequent failure mode analysis uncovered two regimes of adversarial entities on the attack surface with distinct characteristics and we showed that entity substitution attacks can manipulate token-wise Shapley value explanations, which become deceptive in this setting. Our approach complements standard evaluations for high-capacity models and the results highlight the brittleness of domain knowledge in LLMs.
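The sampling idea named in this abstract can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that each type-consistent candidate entity is weighted by its Euclidean embedding distance from the original entity raised to a power, and that candidates are drawn with replacement until the query budget is used up. The function names, the toy 2-D embeddings, and the `power` and `budget` parameters are hypothetical.

```python
import math
import random


def power_scaled_weights(anchor, candidates, power):
    """Normalized sampling weights proportional to distance ** power (assumed form)."""
    if power < 0:
        raise ValueError("this sketch assumes a non-negative power")
    scaled = [math.dist(anchor, c) ** power for c in candidates]
    total = sum(scaled)
    if total == 0:
        raise ValueError("all candidates coincide with the anchor")
    return [s / total for s in scaled]


def sample_substitutes(anchor, candidates, power, budget, seed=0):
    """Draw `budget` candidate indices with replacement, one per model query."""
    weights = power_scaled_weights(anchor, candidates, power)
    rng = random.Random(seed)
    return rng.choices(range(len(candidates)), weights=weights, k=budget)


# Toy 2-D "embeddings" (hypothetical); distances to the anchor are 1, 2 and 5.
anchor = (0.0, 0.0)
candidates = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]

uniform = power_scaled_weights(anchor, candidates, power=0)  # reduces to random sampling
linear = power_scaled_weights(anchor, candidates, power=1)   # weights 1/8, 2/8, 5/8
picks = sample_substitutes(anchor, candidates, power=2, budget=5)
```

With `power=0` the sketch falls back to uniform random sampling, which is one of the baselines the abstract compares against; larger powers concentrate the query budget on candidates far from the original entity.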
CL · Mar 6, 2025
Measuring temporal effects of agent knowledge by date-controlled tool use
R. Patrick Xian, Qiming Cui, Stefan Bauer et al. · berkeley
Temporal progression is an integral part of knowledge accumulation and update. Web search is frequently adopted as grounding for agent knowledge, yet an improper configuration affects the quality of the agent's responses. Here, we assess agent behavior using distinct date-controlled tools (DCTs) as a stress test to measure the knowledge variability of large language model (LLM) agents. We demonstrate the temporal effects of an LLM agent as a writing assistant, which uses web search to complete scientific publication abstracts. We show that the temporality of the search engine translates into tool-dependent agent performance but can be alleviated with base model choice and explicit reasoning instructions such as chain-of-thought prompting. Our results indicate that agent design and evaluations should take a dynamical view and implement measures to account for the temporal influence of external resources to ensure reliability.
HC · Aug 13, 2025
Pre-trained Transformer-models using chronic invasive electrophysiology for symptom decoding without patient-individual training
Timon Merk, Saeed Salehi, Richard M. Koehler et al.
Neural decoding of pathological and physiological states can enable patient-individualized closed-loop neuromodulation therapy. Recent advances in pre-trained large-scale foundation models offer the potential for generalized state estimation without patient-individual training. Here we present a foundation model trained on chronic longitudinal deep brain stimulation recordings spanning over 24 days. Adhering to long time-scale symptom fluctuations, we highlight the extended context window of 30 minutes. We present an optimized pre-training loss function for neural electrophysiological data that corrects for the frequency bias of common masked auto-encoder loss functions due to the 1-over-f power law. We show in a downstream task the decoding of Parkinson's disease symptoms with leave-one-subject-out cross-validation without patient-individual training.
CV · Jun 8, 2025
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani
Generative medical vision-language models (Med-VLMs) are primarily designed to generate complex textual information (e.g., diagnostic reports) from multimodal inputs including vision modality (e.g., medical images) and language modality (e.g., clinical queries). However, their security vulnerabilities remain underexplored. Med-VLMs should be capable of rejecting harmful queries, such as "Provide detailed instructions for using this CT scan for insurance fraud". At the same time, addressing security concerns introduces the risk of over-defense, where safety-enhancing mechanisms may degrade general performance, causing Med-VLMs to reject benign clinical queries. In this paper, we propose a novel inference-time defense strategy to mitigate harmful queries, enabling defense against visual and textual jailbreak attacks. Using diverse medical imaging datasets collected from nine modalities, we demonstrate that our defense strategy based on synthetic clinical demonstrations enhances model safety without significantly compromising performance. Additionally, we find that increasing the demonstration budget alleviates the over-defense issue. We then introduce a mixed demonstration strategy as a trade-off solution for balancing security and performance under few-shot demonstration budget constraints.
QM · Jun 12, 2024
Opportunities in deep learning methods development for computational biology
Alex Jihun Lee, Reza Abbasi-Asl
Advances in molecular technologies underlie an enormous growth in the size of data sets pertaining to biology and biomedicine. These advances parallel those in the deep learning subfield of machine learning. Components in the differentiable programming toolbox that makes deep learning possible are allowing computer scientists to address an increasingly large array of problems with flexible and effective tools. However, many of these tools have not fully proliferated into the computational biology and bioinformatics fields. In this perspective, we survey some of these advances and highlight examples of their utilization in the biosciences, with the goal of increasing awareness among practitioners of emerging opportunities to blend expert knowledge with newly emerging deep learning architectural tools.
LG · Jun 17, 2021
Multi-Modal Prototype Learning for Interpretable Multivariable Time Series Classification
Gaurav R. Ghosal, Reza Abbasi-Asl
Multivariable time series classification problems are increasing in prevalence and complexity in a variety of domains, such as biology and finance. While deep learning methods are an effective tool for these problems, they often lack interpretability. In this work, we propose a novel modular prototype learning framework for multivariable time series classification. In the first stage of our framework, encoders extract features from each variable independently. Prototype layers identify single-variable prototypes in the resulting feature spaces. The next stage of our framework represents the multivariable time series sample points in terms of their similarity to these single-variable prototypes. This results in an inherently interpretable representation of multivariable patterns, on which prototype learning is applied to extract representative examples, i.e., multivariable prototypes. Our framework is thus able to explicitly identify both informative patterns in the individual variables and the relationships between the variables. We validate our framework on a simulated dataset with embedded patterns, as well as a real human activity recognition problem. Our framework attains comparable or superior classification performance to existing time series classification methods on these tasks. On the simulated dataset, we find that our model returns interpretations consistent with the embedded patterns. Moreover, the interpretations learned on the activity recognition dataset align with domain knowledge.
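The second stage described in this abstract, re-expressing a sample by its similarity to single-variable prototypes, can be sketched as follows. This is a minimal illustration under stated assumptions: the similarity function `exp(-squared distance)` is a common choice in prototype networks but is not specified in the abstract, and the function name, feature vectors, and prototypes are hypothetical.

```python
import math


def similarity_profile(features, prototypes):
    """Concatenate, variable by variable, a sample's similarity to each prototype.

    features:   one encoded feature vector per variable
    prototypes: for each variable, a list of prototype vectors in that feature space
    Similarity is exp(-squared Euclidean distance), an assumed choice.
    """
    profile = []
    for var_feat, var_protos in zip(features, prototypes):
        for proto in var_protos:
            sq_dist = sum((a - b) ** 2 for a, b in zip(var_feat, proto))
            profile.append(math.exp(-sq_dist))
    return profile


# Hypothetical encoded sample with two variables, each with 2-D features.
features = [(0.0, 1.0), (2.0, 2.0)]
# Variable 0 has two prototypes, variable 1 has one.
prototypes = [[(0.0, 1.0), (1.0, 1.0)], [(2.0, 2.0)]]

profile = similarity_profile(features, prototypes)
```

Each entry of `profile` refers to one named single-variable prototype, which is what makes the representation readable; a second round of prototype learning over such profiles would then yield the multivariable prototypes.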
ML · Jan 14, 2019
Interpretable machine learning: definitions, methods, and applications
W. James Murdoch, Chandan Singh, Karl Kumbier et al.
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
HC · Nov 13, 2018
Brain-Computer Interface in Virtual Reality
Reza Abbasi-Asl, Mohammad Keshavarzi, Dorian Yao Chan
We study the performance of a brain-computer interface (BCI) system in a virtual reality (VR) environment and compare it to regular 2D displays. First, we design a headset that consists of three components: a wearable electroencephalography (EEG) device, a VR headset and an interface. Recordings of brain and behavior are collected from human subjects performing a wide variety of tasks using our device. The tasks consist of object rotation or scaling in VR using either mental commands or facial expressions (smile and eyebrow movement). Subjects are asked to repeat similar tasks on regular 2D monitor screens. Performance in the 3D virtual reality environment is considerably higher than on the 2D screen. In particular, the median number of successful commands across trials in the VR setting is double that in the 2D setting (8 successful commands in the VR setting compared to 4 on the 2D screen in 1-minute trials). Our results suggest that the design of future BCI systems can benefit remarkably from the VR setting.
CV · Nov 12, 2017
Robust registration of medical images in the presence of spatially-varying noise
Reza Abbasi-Asl, Aboozar Ghaffari, Emad Fatemizadeh
Spatially-varying intensity noise is a common source of distortion in medical images. Bias field noise is one example of such a distortion that is often present in magnetic resonance (MR) images or other modalities such as retina images. In this paper, we first show that the bias field noise can be considerably reduced using the Empirical Mode Decomposition (EMD) technique. EMD is a multi-resolution tool that decomposes a signal into several principal patterns and residual components. We show that the spatially-varying noise is highly expressed in the residual component of the EMD and can be filtered out. Then, we propose two hierarchical multi-resolution EMD-based algorithms for robust registration of images in the presence of spatially-varying noise. One algorithm (LR-EMD) is based on registration of EMD feature-maps from both floating and reference images at various resolution levels. In the second algorithm (AFR-EMD), we first extract an average feature-map based on EMD from both floating and reference images. Then, we use a simple hierarchical multi-resolution algorithm to register the average feature-maps. For the brain MR images, both algorithms achieve a lower error rate and a higher convergence percentage than intensity-based hierarchical registration. Specifically, using mutual information as the similarity measure, AFR-EMD achieves a 42% lower error rate in intensity and a 52% lower error rate in transformation compared to intensity-based hierarchical registration. For LR-EMD, the error rate is 32% lower for the intensity and 41% lower for the transformation. Furthermore, we demonstrate that our proposed algorithms improve the registration of retina images in the presence of spatially-varying noise.
ML · Nov 7, 2017
Interpreting Convolutional Neural Networks Through Compression
Reza Abbasi-Asl, Bin Yu
Convolutional neural networks (CNNs) achieve state-of-the-art performance in a wide variety of tasks in computer vision. However, interpreting CNNs remains a challenge. This is mainly due to the large number of parameters in these networks. Here, we investigate the role of compression, and particularly of pruning filters, in the interpretation of CNNs. We exploit our recently proposed greedy structural compression scheme that prunes filters in a trained CNN. In our compression, the filter importance index is defined as the classification accuracy reduction (CAR) of the network after pruning that filter. The filters are then iteratively pruned based on the CAR index. We demonstrate the interpretability of CAR-compressed CNNs by showing that our algorithm prunes filters with visually redundant pattern selectivity. Specifically, we show the importance of shape-selective filters for object recognition, as opposed to color-selective filters. Of the top 20 CAR-pruned filters in AlexNet, 17 in the first layer and 14 in the second layer are color-selective. Finally, we introduce a variant of our CAR importance index that quantifies the importance of each image class to each CNN filter. We show that the most and the least important class labels present a meaningful interpretation of each filter that is consistent with the visualized pattern selectivity of that filter.
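The greedy pruning loop described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes the CAR index is recomputed for every remaining filter in each round and that the filter with the smallest CAR is pruned. The `accuracy` callback, the filter names, and the additive toy accuracy model are all hypothetical; in practice `accuracy` would evaluate the pruned network on held-out data.

```python
def car_prune(filters, accuracy, n_prune):
    """Greedily prune the filter whose removal reduces accuracy the least.

    filters:  iterable of filter identifiers
    accuracy: callable mapping a frozenset of active filters to accuracy
    n_prune:  number of filters to remove
    Returns (pruned filters in order, set of remaining filters).
    """
    active = set(filters)
    pruned = []
    for _ in range(n_prune):
        base = accuracy(frozenset(active))
        # CAR index: accuracy drop when this filter alone is removed.
        car = {f: base - accuracy(frozenset(active - {f})) for f in active}
        # Sorting first makes tie-breaking deterministic.
        victim = min(sorted(car), key=car.get)
        active.remove(victim)
        pruned.append(victim)
    return pruned, active


# Hypothetical toy model: each filter adds a fixed amount of accuracy.
CONTRIBUTION = {"edge_0": 0.30, "edge_1": 0.25, "color_0": 0.02, "color_1": 0.01}


def toy_accuracy(active):
    return 0.40 + sum(CONTRIBUTION[f] for f in active)


pruned, kept = car_prune(CONTRIBUTION, toy_accuracy, n_prune=2)
```

In this toy setup the two low-contribution color filters are removed first and the edge filters are kept, which mirrors the qualitative finding reported in the abstract. Each round costs one accuracy evaluation per remaining filter, so a real run would need a fast or subsampled evaluation.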
CV · May 20, 2017
Structural Compression of Convolutional Neural Networks
Reza Abbasi-Asl, Bin Yu
Deep convolutional neural networks (CNNs) have been successful in many tasks in machine vision; however, the millions of weights in the form of thousands of convolutional filters in CNNs make them difficult for humans to interpret or understand in science. In this article, we introduce CAR, a greedy structural compression scheme to obtain smaller and more interpretable CNNs, while achieving close-to-original accuracy. The compression is based on pruning filters with the least contribution to the classification accuracy. We demonstrate the interpretability of CAR-compressed CNNs by showing that our algorithm prunes filters with visually redundant functionalities such as color filters. These compressed networks are easier to interpret because they retain the filter diversity of uncompressed networks with an order of magnitude fewer filters. Finally, a variant of CAR is introduced to quantify the importance of each image category to each CNN filter. Specifically, the most and the least important class labels are shown to be meaningful interpretations of each filter.