Andrew J. Schaumberg

1.4 CV Jul 28, 2021 Code

Fast and Scalable Image Search For Histology

Chengkuan Chen, Ming Y. Lu, Drew F. K. Williamson et al.

The expanding adoption of digital pathology has enabled the curation of large repositories of histology whole slide images (WSIs), which contain a wealth of information. Similar pathology image search offers the opportunity to comb through large historical repositories of gigapixel WSIs to identify cases with similar morphological features and can be particularly useful for diagnosing rare diseases, identifying similar cases for predicting prognosis, treatment outcomes, and potential clinical trial success. A critical challenge in developing a WSI search and retrieval system is scalability, which is uniquely challenging given the need to search a growing number of slides that can each consist of billions of pixels and are several gigabytes in size. Such systems are typically slow, and retrieval speed often scales with the size of the repository they search through, making their clinical adoption tedious and infeasible for repositories that are constantly growing. Here we present Fast Image Search for Histopathology (FISH), a histology image search pipeline that is infinitely scalable and achieves constant search speed that is independent of the image database size while being interpretable and without requiring detailed annotations. FISH uses self-supervised deep learning to encode meaningful representations from WSIs and a van Emde Boas tree for fast search, followed by an uncertainty-based ranking algorithm to retrieve similar WSIs. We evaluated FISH on multiple tasks and datasets with over 22,000 patient cases spanning 56 disease subtypes. We additionally demonstrate that FISH can be used to assist with the diagnosis of rare cancer types where sufficient cases may not be available to train traditional supervised deep models. FISH is available as an easy-to-use, open-source software package (https://github.com/mahmoodlab/FISH).

1.3 ML Jan 27, 2016

Predicting Drug Interactions and Mutagenicity with Ensemble Classifiers on Subgraphs of Molecules

Andrew Schaumberg, Angela Yu, Tatsuhiro Koshi et al.

In this study, we intend to solve a mutual information problem in interacting molecules of any type, such as proteins, nucleic acids, and small molecules. Using machine learning techniques, we accurately predict pairwise interactions, which can be of medical and biological importance. Graphs are useful in this problem for their generality to all types of molecules, due to the inherent association of atoms through atomic bonds. Subgraphs can represent different molecular domains. These domains can be biologically significant, as most molecules have only portions that are of functional significance and can interact with other domains. Thus, we use subgraphs as features in different machine learning algorithms to predict whether two drugs interact and to predict potential single-molecule effects.

2 Papers