Jianhua Yao

CV
h-index38
48papers
7,666citations
Novelty45%
AI Score45

48 Papers

CVJul 14, 2022Code
ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images

Jiawei Yang, Hanbo Chen, Yuan Liang et al.

Detectingandsegmentingobjectswithinwholeslideimagesis essential in computational pathology workflow. Self-supervised learning (SSL) is appealing to such annotation-heavy tasks. Despite the extensive benchmarks in natural images for dense tasks, such studies are, unfortunately, absent in current works for pathology. Our paper intends to narrow this gap. We first benchmark representative SSL methods for dense prediction tasks in pathology images. Then, we propose concept contrastive learning (ConCL), an SSL framework for dense pre-training. We explore how ConCL performs with concepts provided by different sources and end up with proposing a simple dependency-free concept generating method that does not rely on external segmentation algorithms or saliency detection models. Extensive experiments demonstrate the superiority of ConCL over previous state-of-the-art SSL methods across different settings. Along our exploration, we distll several important and intriguing components contributing to the success of dense pre-training for pathology images. We hope this work could provide useful data points and encourage the community to conduct ConCL pre-training for problems of interest. Code is available.

LGSep 19, 2022Code
UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup

Zongbo Han, Zhipeng Liang, Fan Yang et al.

Subpopulation shift widely exists in many real-world machine learning applications, referring to the training and test distributions containing the same subpopulation groups but varying in subpopulation frequencies. Importance reweighting is a normal way to handle the subpopulation shift issue by imposing constant or adaptive sampling weights on each sample in the training dataset. However, some recent studies have recognized that most of these approaches fail to improve the performance over empirical risk minimization especially when applied to over-parameterized neural networks. In this work, we propose a simple yet practical framework, called uncertainty-aware mixup (UMIX), to mitigate the overfitting issue in over-parameterized models by reweighting the ''mixed'' samples according to the sample uncertainty. The training-trajectories-based uncertainty estimation is equipped in the proposed UMIX for each sample to flexibly characterize the subpopulation distribution. We also provide insightful theoretical analysis to verify that UMIX achieves better generalization bounds over prior works. Further, we conduct extensive empirical studies across a wide range of tasks to validate the effectiveness of our method both qualitatively and quantitatively. Code is available at https://github.com/TencentAILabHealthcare/UMIX.

CVJul 5, 2022Code
ReMix: A General and Efficient Framework for Multiple Instance Learning based Whole Slide Image Classification

Jiawei Yang, Hanbo Chen, Yu Zhao et al.

Whole slide image (WSI) classification often relies on deep weakly supervised multiple instance learning (MIL) methods to handle gigapixel resolution images and slide-level labels. Yet the decent performance of deep learning comes from harnessing massive datasets and diverse samples, urging the need for efficient training pipelines for scaling to large datasets and data augmentation techniques for diversifying samples. However, current MIL-based WSI classification pipelines are memory-expensive and computation-inefficient since they usually assemble tens of thousands of patches as bags for computation. On the other hand, despite their popularity in other tasks, data augmentations are unexplored for WSI MIL frameworks. To address them, we propose ReMix, a general and efficient framework for MIL based WSI classification. It comprises two steps: reduce and mix. First, it reduces the number of instances in WSI bags by substituting instances with instance prototypes, i.e., patch cluster centroids. Then, we propose a ``Mix-the-bag'' augmentation that contains four online, stochastic and flexible latent space augmentations. It brings diverse and reliable class-identity-preserving semantic changes in the latent space while enforcing semantic-perturbation invariance. We evaluate ReMix on two public datasets with two state-of-the-art MIL methods. In our experiments, consistent improvements in precision, accuracy, and recall have been achieved but with orders of magnitude reduced training time and memory consumption, demonstrating ReMix's effectiveness and efficiency. Code is available.

LGApr 9, 2023
Reweighted Mixup for Subpopulation Shift

Zongbo Han, Zhipeng Liang, Fan Yang et al.

Subpopulation shift exists widely in many real-world applications, which refers to the training and test distributions that contain the same subpopulation groups but with different subpopulation proportions. Ignoring subpopulation shifts may lead to significant performance degradation and fairness concerns. Importance reweighting is a classical and effective way to handle the subpopulation shift. However, recent studies have recognized that most of these approaches fail to improve the performance especially when applied to over-parameterized neural networks which are capable of fitting any training samples. In this work, we propose a simple yet practical framework, called reweighted mixup (RMIX), to mitigate the overfitting issue in over-parameterized models by conducting importance weighting on the ''mixed'' samples. Benefiting from leveraging reweighting in mixup, RMIX allows the model to explore the vicinal space of minority samples more, thereby obtaining more robust model against subpopulation shift. When the subpopulation memberships are unknown, the training-trajectories-based uncertainty estimation is equipped in the proposed RMIX to flexibly characterize the subpopulation distribution. We also provide insightful theoretical analysis to verify that RMIX achieves better generalization bounds over prior works. Further, we conduct extensive empirical studies across a wide range of tasks to validate the effectiveness of the proposed method.

IVJun 15, 2022
Seeking Common Ground While Reserving Differences: Multiple Anatomy Collaborative Framework for Undersampled MRI Reconstruction

Jiangpeng Yan, Chenghui Yu, Hanbo Chen et al.

Recently, deep neural networks have greatly advanced undersampled Magnetic Resonance Image (MRI) reconstruction, wherein most studies follow the one-anatomy-one-network fashion, i.e., each expert network is trained and evaluated for a specific anatomy. Apart from inefficiency in training multiple independent models, such convention ignores the shared de-aliasing knowledge across various anatomies which can benefit each other. To explore the shared knowledge, one naive way is to combine all the data from various anatomies to train an all-round network. Unfortunately, despite the existence of the shared de-aliasing knowledge, we reveal that the exclusive knowledge across different anatomies can deteriorate specific reconstruction targets, yielding overall performance degradation. Observing this, in this study, we present a novel deep MRI reconstruction framework with both anatomy-shared and anatomy-specific parameterized learners, aiming to "seek common ground while reserving differences" across different anatomies.Particularly, the primary anatomy-shared learners are exposed to different anatomies to model flourishing shared knowledge, while the efficient anatomy-specific learners are trained with their target anatomy for exclusive knowledge. Four different implementations of anatomy-specific learners are presented and explored on the top of our framework in two MRI reconstruction networks. Comprehensive experiments on brain, knee and cardiac MRI datasets demonstrate that three of these learners are able to enhance reconstruction performance via multiple anatomy collaborative learning.

CVApr 14, 2022
3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume

Jianye Pang, Cheng Jiang, Yihao Chen et al.

Dense prediction in medical volume provides enriched guidance for clinical analysis. CNN backbones have met bottleneck due to lack of long-range dependencies and global context modeling power. Recent works proposed to combine vision transformer with CNN, due to its strong global capture ability and learning capability. However, most works are limited to simply applying pure transformer with several fatal flaws (i.e., lack of inductive bias, heavy computation and little consideration for 3D data). Therefore, designing an elegant and efficient vision transformer learner for dense prediction in medical volume is promising and challenging. In this paper, we propose a novel 3D Shuffle-Mixer network of a new Local Vision Transformer-MLP paradigm for medical dense prediction. In our network, a local vision transformer block is utilized to shuffle and learn spatial context from full-view slices of rearranged volume, a residual axial-MLP is designed to mix and capture remaining volume context in a slice-aware manner, and a MLP view aggregator is employed to project the learned full-view rich context to the volume feature in a view-aware manner. Moreover, an Adaptive Scaled Enhanced Shortcut is proposed for local vision transformer to enhance feature along spatial and channel dimensions adaptively, and a CrossMerge is proposed to skip-connects the multi-scale feature appropriately in the pyramid architecture. Extensive experiments demonstrate the proposed model outperforms other state-of-the-art medical dense prediction methods.

CVAug 6, 2022
Multiplex-detection Based Multiple Instance Learning Network for Whole Slide Image Classification

Zhikang Wang, Yue Bi, Tong Pan et al.

Multiple instance learning (MIL) is a powerful approach to classify whole slide images (WSIs) for diagnostic pathology. A fundamental challenge of MIL on WSI classification is to discover the \textit{critical instances} that trigger the bag label. However, previous methods are primarily designed under the independent and identical distribution hypothesis (\textit{i.i.d}), ignoring either the correlations between instances or heterogeneity of tumours. In this paper, we propose a novel multiplex-detection-based multiple instance learning (MDMIL) to tackle the issues above. Specifically, MDMIL is constructed by the internal query generation module (IQGM) and the multiplex detection module (MDM) and assisted by the memory-based contrastive loss during training. Firstly, IQGM gives the probability of instances and generates the internal query (IQ) for the subsequent MDM by aggregating highly reliable features after the distribution analysis. Secondly, the multiplex-detection cross-attention (MDCA) and multi-head self-attention (MHSA) in MDM cooperate to generate the final representations for the WSI. In this process, the IQ and trainable variational query (VQ) successfully build up the connections between instances and significantly improve the model's robustness toward heterogeneous tumours. At last, to further enforce constraints in the feature space and stabilize the training process, we adopt a memory-based contrastive loss, which is practicable for WSI classification even with a single sample as input in each iteration. We conduct experiments on three computational pathology datasets, e.g., CAMELYON16, TCGA-NSCLC, and TCGA-RCC datasets. The superior accuracy and AUC demonstrate the superiority of our proposed MDMIL over other state-of-the-art methods.

CVNov 26, 2022
Human-machine Interactive Tissue Prototype Learning for Label-efficient Histopathology Image Segmentation

Wentao Pan, Jiangpeng Yan, Hanbo Chen et al.

Recently, deep neural networks have greatly advanced histopathology image segmentation but usually require abundant annotated data. However, due to the gigapixel scale of whole slide images and pathologists' heavy daily workload, obtaining pixel-level labels for supervised learning in clinical practice is often infeasible. Alternatively, weakly-supervised segmentation methods have been explored with less laborious image-level labels, but their performance is unsatisfactory due to the lack of dense supervision. Inspired by the recent success of self-supervised learning methods, we present a label-efficient tissue prototype dictionary building pipeline and propose to use the obtained prototypes to guide histopathology image segmentation. Particularly, taking advantage of self-supervised contrastive learning, an encoder is trained to project the unlabeled histopathology image patches into a discriminative embedding space where these patches are clustered to identify the tissue prototypes by efficient pathologists' visual examination. Then, the encoder is used to map the images into the embedding space and generate pixel-level pseudo tissue masks by querying the tissue prototype dictionary. Finally, the pseudo masks are used to train a segmentation network with dense supervision for better performance. Experiments on two public datasets demonstrate that our human-machine interactive tissue prototype learning method can achieve comparable segmentation performance as the fully-supervised baselines with less annotation burden and outperform other weakly-supervised methods. Codes will be available upon publication.

LGNov 16, 2022
Learning with Noisy Labels over Imbalanced Subpopulations

MingCai Chen, Yu Zhao, Bing He et al.

Learning with Noisy Labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have "small loss". However, this assumption always fails to generalize to some real-world cases with imbalanced subpopulations, i.e., training subpopulations varying in sample size or recognition difficulty. Therefore, recent LNL methods face the risk of misclassifying those "informative" samples (e.g., hard samples or samples in the tail subpopulations) into noisy samples, leading to poor generalization performance. To address the above issue, we propose a novel LNL method to simultaneously deal with noisy labels and imbalanced subpopulations. It first leverages sample correlation to estimate samples' clean probabilities for label correction and then utilizes corrected labels for Distributionally Robust Optimization (DRO) to further improve the robustness. Specifically, in contrast to previous works using classification loss as the selection criterion, we introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities. Then, we refurbish the noisy labels using the estimated clean probabilities and the pseudo-labels from the model's predictions. With refurbished labels, we use DRO to train the model to be robust to subpopulation imbalance. Extensive experiments on a wide range of benchmarks demonstrate that our technique can consistently improve current state-of-the-art robust learning paradigms against noisy labels, especially when encountering imbalanced subpopulations.

LGAug 28, 2024
Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis

Sijie Mai, Yu Zhao, Ying Zeng et al.

Multimodal sentiment analysis aims to effectively integrate information from various sources to infer sentiment, where in many cases there are no annotations for unimodal labels. Therefore, most works rely on multimodal labels for training. However, there exists the noisy label problem for the learning of unimodal signals as multimodal annotations are not always the ideal substitutes for the unimodal ones, failing to achieve finer optimization for individual modalities. In this paper, we explore the learning of unimodal labels under the weak supervision from the annotated multimodal labels. Specifically, we propose a novel meta uni-label generation (MUG) framework to address the above problem, which leverages the available multimodal labels to learn the corresponding unimodal labels by the meta uni-label correction network (MUCN). We first design a contrastive-based projection module to bridge the gap between unimodal and multimodal representations, so as to use multimodal annotations to guide the learning of MUCN. Afterwards, we propose unimodal and multimodal denoising tasks to train MUCN with explicit supervision via a bi-level optimization strategy. We then jointly train unimodal and multimodal learning tasks to extract discriminative unimodal features for multimodal inference. Experimental results suggest that MUG outperforms competitive baselines and can learn accurate unimodal labels.

LGJul 29, 2023
A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification

Mingcai Chen, Yu Zhao, Zhonghuang Wang et al.

Immune repertoire classification, a typical multiple instance learning (MIL) problem, is a frontier research topic in computational biology that makes transformative contributions to new vaccines and immune therapies. However, the traditional instance-space MIL, directly assigning bag-level labels to instances, suffers from the massive amount of noisy labels and extremely low witness rate. In this work, we propose a noisy-label-learning formulation to solve the immune repertoire classification task. To remedy the inaccurate supervision of repertoire-level labels for a sequence-level classifier, we design a robust training strategy: The initial labels are smoothed to be asymmetric and are progressively corrected using the model's predictions throughout the training process. Furthermore, two models with the same architecture but different parameter initialization are co-trained simultaneously to remedy the known "confirmation bias" problem in the self-training-like schema. As a result, we obtain accurate sequence-level classification and, subsequently, repertoire-level classification. Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method's effectiveness and superior performance on sequence-level and repertoire-level tasks.

LGSep 15, 2024
Dynamic Fraud Detection: Integrating Reinforcement Learning into Graph Neural Networks

Yuxin Dong, Jianhua Yao, Jiajing Wang et al.

Financial fraud refers to the act of obtaining financial benefits through dishonest means. Such behavior not only disrupts the order of the financial market but also harms economic and social development and breeds other illegal and criminal activities. With the popularization of the internet and online payment methods, many fraudulent activities and money laundering behaviors in life have shifted from offline to online, posing a great challenge to regulatory authorities. How to efficiently detect these financial fraud activities has become an urgent issue that needs to be resolved. Graph neural networks are a type of deep learning model that can utilize the interactive relationships within graph structures, and they have been widely applied in the field of fraud detection. However, there are still some issues. First, fraudulent activities only account for a very small part of transaction transfers, leading to an inevitable problem of label imbalance in fraud detection. At the same time, fraudsters often disguise their behavior, which can have a negative impact on the final prediction results. In addition, existing research has overlooked the importance of balancing neighbor information and central node information. For example, when the central node has too many neighbors, the features of the central node itself are often neglected. Finally, fraud activities and patterns are constantly changing over time, so considering the dynamic evolution of graph edge relationships is also very important.

RMSep 23, 2024
Unveiling the Potential of Graph Neural Networks in SME Credit Risk Assessment

Bingyao Liu, Iris Li, Jianhua Yao et al.

This paper takes the graph neural network as the technical framework, integrates the intrinsic connections between enterprise financial indicators, and proposes a model for enterprise credit risk assessment. The main research work includes: Firstly, based on the experience of predecessors, we selected 29 enterprise financial data indicators, abstracted each indicator as a vertex, deeply analyzed the relationships between the indicators, constructed a similarity matrix of indicators, and used the maximum spanning tree algorithm to achieve the graph structure mapping of enterprises; secondly, in the representation learning phase of the mapped graph, a graph neural network model was built to obtain its embedded representation. The feature vector of each node was expanded to 32 dimensions, and three GraphSAGE operations were performed on the graph, with the results pooled using the Pool operation, and the final output of three feature vectors was averaged to obtain the graph's embedded representation; finally, a classifier was constructed using a two-layer fully connected network to complete the prediction task. Experimental results on real enterprise data show that the model proposed in this paper can well complete the multi-level credit level estimation of enterprises. Furthermore, the tree-structured graph mapping deeply portrays the intrinsic connections of various indicator data of the company, and according to the ROC and other evaluation criteria, the model's classification effect is significant and has good "robustness".

LGNov 26, 2025
Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning

Zhenchao Tang, Fang Wang, Haohuai He et al.

Effective post-training is essential to align Large Language Models (LLMs) with specialized biomedical knowledge to accelerate life science research. However, current approaches face significant limitations. First, biomedical reasoning involves intricate mechanisms often represented by sparse textual data. Standard Supervised Fine-Tuning (SFT) tends to overfit to surface-level instruction patterns without effectively internalizing this fragmented scientific knowledge. Second, Reinforcement Learning (RL) is impractical for this domain, as defining meaningful rewards often necessitates prohibitive experimental validation (e.g., wet-lab verification of drug responses), rendering real-time feedback unfeasible. We propose Balanced Fine-Tuning (BFT), an efficient post-training method designed to learn complex reasoning from sparse data without external reward signals. BFT operates through a two-layer weighting mechanism: 1. At the token level, it scales loss via prediction probabilities to stabilize gradients and prevent overfitting; 2. At the sample level, it uses "minimum group confidence" to adaptively enhance the learning of hard samples. Experiments demonstrate that BFT significantly outperforms SFT. In medical tasks, it enables LLMs to acquire knowledge that SFT misses. In biological tasks, BFT-based LLMs surpass GeneAgent (an accurate agent for biology analysis) in biological process reasoning. Moreover, the text embeddings generated by BFT can be directly applied to downstream tasks, such as gene interaction and single-cell perturbation response prediction. These results indicate that BFT facilitates broad applications of LLMs in biomedical research.

QMApr 23, 2024Code
Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

Yikun Zhang, Geyan Ye, Chaohao Yuan et al.

Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representation, thereby improving performance in various scientific fields. However, most approaches employ a global alignment approach to learn the knowledge from different modalities that may fail to capture fine-grained information, such as molecule-and-text fragments and stereoisomeric nuances, which is crucial for downstream tasks. Furthermore, it is incapable of modeling such information using a similar global alignment strategy due to the lack of annotations about the fine-grained fragments in the existing dataset. In this paper, we propose Atomas, a hierarchical molecular representation learning framework that jointly learns representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to automatically learn the fine-grained fragment correspondence between two modalities and align these representations at three semantic levels. Atomas's end-to-end training framework supports understanding and generating molecules, enabling a wider range of downstream tasks. Atomas achieves superior performance across 12 tasks on 11 datasets, outperforming 11 baseline models thus highlighting the effectiveness and versatility of our method. Scaling experiments further demonstrate Atomas's robustness and scalability. Moreover, visualization and qualitative analysis, validated by human experts, confirm the chemical relevance of our approach. Codes are released on https://github.com/yikunpku/Atomas.

IVFeb 18, 2022Code
Towards better understanding and better generalization of few-shot classification in histology images with contrastive learning

Jiawei Yang, Hanbo Chen, Jiangpeng Yan et al.

Few-shot learning is an established topic in natural images for years, but few work is attended to histology images, which is of high clinical value since well-labeled datasets and rare abnormal samples are expensive to collect. Here, we facilitate the study of few-shot learning in histology images by setting up three cross-domain tasks that simulate real clinics problems. To enable label-efficient learning and better generalizability, we propose to incorporate contrastive learning (CL) with latent augmentation (LA) to build a few-shot system. CL learns useful representations without manual labels, while LA transfers semantic variations of the base dataset in an unsupervised way. These two components fully exploit unlabeled training data and can scale gracefully to other label-hungry problems. In experiments, we find i) models learned by CL generalize better than supervised learning for histology images in unseen classes, and ii) LA brings consistent gains over baselines. Prior studies of self-supervised learning mainly focus on ImageNet-like images, which only present a dominant object in their centers. Recent attention has been paid to images with multi-objects and multi-textures. Histology images are a natural choice for such a study. We show the superiority of CL over supervised learning in terms of generalization for such data and provide our empirical understanding for this observation. The findings in this work could contribute to understanding how the model generalizes in the context of both representation learning and histological image analysis. Code is available.

CLFeb 7, 2024
StableMask: Refining Causal Masking in Decoder-only Transformer

Qingyu Yin, Xuzheng He, Xiang Zhuang et al.

The decoder-only Transformer architecture with causal masking and relative position encoding (RPE) has become the de facto choice in language modeling. Despite its exceptional performance across various tasks, we have identified two limitations: First, it requires all attention scores to be non-zero and sum up to 1, even if the current embedding has sufficient self-contained information. This compels the model to assign disproportional excessive attention to specific tokens. Second, RPE-based Transformers are not universal approximators due to their limited capacity at encoding absolute positional information, which limits their application in position-critical tasks. In this work, we propose StableMask: a parameter-free method to address both limitations by refining the causal mask. It introduces pseudo-attention values to balance attention distributions and encodes absolute positional information via a progressively decreasing mask ratio. StableMask's effectiveness is validated both theoretically and empirically, showing significant enhancements in language models with parameter sizes ranging from 71M to 1.4B across diverse datasets and encoding methods. We further show that it naturally supports (1) efficient extrapolation without special tricks such as StreamingLLM and (2) easy integration with existing attention optimization techniques.

LGDec 9, 2024
Stock Type Prediction Model Based on Hierarchical Graph Neural Network

Jianhua Yao, Yuxin Dong, Jiajing Wang et al.

This paper introduces a novel approach to stock data analysis by employing a Hierarchical Graph Neural Network (HGNN) model that captures multi-level information and relational structures in the stock market. The HGNN model integrates stock relationship data and hierarchical attributes to predict stock types effectively. The paper discusses the construction of a stock industry relationship graph and the extraction of temporal information from historical price sequences. It also highlights the design of a graph convolution operation and a temporal attention aggregator to model the macro market state. The integration of these features results in a comprehensive stock prediction model that addresses the challenges of utilizing stock relationship data and modeling hierarchical attributes in the stock market.

LGApr 21, 2025
Application of Deep Generative Models for Anomaly Detection in Complex Financial Transactions

Tengda Tang, Jianhua Yao, Yixian Wang et al.

This study proposes an algorithm for detecting suspicious behaviors in large payment flows based on deep generative models. By combining Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE), the algorithm is designed to detect abnormal behaviors in financial transactions. First, the GAN is used to generate simulated data that approximates normal payment flows. The discriminator identifies anomalous patterns in transactions, enabling the detection of potential fraud and money laundering behaviors. Second, a VAE is introduced to model the latent distribution of payment flows, ensuring that the generated data more closely resembles real transaction features, thus improving the model's detection accuracy. The method optimizes the generative capabilities of both GAN and VAE, ensuring that the model can effectively capture suspicious behaviors even in sparse data conditions. Experimental results show that the proposed method significantly outperforms traditional machine learning algorithms and other deep learning models across various evaluation metrics, especially in detecting rare fraudulent behaviors. Furthermore, this study provides a detailed comparison of performance in recognizing different transaction patterns (such as normal, money laundering, and fraud) in large payment flows, validating the advantages of generative models in handling complex financial data.

QMApr 18, 2024
Annotation-guided Protein Design with Multi-Level Domain Alignment

Chaohao Yuan, Songyou Li, Geyan Ye et al.

The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models explore to generate protein using structural and evolutionary guidance, which only provide indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which directly describe the protein's high-level functionalities, properties, and their correlation with target amino acid sequences, remain unexplored in the context of protein design tasks. In this paper, we propose Protein-Annotation Alignment Generation, PAAG, a multi-modality protein design framework that integrates the textual annotations extracted from protein database for controllable generation in sequence space. Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations, and can even design novel proteins with flexible combinations of different kinds of annotations. Our experimental results underscore the superiority of the aligned protein representations from PAAG over 7 prediction tasks. Furthermore, PAAG demonstrates a significant increase in generation success rate (24.7% vs 4.7% in zinc finger, and 54.3% vs 22.0% in the immunoglobulin domain) in comparison to the existing model. We anticipate that PAAG will broaden the horizons of protein design by leveraging the knowledge from between textual annotation and proteins.

CVDec 14, 2023
Semi-supervised Semantic Segmentation Meets Masked Modeling:Fine-grained Locality Learning Matters in Consistency Regularization

Wentao Pan, Zhe Xu, Jiangpeng Yan et al. · tencent-ai

Semi-supervised semantic segmentation aims to utilize limited labeled images and abundant unlabeled images to achieve label-efficient learning, wherein the weak-to-strong consistency regularization framework, popularized by FixMatch, is widely used as a benchmark scheme. Despite its effectiveness, we observe that such scheme struggles with satisfactory segmentation for the local regions. This can be because it originally stems from the image classification task and lacks specialized mechanisms to capture fine-grained local semantics that prioritizes in dense prediction. To address this issue, we propose a novel framework called \texttt{MaskMatch}, which enables fine-grained locality learning to achieve better dense segmentation. On top of the original teacher-student framework, we design a masked modeling proxy task that encourages the student model to predict the segmentation given the unmasked image patches (even with 30\% only) and enforces the predictions to be consistent with pseudo-labels generated by the teacher model using the complete image. Such design is motivated by the intuition that if the predictions are more consistent given insufficient neighboring information, stronger fine-grained locality perception is achieved. Besides, recognizing the importance of reliable pseudo-labels in the above locality learning and the original consistency learning scheme, we design a multi-scale ensembling strategy that considers context at different levels of abstraction for pseudo-label generation. Extensive experiments on benchmark datasets demonstrate the superiority of our method against previous approaches and its plug-and-play flexibility.

OTFeb 21, 2025
Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence

Yingying Sun, Jun A, Zhiwei Liu et al.

Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.

BMJul 1, 2025
Steering Protein Language Models

Long-Kai Huang, Rongyi Zhu, Bing He et al.

Protein Language Models (PLMs), pre-trained on extensive evolutionary data from natural proteins, have emerged as indispensable tools for protein design. While powerful, PLMs often struggle to produce proteins with precisely specified functionalities or properties due to inherent challenges in controlling their outputs. In this work, we investigate the potential of Activation Steering, a technique originally developed for controlling text generation in Large Language Models (LLMs), to direct PLMs toward generating protein sequences with targeted properties. We propose a simple yet effective method that employs activation editing to steer PLM outputs, and extend this approach to protein optimization through a novel editing site identification module. Through comprehensive experiments on lysozyme-like sequence generation and optimization, we demonstrate that our methods can be seamlessly integrated into both auto-encoding and autoregressive PLMs without requiring additional training. These results highlight a promising direction for precise protein engineering using foundation models.

LGJun 24, 2024
Relaxing Continuous Constraints of Equivariant Graph Neural Networks for Physical Dynamics Learning

Zinan Zheng, Yang Liu, Jia Li et al.

Incorporating Euclidean symmetries (e.g. rotation equivariance) as inductive biases into graph neural networks has improved their generalization ability and data efficiency in unbounded physical dynamics modeling. However, in various scientific and engineering applications, the symmetries of dynamics are frequently discrete due to the boundary conditions. Thus, existing GNNs either overlook necessary symmetry, resulting in suboptimal representation ability, or impose excessive equivariance, which fails to generalize to unobserved symmetric dynamics. In this work, we propose a general Discrete Equivariant Graph Neural Network (DEGNN) that guarantees equivariance to a given discrete point group. Specifically, we show that such discrete equivariant message passing could be constructed by transforming geometric features into permutation-invariant embeddings. Through relaxing continuous equivariant constraints, DEGNN can employ more geometric feature combinations to approximate unobserved physical object interaction functions. Two implementation approaches of DEGNN are proposed based on ranking or pooling permutation-invariant functions. We apply DEGNN to various physical dynamics, ranging from particle, molecular, crowd to vehicle dynamics. In twenty scenarios, DEGNN significantly outperforms existing state-of-the-art approaches. Moreover, we show that DEGNN is data efficient, learning with less data, and can generalize across scenarios such as unobserved orientation.

GNJul 11, 2023
DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

Daoan Zhang, Weitong Zhang, Yu Zhao et al.

Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation of genomic signal and region recognition, mRNA abundance regression, and artificial genomes generation tasks demonstrates DNAGPT's superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.

CVDec 22, 2021
Fusion of medical imaging and electronic health records with attention and multi-head machanisms

Cheng Jiang, Yihao Chen, Jianbo Chang et al.

Doctors often make diagonostic decisions based on patient's image scans, such as magnetic resonance imaging (MRI), and patient's electronic health records (EHR) such as age, gender, blood pressure and so on. Despite a lot of automatic methods have been proposed for either image or text analysis in computer vision or natural language research areas, much fewer studies have been developed for the fusion of medical image and EHR data for medical problems. Among existing early or intermediate fusion methods, concatenation of features from both modalities is still a mainstream. For a better exploiting of image and EHR data, we propose a multi-modal attention module which use EHR data to help the selection of important regions during image feature extraction process conducted by traditional CNN. Moreover, we propose to incorporate multi-head machnism to gated multimodal unit (GMU) to make it able to parallelly fuse image and EHR features in different subspaces. With the help of the two modules, existing CNN architecture can be enhanced using both modalities. Experiments on predicting Glasgow outcome scale (GOS) of intracerebral hemorrhage patients and classifying Alzheimer's Disease showed the proposed method can automatically focus on task-related areas and achieve better results by making better use of image and EHR features.

IVNov 24, 2020
Blind deblurring for microscopic pathology images using deep learning networks

Cheng Jiang, Jun Liao, Pei Dong et al.

Artificial Intelligence (AI)-powered pathology is a revolutionary step in the world of digital pathology and shows great promise to increase both diagnosis accuracy and efficiency. However, defocus and motion blur can obscure tissue or cell characteristics hence compromising AI algorithms'accuracy and robustness in analyzing the images. In this paper, we demonstrate a deep-learning-based approach that can alleviate the defocus and motion blur of a microscopic image and output a sharper and cleaner image with retrieved fine details without prior knowledge of the blur type, blur extent and pathological stain. In this approach, a deep learning classifier is first trained to identify the image blur type. Then, two encoder-decoder networks are trained and used alone or in combination to deblur the input image. It is an end-to-end approach and introduces no corrugated artifacts as traditional blind deconvolution methods do. We test our approach on different types of pathology specimens and demonstrate great performance on image blur correction and the subsequent improvement on the diagnosis outcome of AI algorithms.

IVSep 15, 2020
Microscope Based HER2 Scoring System

Jun Zhang, Kuan Tian, Pei Dong et al.

The overexpression of human epidermal growth factor receptor 2 (HER2) has been established as a therapeutic target in multiple types of cancers, such as breast and gastric cancers. Immunohistochemistry (IHC) is employed as a basic HER2 test to identify the HER2-positive, borderline, and HER2-negative patients. However, the reliability and accuracy of HER2 scoring are affected by many factors, such as pathologists' experience. Recently, artificial intelligence (AI) has been used in various disease diagnosis to improve diagnostic accuracy and reliability, but the interpretation of diagnosis results is still an open problem. In this paper, we propose a real-time HER2 scoring system, which follows the HER2 scoring guidelines to complete the diagnosis, and thus each step is explainable. Unlike the previous scoring systems based on whole-slide imaging, our HER2 scoring system is integrated into an augmented reality (AR) microscope that can feedback AI results to the pathologists while reading the slide. The pathologists can help select informative fields of view (FOVs), avoiding the confounding regions, such as DCIS. Importantly, we illustrate the intermediate results with membrane staining condition and cell classification results, making it possible to evaluate the reliability of the diagnostic results. Also, we support the interactive modification of selecting regions-of-interest, making our system more flexible in clinical practice. The collaboration of AI and pathologists can significantly improve the robustness of our system. We evaluate our system with 285 breast IHC HER2 slides, and the classification accuracy of 95\% shows the effectiveness of our HER2 scoring system.

IVApr 30, 2020
COVID-DA: Deep Domain Adaptation from Typical Pneumonia to COVID-19

Yifan Zhang, Shuaicheng Niu, Zhen Qiu et al.

The outbreak of novel coronavirus disease 2019 (COVID-19) has already infected millions of people and is still rapidly spreading all over the globe. Most COVID-19 patients suffer from lung infection, so one important diagnostic method is to screen chest radiography images, e.g., X-Ray or CT images. However, such examinations are time-consuming and labor-intensive, leading to limited diagnostic efficiency. To solve this issue, AI-based technologies, such as deep learning, have been used recently as effective computer-aided means to improve diagnostic efficiency. However, one practical and critical difficulty is the limited availability of annotated COVID-19 data, due to the prohibitive annotation costs and urgent work of doctors to fight against the pandemic. This makes the learning of deep diagnosis models very challenging. To address this, motivated by that typical pneumonia has similar characteristics with COVID-19 and many pneumonia datasets are publicly available, we propose to conduct domain knowledge adaptation from typical pneumonia to COVID-19. There are two main challenges: 1) the discrepancy of data distributions between domains; 2) the task difference between the diagnosis of typical pneumonia and COVID-19. To address them, we propose a new deep domain adaptation method for COVID-19 diagnosis, namely COVID-DA. Specifically, we alleviate the domain discrepancy via feature adversarial adaptation and handle the task difference issue via a novel classifier separation scheme. In this way, COVID-DA is able to diagnose COVID-19 effectively with only a small number of COVID-19 annotations. Extensive experiments verify the effectiveness of COVID-DA and its great potential for real-world applications.

CVFeb 23, 2019
Spatio-Temporal Convolutional LSTMs for Tumor Growth Prediction by Learning 4D Longitudinal Patient Data

Ling Zhang, Le Lu, Xiaosong Wang et al.

Prognostic tumor growth modeling via volumetric medical imaging observations can potentially lead to better outcomes of tumor treatment and surgical planning. Recent advances of convolutional networks have demonstrated higher accuracy than traditional mathematical models in predicting future tumor volumes. This indicates that deep learning-based techniques may have great potentials on addressing such problem. However, current 2D patch-based modeling approaches cannot make full use of the spatio-temporal imaging context of the tumor's longitudinal 4D (3D + time) data. Moreover, they are incapable to predict clinically-relevant tumor properties, other than volumes. In this paper, we exploit to formulate the tumor growth process through convolutional Long Short-Term Memory (ConvLSTM) that extract tumor's static imaging appearances and capture its temporal dynamic changes within a single network. We extend ConvLSTM into the spatio-temporal domain (ST-ConvLSTM) by jointly learning the inter-slice 3D contexts and the longitudinal or temporal dynamics from multiple patient studies. Our approach can incorporate other non-imaging patient information in an end-to-end trainable manner. Experiments are conducted on the largest 4D longitudinal tumor dataset of 33 patients to date. Results validate that the ST-ConvLSTM produces a Dice score of 83.2%+-5.1% and a RVD of 11.2%+-10.8%, both significantly outperforming (p<0.05) other compared methods of linear model, ConvLSTM, and generative adversarial network (GAN) under the metric of predicting future tumor volumes. Additionally, our new method enables the prediction of both cell density and CT intensity numbers. Last, we demonstrate the generalizability of ST-ConvLSTM by employing it in 4D medical image segmentation task, which achieves an averaged Dice score of 86.3+-1.2% for left-ventricle segmentation in 4D ultrasound with 3 seconds per patient.

CVOct 14, 2018
Fine-Grained Classification of Cervical Cells Using Morphological and Appearance Based Convolutional Neural Networks

Haoming Lin, Yuyang Hu, Siping Chen et al.

Fine-grained classification of cervical cells into different abnormality levels is of great clinical importance but remains very challenging. Contrary to traditional classification methods that rely on hand-crafted or engineered features, convolution neural network (CNN) can classify cervical cells based on automatically learned deep features. However, CNN in previous studies do not involve cell morphological information, and it is unknown whether morphological features can be directly modeled by CNN to classify cervical cells. This paper presents a CNN-based method that combines cell image appearance with cell morphology for classification of cervical cells in Pap smear. The training cervical cell dataset consists of adaptively re-sampled image patches coarsely centered on the nuclei. Several CNN models (AlexNet, GoogleNet, ResNet and DenseNet) pre-trained on ImageNet dataset were fine-tuned on the cervical dataset for comparison. The proposed method is evaluated on the Herlev cervical dataset by five-fold cross-validation at patient level splitting. Results show that by adding cytoplasm and nucleus masks as raw morphological information into appearance-based CNN learning, higher classification accuracies can be achieved in general. Among the four CNN models, GoogleNet fed with both morphological and appearance information obtains the highest classification accuracies of 94.5% for 2-class classification task and 64.5% for 7-class classification task. Our method demonstrates that combining cervical cell morphology with appearance information can provide improved classification performance, which is clinically important for early diagnosis of cervical dysplastic changes.

CVJan 25, 2018
DeepPap: Deep Convolutional Networks for Cervical Cell Classification

Ling Zhang, Le Lu, Isabella Nogues et al.

Automation-assisted cervical screening via Pap smear or liquid-based cytology (LBC) is a highly effective cell imaging based cancer detection tool, where cells are partitioned into "abnormal" and "normal" categories. However, the success of most traditional classification methods relies on the presence of accurate cell segmentations. Despite sixty years of research in this field, accurate segmentation remains a challenge in the presence of cell clusters and pathologies. Moreover, previous classification methods are only built upon the extraction of hand-crafted features, such as morphology and texture. This paper addresses these limitations by proposing a method to directly classify cervical cells - without prior segmentation - based on deep features, using convolutional neural networks (ConvNets). First, the ConvNet is pre-trained on a natural image dataset. It is subsequently fine-tuned on a cervical cell dataset consisting of adaptively re-sampled image patches coarsely centered on the nuclei. In the testing phase, aggregation is used to average the prediction scores of a similar set of image patches. The proposed method is evaluated on both Pap smear and LBC datasets. Results show that our method outperforms previous algorithms in classification accuracy (98.3%), area under the curve (AUC) (0.99) values, and especially specificity (98.3%), when applied to the Herlev benchmark Pap smear dataset and evaluated using five-fold cross-validation. Similar superior performances are also achieved on the HEMLBC (H&E stained manual LBC) dataset. Our method is promising for the development of automation-assisted reading systems in primary cervical screening.

CVJan 25, 2018
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans

Zhihui Guo, Ling Zhang, Le Lu et al.

This paper reports Deep LOGISMOS approach to 3D tumor segmentation by incorporating boundary information derived from deep contextual learning to LOGISMOS - layered optimal graph image segmentation of multiple objects and surfaces. Accurate and reliable tumor segmentation is essential to tumor growth analysis and treatment selection. A fully convolutional network (FCN), UNet, is first trained using three adjacent 2D patches centered at the tumor, providing contextual UNet segmentation and probability map for each 2D patch. The UNet segmentation is then refined by Gaussian Mixture Model (GMM) and morphological operations. The refined UNet segmentation is used to provide the initial shape boundary to build a segmentation graph. The cost for each node of the graph is determined by the UNet probability maps. Finally, a max-flow algorithm is employed to find the globally optimal solution thus obtaining the final segmentation. For evaluation, we applied the method to pancreatic tumor segmentation on a dataset of 51 CT scans, among which 30 scans were used for training and 21 for testing. With Deep LOGISMOS, DICE Similarity Coefficient (DSC) and Relative Volume Difference (RVD) reached 83.2+-7.8% and 18.6+-17.4% respectively, both are significantly improved (p<0.05) compared with contextual UNet and/or LOGISMOS alone.

CVJan 25, 2018
Self-Learning to Detect and Segment Cysts in Lung CT Images without Manual Annotation

Ling Zhang, Vissagan Gopalakrishnan, Le Lu et al.

Image segmentation is a fundamental problem in medical image analysis. In recent years, deep neural networks achieve impressive performances on many medical image segmentation tasks by supervised learning on large manually annotated data. However, expert annotations on big medical datasets are tedious, expensive or sometimes unavailable. Weakly supervised learning could reduce the effort for annotation but still required certain amounts of expertise. Recently, deep learning shows a potential to produce more accurate predictions than the original erroneous labels. Inspired by this, we introduce a very weakly supervised learning method, for cystic lesion detection and segmentation in lung CT images, without any manual annotation. Our method works in a self-learning manner, where segmentation generated in previous steps (first by unsupervised segmentation then by neural networks) is used as ground truth for the next level of network learning. Experiments on a cystic lung lesion dataset show that the deep learning could perform better than the initial unsupervised annotation, and progressively improve itself after self-learning.

CVJan 25, 2018
Convolutional Invasion and Expansion Networks for Tumor Growth Prediction

Ling Zhang, Le Lu, Ronald M. Summers et al.

Tumor growth is associated with cell invasion and mass-effect, which are traditionally formulated by mathematical models, namely reaction-diffusion equations and biomechanics. Such models can be personalized based on clinical measurements to build the predictive models for tumor growth. In this paper, we investigate the possibility of using deep convolutional neural networks (ConvNets) to directly represent and learn the cell invasion and mass-effect, and to predict the subsequent involvement regions of a tumor. The invasion network learns the cell invasion from information related to metabolic rate, cell density and tumor boundary derived from multimodal imaging data. The expansion network models the mass-effect from the growing motion of tumor mass. We also study different architectures that fuse the invasion and expansion networks, in order to exploit the inherent correlations among them. Our network can easily be trained on population data and personalized to a target patient, unlike most previous mathematical modeling methods that fail to incorporate population data. Quantitative experiments on a pancreatic tumor data set show that the proposed method substantially outperforms a state-of-the-art mathematical model-based approach in both accuracy and efficiency, and that the information captured by each of the two subnetworks are complementary.

CVJun 1, 2017
Personalized Pancreatic Tumor Growth Prediction via Group Learning

Ling Zhang, Le Lu, Ronald M. Summers et al.

Tumor growth prediction, a highly challenging task, has long been viewed as a mathematical modeling problem, where the tumor growth pattern is personalized based on imaging and clinical data of a target patient. Though mathematical models yield promising results, their prediction accuracy may be limited by the absence of population trend data and personalized clinical characteristics. In this paper, we propose a statistical group learning approach to predict the tumor growth pattern that incorporates both the population trend and personalized data, in order to discover high-level features from multimodal imaging data. A deep convolutional neural network approach is developed to model the voxel-wise spatio-temporal tumor progression. The deep features are combined with the time intervals and the clinical factors to feed a process of feature selection. Our predictive model is pretrained on a group data set and personalized on the target patient data to estimate the future spatio-temporal progression of the patient's tumor. Multimodal imaging data at multiple time points are used in the learning, personalization and inference stages. Our method achieves a Dice coefficient of 86.8% +- 3.6% and RVD of 7.9% +- 5.4% on a pancreatic tumor data set, outperforming the DSC of 84.4% +- 4.0% and RVD 13.9% +- 9.8% obtained by a previous state-of-the-art model-based method.

CVJan 23, 2017
Unsupervised Joint Mining of Deep Features and Image Labels for Large-scale Radiology Image Categorization and Scene Recognition

Xiaosong Wang, Le Lu, Hoo-chang Shin et al.

The recent rapid and tremendous success of deep convolutional neural networks (CNN) on many challenging computer vision tasks largely derives from the accessibility of the well-annotated ImageNet and PASCAL VOC datasets. Nevertheless, unsupervised image categorization (i.e., without the ground-truth labeling) is much less investigated, yet critically important and difficult when annotations are extremely hard to obtain in the conventional way of "Google Search" and crowd sourcing. We address this problem by presenting a looped deep pseudo-task optimization (LDPO) framework for joint mining of deep CNN features and image labels. Our method is conceptually simple and rests upon the hypothesized "convergence" of better labels leading to better trained CNN models which in turn feed more discriminative image representations to facilitate more meaningful clusters/labels. Our proposed method is validated in tackling two important applications: 1) Large-scale medical image annotation has always been a prohibitively expensive and easily-biased task even for well-trained radiologists. Significantly better image categorization results are achieved via our proposed approach compared to the previous state-of-the-art method. 2) Unsupervised scene recognition on representative and publicly available datasets with our proposed technique is examined. The LDPO achieves excellent quantitative scene classification results. On the MIT indoor scene dataset, it attains a clustering accuracy of 75.3%, compared to the state-of-the-art supervised classification accuracy of 81.0% (when both are based on the VGG-VD model).

CVMar 28, 2016
Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation

Hoo-Chang Shin, Kirk Roberts, Le Lu et al.

Despite the recent advances in automatically describing image contents, their applications have been mostly limited to image caption datasets containing natural images (e.g., Flickr 30k, MSCOCO). In this paper, we present a deep learning model to efficiently detect a disease from an image and annotate its contexts (e.g., location, severity and the affected organs). We employ a publicly available radiology dataset of chest x-rays and their reports, and use its image annotations to mine disease names to train convolutional neural networks (CNNs). In doing so, we adopt various regularization techniques to circumvent the large normal-vs-diseased cases bias. Recurrent neural networks (RNNs) are then trained to describe the contexts of a detected disease, based on the deep CNN features. Moreover, we introduce a novel approach to use the weights of the already trained pair of CNN/RNN on the domain-specific image/text dataset, to infer the joint image/text contexts for composite image labeling. Significantly improved image annotation results are demonstrated using the recurrent neural cascade model by taking the joint image/text contexts into account.

CVMar 25, 2016
Unsupervised Category Discovery via Looped Deep Pseudo-Task Optimization Using a Large Scale Radiology Image Database

Xiaosong Wang, Le Lu, Hoo-chang Shin et al.

Obtaining semantic labels on a large scale radiology image database (215,786 key images from 61,845 unique patients) is a prerequisite yet bottleneck to train highly effective deep convolutional neural network (CNN) models for image recognition. Nevertheless, conventional methods for collecting image labels (e.g., Google search followed by crowd-sourcing) are not applicable due to the formidable difficulties of medical annotation tasks for those who are not clinically trained. This type of image labeling task remains non-trivial even for radiologists due to uncertainty and possible drastic inter-observer variation or inconsistency. In this paper, we present a looped deep pseudo-task optimization procedure for automatic category discovery of visually coherent and clinically semantic (concept) clusters. Our system can be initialized by domain-specific (CNN trained on radiology images and text report derived labels) or generic (ImageNet based) CNN models. Afterwards, a sequence of pseudo-tasks are exploited by the looped deep image feature clustering (to refine image labels) and deep CNN training/classification using new labels (to obtain more task representative deep features). Our method is conceptually simple and based on the hypothesized "convergence" of better labels leading to better trained CNN models which in turn feed more effective deep image features to facilitate more meaningful clustering/labels. We have empirically validated the convergence and demonstrated promising quantitative and qualitative results. Category labels of significantly higher quality than those in previous work are discovered. This allows for further investigation of the hierarchical semantic nature of the given large-scale radiology image database.

CVFeb 10, 2016
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, Mingchen Gao et al.

Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and the revival of deep CNN. CNNs enable learning data-driven, highly representative, layered hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully employ CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models pre-trained from natural image dataset to medical image tasks. In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters, and vary in numbers of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet (via fine-tuning) can be useful. We study two specific computer-aided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve the state-of-the-art performance on the mediastinal LN detection, with 85% sensitivity at 3 false positive per patient, and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis and valuable insights can be extended to the design of high performance CAD systems for other medical imaging tasks.

CVFeb 1, 2016
Improving Vertebra Segmentation through Joint Vertebra-Rib Atlases

Yinong Wang, Jianhua Yao, Holger R. Roth et al.

Accurate spine segmentation allows for improved identification and quantitative characterization of abnormalities of the vertebra, such as vertebral fractures. However, in existing automated vertebra segmentation methods on computed tomography (CT) images, leakage into nearby bones such as ribs occurs due to the close proximity of these visibly intense structures in a 3D CT volume. To reduce this error, we propose the use of joint vertebra-rib atlases to improve the segmentation of vertebrae via multi-atlas joint label fusion. Segmentation was performed and evaluated on CTs containing 106 thoracic and lumbar vertebrae from 10 pathological and traumatic spine patients on an individual vertebra level basis. Vertebra atlases produced errors where the segmentation leaked into the ribs. The use of joint vertebra-rib atlases produced a statistically significant increase in the Dice coefficient from 92.5 $\pm$ 3.1% to 93.8 $\pm$ 2.1% for the left and right transverse processes and a decrease in the mean and max surface distance from 0.75 $\pm$ 0.60mm and 8.63 $\pm$ 4.44mm to 0.30 $\pm$ 0.27mm and 3.65 $\pm$ 2.87mm, respectively.

CVJan 29, 2016
Deep convolutional networks for automated detection of posterior-element fractures on spine CT

Holger R. Roth, Yinong Wang, Jianhua Yao et al.

Injuries of the spine, and its posterior elements in particular, are a common occurrence in trauma patients, with potentially devastating consequences. Computer-aided detection (CADe) could assist in the detection and classification of spine fractures. Furthermore, CAD could help assess the stability and chronicity of fractures, as well as facilitate research into optimization of treatment paradigms. In this work, we apply deep convolutional networks (ConvNets) for the automated detection of posterior element fractures of the spine. First, the vertebra bodies of the spine with its posterior elements are segmented in spine CT using multi-atlas label fusion. Then, edge maps of the posterior elements are computed. These edge maps serve as candidate regions for predicting a set of probabilities for fractures along the image edges using ConvNets in a 2.5D fashion (three orthogonal patches in axial, coronal and sagittal planes). We explore three different methods for training the ConvNet using 2.5D patches along the edge maps of 'positive', i.e. fractured posterior-elements and 'negative', i.e. non-fractured elements. An experienced radiologist retrospectively marked the location of 55 displaced posterior-element fractures in 18 trauma patients. We randomly split the data into training and testing cases. In testing, we achieve an area-under-the-curve of 0.857. This corresponds to 71% or 81% sensitivities at 5 or 10 false-positives per patient, respectively. Analysis of our set of trauma patients demonstrates the feasibility of detecting posterior-element fractures in spine CT images using computer vision techniques such as deep convolutional networks.

CVJan 27, 2016
Osteoporotic and Neoplastic Compression Fracture Classification on Longitudinal CT

Yinong Wang, Jianhua Yao, Joseph E. Burns et al.

Classification of vertebral compression fractures (VCF) having osteoporotic or neoplastic origin is fundamental to the planning of treatment. We developed a fracture classification system by acquiring quantitative morphologic and bone density determinants of fracture progression through the use of automated measurements from longitudinal studies. A total of 250 CT studies were acquired for the task, each having previously identified VCFs with osteoporosis or neoplasm. Thirty-six features or each identified VCF were computed and classified using a committee of support vector machines. Ten-fold cross validation on 695 identified fractured vertebrae showed classification accuracies of 0.812, 0.665, and 0.820 for the measured, longitudinal, and combined feature sets respectively.

CVJan 13, 2016
Multi-Atlas Segmentation with Joint Label Fusion of Osteoporotic Vertebral Compression Fractures on CT

Yinong Wang, Jianhua Yao, Holger R. Roth et al.

The precise and accurate segmentation of the vertebral column is essential in the diagnosis and treatment of various orthopedic, neurological, and oncological traumas and pathologies. Segmentation is especially challenging in the presence of pathology such as vertebral compression fractures. In this paper, we propose a method to produce segmentations for osteoporotic compression fractured vertebrae by applying a multi-atlas joint label fusion technique for clinical CT images. A total of 170 thoracic and lumbar vertebrae were evaluated using atlases from five patients with varying degrees of spinal degeneration. In an osteoporotic cohort of bundled atlases, registration provided an average Dice coefficient and mean absolute surface distance of 2.7$\pm$4.5% and 0.32$\pm$0.13mm for osteoporotic vertebrae, respectively, and 90.9$\pm$3.0% and 0.36$\pm$0.11mm for compression fractured vertebrae.

CVMay 12, 2015
Improving Computer-aided Detection using Convolutional Neural Networks and Random View Aggregation

Holger R. Roth, Le Lu, Jiamin Liu et al.

Automated computer-aided detection (CADe) in medical imaging has been an important tool in clinical practice and research. State-of-the-art methods often show high sensitivities but at the cost of high false-positives (FP) per patient rates. We design a two-tiered coarse-to-fine cascade framework that first operates a candidate generation system at sensitivities of $\sim$100% but at high FP levels. By leveraging existing CAD systems, coordinates of regions or volumes of interest (ROI or VOI) for lesion candidates are generated in this step and function as input for a second tier, which is our focus in this study. In this second stage, we generate $N$ 2D (two-dimensional) or 2.5D views via sampling through scale transformations, random translations and rotations with respect to each ROI's centroid coordinates. These random views are used to train deep convolutional neural network (ConvNet) classifiers. In testing, the trained ConvNets are employed to assign class (e.g., lesion, pathology) probabilities for a new set of $N$ random views that are then averaged at each ROI to compute a final per-candidate classification probability. This second tier behaves as a highly selective process to reject difficult false positives while preserving high sensitivities. The methods are evaluated on three different data sets with different numbers of patients: 59 patients for sclerotic metastases detection, 176 patients for lymph node detection, and 1,186 patients for colonic polyp detection. Experimental results show the ability of ConvNets to generalize well to different medical imaging CADe applications and scale elegantly to various data sets. Our proposed methods improve CADe performance markedly in all cases. CADe sensitivities improved from 57% to 70%, from 43% to 77% and from 58% to 75% at 3 FPs per patient for sclerotic metastases, lymph nodes and colonic polyps, respectively.

CVMay 4, 2015
Interleaved Text/Image Deep Mining on a Large-Scale Radiology Database for Automated Image Interpretation

Hoo-Chang Shin, Le Lu, Lauren Kim et al.

Despite tremendous progress in computer vision, there has not been an attempt for machine learning on very large-scale medical image databases. We present an interleaved text/image deep learning system to extract and mine the semantic interactions of radiology images and reports from a national research hospital's Picture Archiving and Communication System. With natural language processing, we mine a collection of representative ~216K two-dimensional key images selected by clinicians for diagnostic reference, and match the images with their descriptions in an automated manner. Our system interleaves between unsupervised learning and supervised learning on document- and sentence-level text collections, to generate semantic labels and to predict them given an image. Given an image of a patient scan, semantic topics in radiology levels are predicted, and associated key-words are generated. Also, a number of frequent disease types are detected as present or absent, to provide more specific interpretation of a patient scan. This shows the potential of large-scale learning and prediction in electronic patient records available in most modern clinical institutions.

CVApr 15, 2015
Anatomy-specific classification of medical images using deep convolutional nets

Holger R. Roth, Christopher T. Lee, Hoo-Chang Shin et al.

Automated classification of human anatomy is an important prerequisite for many computer-aided diagnosis systems. The spatial complexity and variability of anatomy throughout the human body makes classification difficult. "Deep learning" methods such as convolutional networks (ConvNets) outperform other state-of-the-art methods in image classification tasks. In this work, we present a method for organ- or body-part-specific anatomical classification of medical images acquired using computed tomography (CT) with ConvNets. We train a ConvNet, using 4,298 separate axial 2D key-images to learn 5 anatomical classes. Key-images were mined from a hospital PACS archive, using a set of 1,675 patients. We show that a data augmentation approach can help to enrich the data set and improve classification performance. Using ConvNets and data augmentation, we achieve anatomy-specific classification error of 5.9 % and area-under-the-curve (AUC) values of an average of 0.998 in testing. We demonstrate that deep learning can be used to train very reliable and accurate classifiers that could initialize further computer-aided diagnosis.

CVJul 22, 2014
Detection of Sclerotic Spine Metastases via Random Aggregation of Deep Convolutional Neural Network Classifications

Holger R. Roth, Jianhua Yao, Le Lu et al.

Automated detection of sclerotic metastases (bone lesions) in Computed Tomography (CT) images has potential to be an important tool in clinical practice and research. State-of-the-art methods show performance of 79% sensitivity or true-positive (TP) rate, at 10 false-positives (FP) per volume. We design a two-tiered coarse-to-fine cascade framework to first operate a highly sensitive candidate generation system at a maximum sensitivity of ~92% but with high FP level (~50 per patient). Regions of interest (ROI) for lesion candidates are generated in this step and function as input for the second tier. In the second tier we generate N 2D views, via scale, random translations, and rotations with respect to each ROI centroid coordinates. These random views are used to train a deep Convolutional Neural Network (CNN) classifier. In testing, the CNN is employed to assign individual probabilities for a new set of N random views that are averaged at each ROI to compute a final per-candidate classification probability. This second tier behaves as a highly selective process to reject difficult false positives while preserving high sensitivities. We validate the approach on CT images of 59 patients (49 with sclerotic metastases and 10 normal controls). The proposed method reduces the number of FP/vol. from 4 to 1.2, 7 to 3, and 12 to 9.5 when comparing a sensitivity rates of 60%, 70%, and 80% respectively in testing. The Area-Under-the-Curve (AUC) is 0.834. The results show marked improvement upon previous work.