IV · Oct 5, 2023
FNOSeg3D: Resolution-Robust 3D Image Segmentation with Fourier Neural Operator
Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood
Due to the computational complexity of 3D medical image segmentation, training with downsampled images is a common remedy for out-of-memory errors in deep learning. Nevertheless, as standard spatial convolution is sensitive to variations in image resolution, the accuracy of a convolutional neural network trained with downsampled images can be suboptimal when applied on the original resolution. To address this limitation, we introduce FNOSeg3D, a 3D segmentation model robust to training image resolution based on the Fourier neural operator (FNO). The FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We improve the FNO by reducing its parameter requirement and enhancing its learning capability through residual connections and deep supervision, and these result in our FNOSeg3D model which is parameter efficient and resolution robust. When tested on the BraTS'19 dataset, it achieved superior robustness to training image resolution than other tested models with less than 1% of their model parameters.
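As a rough illustration of the resolution robustness described above, the following minimal sketch applies a Fourier-domain linear layer to a 1D signal sampled on two different grids. It is not the FNOSeg3D implementation: the name `spectral_conv_1d`, the single-channel 1D setting, and the choice of four retained modes are assumptions made for brevity, and the residual connections and deep supervision mentioned in the abstract are omitted.

```python
import numpy as np

def spectral_conv_1d(x, weights):
    """Multiply the lowest Fourier modes of a real 1D signal by complex weights.

    x:       real array of shape (n,)
    weights: complex array of shape (k,), one weight per retained mode
    """
    k = weights.shape[0]
    # norm="forward" divides by n, so a band-limited function sampled on
    # different grids yields the same low-frequency coefficients.
    x_hat = np.fft.rfft(x, norm="forward")
    out_hat = np.zeros_like(x_hat)
    out_hat[:k] = x_hat[:k] * weights  # higher modes are discarded
    return np.fft.irfft(out_hat, n=x.shape[0], norm="forward")

def sample(n):
    """Sample one fixed band-limited function on an n-point grid over [0, 1)."""
    t = np.arange(n) / n
    return np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)

rng = np.random.default_rng(0)
weights = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# The same weights are applied at two resolutions without retraining.
y_coarse = spectral_conv_1d(sample(32), weights)
y_fine = spectral_conv_1d(sample(64), weights)
```

Because the weights live on frequency modes rather than on pixels, the fine-grid output agrees with the coarse-grid output at the shared sample points, which is the intuition behind applying a model trained on downsampled images to the original resolution.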
IV · Oct 5, 2023
HartleyMHA: Self-Attention in Frequency Domain for Resolution-Robust and Parameter-Efficient 3D Image Segmentation
Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood
With the introduction of Transformers, different attention-based models have been proposed for image segmentation with promising results. Although self-attention allows capturing of long-range dependencies, it suffers from a quadratic complexity in the image size especially in 3D. To avoid the out-of-memory error during training, input size reduction is usually required for 3D segmentation, but the accuracy can be suboptimal when the trained models are applied on the original image size. To address this limitation, inspired by the Fourier neural operator (FNO), we introduce the HartleyMHA model which is robust to training image resolution with efficient self-attention. FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We modify the FNO by using the Hartley transform with shared parameters to reduce the model size by orders of magnitude, and this allows us to further apply self-attention in the frequency domain for more expressive high-order feature combination with improved efficiency. When tested on the BraTS'19 dataset, it achieved superior robustness to training image resolution than other tested models with less than 1% of their model parameters.
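For readers unfamiliar with the Hartley transform, here is a minimal 1D sketch computed from the FFT. Unlike the Fourier transform, it maps real input to real output and is its own inverse up to a factor of 1/N. The function names `dht` and `idht` are illustrative, and this is only the transform itself, not the HartleyMHA model or its attention mechanism.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform of a real 1D array.

    H[k] = sum_n x[n] * cas(2*pi*n*k/N), where cas(a) = cos(a) + sin(a).
    The FFT kernel is cos(a) - 1j*sin(a), hence H = Re(F) - Im(F).
    """
    f = np.fft.fft(x)
    return f.real - f.imag

def idht(h):
    """Inverse transform: the DHT applied again, divided by the length."""
    return dht(h) / h.shape[0]

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0])
h = dht(x)        # real-valued spectrum
x_back = idht(h)  # recovers x up to floating-point error
```

Staying in real arithmetic means that learned frequency-domain weights can be real rather than complex, which is one plausible reason such a transform helps reduce model size.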
IV · Nov 21, 2023
Image-Based Soil Organic Carbon Remote Sensing from Satellite Images with Fourier Neural Operator and Structural Similarity
Ken C. L. Wong, Levente Klein, Ademir Ferreira da Silva et al.
Soil organic carbon (SOC) sequestration is the transfer and storage of atmospheric carbon dioxide in soils, which plays an important role in climate change mitigation. SOC concentration can be improved by proper land use, thus it is beneficial if SOC can be estimated at a regional or global scale. As multispectral satellite data can provide SOC-related information such as vegetation and soil properties at a global scale, estimation of SOC through satellite data has been explored as an alternative to manual soil sampling. Although existing studies show promising results, they are mainly based on pixel-based approaches with traditional machine learning methods, and convolutional neural networks (CNNs) are uncommon. To study the use of CNNs on SOC remote sensing, here we propose the FNO-DenseNet based on the Fourier neural operator (FNO). By combining the advantages of the FNO and DenseNet, the FNO-DenseNet outperformed the FNO in our experiments with hundreds of times fewer parameters. The FNO-DenseNet also outperformed a pixel-based random forest by 18% in the mean absolute percentage error.
LG · Sep 24, 2024
Modern Hopfield Networks meet Encoded Neural Representations -- Addressing Practical Considerations
Satyananda Kashyap, Niharika S. D'Souza, Luyao Shi et al.
Content-addressable memories such as Modern Hopfield Networks (MHN) have been studied as mathematical models of auto-association and storage/retrieval in the human declarative memory, yet their practical use for large-scale content storage faces challenges. Chief among them is the occurrence of meta-stable states, particularly when handling large amounts of high dimensional content. This paper introduces Hopfield Encoding Networks (HEN), a framework that integrates encoded neural representations into MHNs to improve pattern separability and reduce meta-stable states. We show that HEN can also be used for retrieval in the context of hetero association of images with natural language queries, thus removing the limitation of requiring access to partial content in the same domain. Experimental results demonstrate substantial reduction in meta-stable states and increased storage capacity while still enabling perfect recall of a significantly larger number of inputs advancing the practical utility of associative memory networks for real-world tasks.
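To make the retrieval step concrete, the sketch below implements the commonly used softmax update of a modern Hopfield network on a toy memory. It illustrates content-addressable recall only; the encoding stage that defines HEN is not shown, and the function name `hopfield_retrieve`, the inverse temperature `beta`, and the toy patterns are assumptions for illustration.

```python
import numpy as np

def hopfield_retrieve(patterns, query, beta=20.0, steps=1):
    """Retrieve a stored pattern from a partial or noisy query.

    patterns: array of shape (num_patterns, dim), one stored pattern per row
    query:    array of shape (dim,)
    Update rule: state <- softmax(beta * patterns @ state) @ patterns
    """
    state = np.asarray(query, dtype=float)
    for _ in range(steps):
        scores = beta * patterns @ state
        scores -= scores.max()        # stabilize the softmax numerically
        attn = np.exp(scores)
        attn /= attn.sum()
        state = attn @ patterns       # weighted combination of memories
    return state

patterns = np.eye(4)[:3]                    # three well-separated memories
query = np.array([0.9, 0.1, 0.0, 0.0])      # noisy version of pattern 0
recalled = hopfield_retrieve(patterns, query)
```

When stored patterns are poorly separated, the softmax weights spread over several memories and the update converges to a mixture instead of a single pattern, which is the meta-stable behavior the abstract aims to reduce.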
AI · Mar 10
Agentic Control Center for Data Product Optimization
Priyadarshini Tamilselvan, Gregory Bramble, Sola Shirai et al.
Data products enable end users to gain greater insights about their data by providing supporting assets, such as example question-SQL pairs which can be answered using the data or views over the database tables. However, producing useful data products is challenging, and typically requires domain experts to hand-craft supporting assets. We propose a system that automates data product improvement through specialized AI agents operating in a continuous optimization loop. By surfacing questions, monitoring multi-dimensional quality metrics, and supporting human-in-the-loop controls, it transforms data into observable and refinable assets that balance automation with trust and oversight.
CV · Jun 21, 2019 · Code
Building a Benchmark Dataset and Classifiers for Sentence-Level Findings in AP Chest X-rays
Tanveer Syeda-Mahmood, Hassan M. Ahmad, Nadeem Ansari et al.
Chest X-rays are the most common diagnostic exams in emergency rooms and hospitals. There has been a surge of work on automatic interpretation of chest X-rays using deep learning approaches since a large open-source chest X-ray dataset became available from the NIH. However, its labels are not sufficiently rich and descriptive for training classification tools. Further, the dataset does not adequately address the findings seen in chest X-rays taken in anterior-posterior (AP) view, which also depict the placement of devices such as central vascular lines and tubes. In this paper, we present a new chest X-ray benchmark database of 73 rich sentence-level descriptors of findings seen in AP chest X-rays. We describe our method of obtaining these findings through a semi-automated ground truth generation process from crowdsourcing of clinician annotations. We also present results of building classifiers for these findings that show that such higher granularity labels can also be learned through the framework of deep learning classifiers.
CV · Sep 20, 2025
Phrase-grounded Fact-checking for Automatically Generated Chest X-ray Reports
Razi Mahmood, Diego Machado-Reyes, Joy Wu et al. · berkeley
With the emergence of large-scale vision language models (VLM), it is now possible to produce realistic-looking radiology reports for chest X-ray images. However, their clinical translation has been hampered by the factual errors and hallucinations in the produced descriptions during inference. In this paper, we present a novel phrase-grounded fact-checking model (FC model) that detects errors in findings and their indicated locations in automatically generated chest radiology reports. Specifically, we simulate the errors in reports through a large synthetic dataset derived by perturbing findings and their locations in ground truth reports to form real and fake findings-location pairs with images. A new multi-label cross-modal contrastive regression network is then trained on this dataset. We present results demonstrating the robustness of our method in terms of accuracy of finding veracity prediction and localization on multiple X-ray datasets. We also show its effectiveness for error detection in reports of SOTA report generators on multiple datasets achieving a concordance correlation coefficient of 0.997 with ground truth-based verification, thus pointing to its utility during clinical inference in radiology workflows.
CV · Jul 10, 2025
HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation
Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood
In medical image segmentation, convolutional neural networks (CNNs) and transformers are dominant. For CNNs, given the local receptive fields of convolutional layers, long-range spatial correlations are captured through consecutive convolutions and pooling. However, as the computational cost and memory footprint can be prohibitively large, 3D models can only afford fewer layers than 2D models with reduced receptive fields and abstract levels. For transformers, although long-range correlations can be captured by multi-head attention, its quadratic complexity with respect to input size is computationally demanding. Therefore, either model may require input size reduction to allow more filters and layers for better segmentation. Nevertheless, given their discrete nature, models trained with patch-wise training or image downsampling may produce suboptimal results when applied on higher resolutions. To address this issue, here we propose the resolution-robust HNOSeg-XS architecture. We model image segmentation by learnable partial differential equations through the Fourier neural operator which has the zero-shot super-resolution property. By replacing the Fourier transform by the Hartley transform and reformulating the problem in the frequency domain, we created the HNOSeg-XS model, which is resolution robust, fast, memory efficient, and extremely parameter efficient. When tested on the BraTS'23, KiTS'23, and MVSeg'23 datasets with a Tesla V100 GPU, HNOSeg-XS showed its superior resolution robustness with fewer than 34.7k model parameters. It also achieved the overall best inference time (< 0.24 s) and memory efficiency (< 1.8 GiB) compared to the tested CNN and transformer models.
IV · Jan 10, 2022
3D Segmentation with Fully Trainable Gabor Kernels and Pearson's Correlation Coefficient
Ken C. L. Wong, Mehdi Moradi
The convolutional layer and loss function are two fundamental components in deep learning. Because of the success of conventional deep learning kernels, the less versatile Gabor kernels have become less popular, even though they can provide abundant features at different frequencies, orientations, and scales with far fewer parameters. For existing loss functions for multi-class image segmentation, there is usually a tradeoff among accuracy, robustness to hyperparameters, and manual weight selections for combining different losses. Therefore, to gain the benefits of using Gabor kernels while keeping the advantage of automatic feature generation in deep learning, we propose a fully trainable Gabor-based convolutional layer where all Gabor parameters are trainable through backpropagation. Furthermore, we propose a loss function based on Pearson's correlation coefficient, which is accurate, robust to learning rates, and does not require manual weight selections. Experiments on 43 3D brain magnetic resonance images with 19 anatomical structures show that, using the proposed loss function with a proper combination of conventional and Gabor-based kernels, we can train a network with only 1.6 million parameters to achieve an average Dice coefficient of 83%. This size is 44 times smaller than the original V-Net, which has 71 million parameters. This paper demonstrates the potential of using learnable parametric kernels in deep learning for 3D segmentation.
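One plausible form of a correlation-based segmentation loss is sketched below: Pearson's correlation coefficient is computed per class between predicted scores and one-hot labels, and the loss is one minus the mean correlation. The abstract does not give the exact formulation, so the averaging scheme, the `eps` stabilizer, and the name `pearson_loss` are assumptions.

```python
import numpy as np

def pearson_loss(pred, target, eps=1e-8):
    """1 - mean per-class Pearson correlation.

    pred, target: arrays of shape (num_classes, num_voxels)
    """
    p = pred - pred.mean(axis=1, keepdims=True)      # center predictions
    t = target - target.mean(axis=1, keepdims=True)  # center labels
    cov = (p * t).sum(axis=1)
    norm = np.sqrt((p ** 2).sum(axis=1) * (t ** 2).sum(axis=1)) + eps
    pcc = cov / norm                                  # values in [-1, 1]
    return float(np.mean(1.0 - pcc))

target = np.array([[1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0]])
loss_perfect = pearson_loss(target, target)          # close to 0
loss_inverted = pearson_loss(1.0 - target, target)   # close to 2
```

Since each class contributes a bounded, scale-invariant term, no per-class weights need to be hand-tuned in this sketch, which matches the motivation stated in the abstract.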
LG · Dec 10, 2021
Addressing Deep Learning Model Uncertainty in Long-Range Climate Forecasting with Late Fusion
Ken C. L. Wong, Hongzhi Wang, Etienne E. Vos et al.
Global warming leads to the increase in frequency and intensity of climate extremes that cause tremendous loss of lives and property. Accurate long-range climate prediction allows more time for preparation and disaster risk management for such extreme events. Although machine learning approaches have shown promising results in long-range climate forecasting, the associated model uncertainties may reduce their reliability. To address this issue, we propose a late fusion approach that systematically combines the predictions from multiple models to reduce the expected errors of the fused results. We also propose a network architecture with the novel denormalization layer to gain the benefits of data normalization without actually normalizing the data. The experimental results on long-range 2m temperature forecasting show that the framework outperforms the 30-year climate normals, and the accuracy can be improved by increasing the number of models.
CV · Aug 6, 2021
Basis Scaling and Double Pruning for Efficient Inference in Network-Based Transfer Learning
Ken C. L. Wong, Satyananda Kashyap, Mehdi Moradi
Network-based transfer learning allows the reuse of deep learning features with limited data, but the resulting models can be unnecessarily large. Although network pruning can improve inference efficiency, existing algorithms usually require fine-tuning that may not be suitable for small datasets. In this paper, using the singular value decomposition, we decompose a convolutional layer into two layers: a convolutional layer with the orthonormal basis vectors as the filters, and a "BasisScalingConv" layer which is responsible for rescaling the features and transforming them back to the original space. As the filters in each decomposed layer are linearly independent, when using the proposed basis scaling factors with the Taylor approximation of importance, pruning can be more effective and fine-tuning individual weights is unnecessary. Furthermore, as the numbers of input and output channels of the original convolutional layer remain unchanged after basis pruning, it is applicable to virtually all architectures and can be combined with existing pruning algorithms for double pruning to further increase the pruning capability. When transferring knowledge from ImageNet pre-trained models to different target domains, with less than 1% reduction in classification accuracies, we can achieve pruning ratios up to 74.6% for CIFAR-10 and 98.9% for MNIST in model parameters.
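The decomposition described above can be checked numerically on a flattened convolution kernel. In this sketch, a layer with 8 output channels and 3x3x3-sized receptive fields (an arbitrary, assumed shape) is written as a matrix, factored with the SVD, and split into an orthonormal "basis" layer followed by a rescaling layer. The variable names are illustrative, and the importance-based pruning step from the abstract is not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Flattened convolution weights: (out_channels, in_channels * kernel_size).
W = rng.standard_normal((8, 27))

# Thin SVD: W = U @ diag(S) @ Vt, with orthonormal rows in Vt.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

basis_filters = Vt   # first layer: orthonormal basis vectors as filters
rescale = U * S      # second layer: scale each basis feature, map back

# One flattened input patch, as seen by the convolution at one location.
patch = rng.standard_normal(27)

original_out = W @ patch
decomposed_out = rescale @ (basis_filters @ patch)
```

The two outputs agree, so the decomposition by itself changes nothing about the layer's function; compression would come from subsequently dropping basis features judged unimportant.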
IV · Mar 23, 2021
Multiview and Multiclass Image Segmentation using Deep Learning in Fetal Echocardiography
Ken C. L. Wong, Elena S. Sinkovskaya, Alfred Z. Abuhamad et al.
Congenital heart disease (CHD) is the most common congenital abnormality associated with birth defects in the United States. Despite training efforts and substantial advancement in ultrasound technology over the past years, CHD remains an abnormality that is frequently missed during prenatal ultrasonography. Therefore, computer-aided detection of CHD can play a critical role in prenatal care by improving screening and diagnosis. Since many CHDs involve structural abnormalities, automatic segmentation of anatomical structures is an important step in the analysis of fetal echocardiograms. While existing methods mainly focus on the four-chamber view with a small number of structures, here we present a more comprehensive deep learning segmentation framework covering 14 anatomical structures in both three-vessel trachea and four-chamber views. Specifically, our framework enhances the V-Net with spatial dropout, group normalization, and deep supervision to train a segmentation model that can be applied on both views regardless of abnormalities. By identifying the pitfall of using the Dice loss when some labels are unavailable in some images, this framework integrates information from multiple views and is robust to missing structures due to anatomical anomalies, achieving an average Dice score of 79%.
CV · Mar 22, 2021
Channel Scaling: A Scale-and-Select Approach for Transfer Learning
Ken C. L. Wong, Satyananda Kashyap, Mehdi Moradi
Transfer learning with pre-trained neural networks is a common strategy for training classifiers in medical image analysis. Without proper channel selections, this often results in unnecessarily large models that hinder deployment and explainability. In this paper, we propose a novel approach to efficiently build small and well performing networks by introducing the channel-scaling layers. A channel-scaling layer is attached to each frozen convolutional layer, with the trainable scaling weights inferring the importance of the corresponding feature channels. Unlike the fine-tuning approaches, we maintain the weights of the original channels and large datasets are not required. By imposing L1 regularization and thresholding on the scaling weights, this framework iteratively removes unnecessary feature channels from a pre-trained model. Using an ImageNet pre-trained VGG16 model, we demonstrate the capabilities of the proposed framework on classifying opacity from chest X-ray images. The results show that we can reduce the number of parameters by 95% while delivering a superior performance.
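The scale-and-select idea can be sketched with three small helpers: per-channel scaling of frozen features, an L1 penalty on the scaling weights, and thresholding to decide which channels survive. All names, the penalty strength, and the threshold value are hypothetical, and the training loop that would actually learn the scaling weights is omitted.

```python
import numpy as np

def channel_scale(features, scales):
    """Multiply each feature channel by its scaling weight.

    features: array of shape (channels, height, width) from a frozen layer
    scales:   array of shape (channels,)
    """
    return features * scales[:, None, None]

def l1_penalty(scales, lam):
    """Sparsity-inducing regularization term added to the training loss."""
    return lam * np.abs(scales).sum()

def select_channels(scales, threshold):
    """Indices of channels whose scaling weight magnitude exceeds the threshold."""
    return np.flatnonzero(np.abs(scales) > threshold)

features = np.ones((4, 2, 2))
scales = np.array([0.9, 0.01, -0.5, 0.0])   # pretend these were learned

scaled = channel_scale(features, scales)
penalty = l1_penalty(scales, lam=0.1)
kept = select_channels(scales, threshold=0.05)   # channels 0 and 2 remain
```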
CV · Aug 2, 2020
Looking in the Right place for Anomalies: Explainable AI through Automatic Location Learning
Satyananda Kashyap, Alexandros Karargyris, Joy Wu et al.
Deep learning has now become the de facto approach to the recognition of anomalies in medical imaging. Their 'black box' way of classifying medical images into anomaly labels poses problems for their acceptance, particularly with clinicians. Current explainable AI methods offer justifications through visualizations such as heat maps but cannot guarantee that the network is focusing on the relevant image region fully containing the anomaly. In this paper, we develop an approach to explainable AI in which the anomaly is assured to be overlapping the expected location when present. This is made possible by automatically extracting location-specific labels from textual reports and learning the association of expected locations to labels using a hybrid combination of Bi-Directional Long Short-Term Memory Recurrent Neural Networks (Bi-LSTM) and DenseNet-121. Use of this expected location to bias the subsequent attention-guided inference network based on ResNet101 results in the isolation of the anomaly at the expected location when present. The method is evaluated on a large chest X-ray dataset.
CV · Jul 27, 2020
Chest X-ray Report Generation through Fine-Grained Label Learning
Tanveer Syeda-Mahmood, Ken C. L. Wong, Yaniv Gur et al.
Obtaining automated preliminary read reports for common exams such as chest X-rays will expedite clinical workflows and improve operational efficiencies in hospitals. However, the quality of reports generated by current automated approaches is not yet clinically acceptable as they cannot ensure the correct detection of a broad spectrum of radiographic findings nor describe them accurately in terms of laterality, anatomical location, severity, etc. In this work, we present a domain-aware automatic chest X-ray radiology report generation algorithm that learns fine-grained description of findings from images and uses their pattern of occurrences to retrieve and customize similar reports from a large report database. We also develop an automatic labeling algorithm for assigning such descriptors to images and build a novel deep learning network that recognizes both coarse and fine-grained descriptions of findings. The resulting report generation algorithm significantly outperforms the state of the art using established score metrics.
IV · Sep 12, 2019
SegNAS3D: Network Architecture Search with Derivative-Free Global Optimization for 3D Image Segmentation
Ken C. L. Wong, Mehdi Moradi
Deep learning has largely reduced the need for manual feature selection in image segmentation. Nevertheless, network architecture optimization and hyperparameter tuning are mostly manual and time consuming. Although there are increasing research efforts on network architecture search in computer vision, most works concentrate on image classification but not segmentation, and there are very limited efforts on medical image segmentation especially in 3D. To remedy this, here we propose a framework, SegNAS3D, for network architecture search of 3D image segmentation. In this framework, a network architecture comprises interconnected building blocks that consist of operations such as convolution and skip connection. By representing the block structure as a learnable directed acyclic graph, hyperparameters such as the number of feature channels and the option of using deep supervision can be learned together through derivative-free global optimization. Experiments on 43 3D brain magnetic resonance images with 19 structures achieved an average Dice coefficient of 82%. Each architecture search required less than three days on three GPUs and produced architectures that were much smaller than the state-of-the-art manually created architectures.
IV · Jul 2, 2019
Automated Detection and Type Classification of Central Venous Catheters in Chest X-Rays
Vaishnavi Subramanian, Hongzhi Wang, Joy T. Wu et al.
Central venous catheters (CVCs) are commonly used in critical care settings for monitoring body functions and administering medications. They are often described in radiology reports by referring to their presence, identity and placement. In this paper, we address the problem of automatic detection of their presence and identity through automated segmentation using deep learning networks and classification based on their intersection with previously learned shape priors from clinician annotations of CVCs. The results not only outperform existing methods of catheter detection achieving 85.2% accuracy at 91.6% precision, but also enable high precision (95.2%) classification of catheter types on a large dataset of over 10,000 chest X-rays, presenting a robust and practical solution to this problem.
IV · Jun 21, 2019
Boosting the rule-out accuracy of deep disease detection using class weight modifiers
Alexandros Karargyris, Ken C. L. Wong, Joy T. Wu et al.
In many screening applications, the primary goal of a radiologist or assisting artificial intelligence is to rule out certain findings. The classifiers built for such applications are often trained on large datasets that derive labels from clinical notes written for patients. While the quality of the positive findings described in these notes is often reliable, the lack of mention of a finding does not always rule out its presence. This happens because radiologists comment on the patient in the context of the exam, for example focusing on trauma as opposed to chronic disease at emergency rooms. However, this disease finding ambiguity can affect the performance of algorithms. Hence it is critical to model the ambiguity during training. We propose a scheme to apply reasonable class weight modifiers to our loss function for the no-mention cases during training. We experiment with two different deep neural network architectures and show that the proposed method results in a large improvement in the performance of the classifiers, especially on negated findings. The baseline performance of a custom-made dilated block network proposed in this paper shows an improvement in comparison with baseline DenseNet-201, while both architectures benefit from the new proposed loss function weighting scheme. Over 200,000 chest X-ray images and three highly common diseases, along with their negated counterparts, are included in this study.
CV · Apr 2, 2019
Identifying disease-free chest X-ray images with deep transfer learning
Ken C. L. Wong, Mehdi Moradi, Joy Wu et al.
Chest X-rays (CXRs) are among the most commonly used medical image modalities. They are mostly used for screening, and an indication of disease typically results in subsequent tests. As this is mostly a screening test used to rule out chest abnormalities, the requesting clinicians are often interested in whether a CXR is normal or not. A machine learning algorithm that can accurately screen out even a small proportion of the "real normal" exams out of all requested CXRs would be highly beneficial in reducing the workload for radiologists. In this work, we report a deep neural network trained for classifying CXRs with the goal of identifying a large number of normal (disease-free) images without risking the discharge of sick patients. We use an ImageNet-pretrained Inception-ResNet-v2 model to provide the image features, which are further used to train a model on CXRs labelled by expert radiologists. The probability threshold for classification is optimized for 100% precision for the normal class, ensuring no sick patients are released. At this threshold we report an average recall of 50%. This means that the proposed solution has the potential to cut in half the number of disease-free CXRs examined by radiologists, without risking the discharge of sick patients.
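The threshold selection described above can be sketched as follows: on a labelled validation set, place the threshold at the highest "normal" probability assigned to any abnormal case, so that everything above it is truly normal, then measure how many normal cases are recovered. The function name and the toy scores are assumptions, and a threshold chosen this way guarantees 100% precision only on the data used to choose it.

```python
import numpy as np

def threshold_for_full_precision(p_normal, is_normal):
    """Pick a threshold giving 100% precision for the normal class.

    p_normal:  predicted probability of being normal, shape (n,)
    is_normal: ground-truth booleans, shape (n,)
    Returns (threshold, recall of the normal class at that threshold).
    """
    p_normal = np.asarray(p_normal, dtype=float)
    is_normal = np.asarray(is_normal, dtype=bool)
    abnormal_scores = p_normal[~is_normal]
    # Cases scoring strictly above the most "normal-looking" abnormal case
    # contain no abnormal exams by construction.
    threshold = abnormal_scores.max() if abnormal_scores.size else -np.inf
    predicted_normal = p_normal > threshold
    recall = float(predicted_normal[is_normal].mean())
    return float(threshold), recall

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.30]
labels = [True, True, False, True, True, False]
threshold, recall = threshold_for_full_precision(scores, labels)
```

In this toy example the threshold lands at 0.80 and two of the four normal exams are recovered, mirroring the kind of precision and recall trade-off reported in the abstract.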
CV · Aug 31, 2018
3D Segmentation with Exponential Logarithmic Loss for Highly Unbalanced Object Sizes
Ken C. L. Wong, Mehdi Moradi, Hui Tang et al.
With the introduction of fully convolutional neural networks, deep learning has raised the benchmark for medical image segmentation on both speed and accuracy, and different networks have been proposed for 2D and 3D segmentation with promising results. Nevertheless, most networks only handle relatively small numbers of labels (<10), and there are very limited works on handling highly unbalanced object sizes especially in 3D segmentation. In this paper, we propose a network architecture and the corresponding loss function which improve segmentation of very small structures. By combining skip connections and deep supervision with respect to the computational feasibility of 3D segmentation, we propose a fast converging and computationally efficient network architecture for accurate segmentation. Furthermore, inspired by the concept of focal loss, we propose an exponential logarithmic loss which balances the labels not only by their relative sizes but also by their segmentation difficulties. We achieve an average Dice coefficient of 82% on brain segmentation with 20 labels, with the ratio of the smallest to largest object sizes as 0.14%. Less than 100 epochs are required to reach such accuracy, and segmenting a 128x128x128 volume only takes around 0.4 s.
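A loss of this kind can be sketched by raising negative log terms to a power, which changes how strongly poorly segmented labels are emphasized relative to well segmented ones, and by weighting labels by inverse frequency. The abstract does not state the formula, so the specific form below, the exponent `gamma`, the mixing weights, and the square-root frequency weighting are all assumptions for illustration.

```python
import numpy as np

def exp_log_loss(probs, onehot, gamma=0.3, w_dice=0.8, w_cross=0.2, eps=1e-7):
    """Weighted sum of an exponentiated log-Dice term and a log-likelihood term.

    probs, onehot: arrays of shape (num_voxels, num_labels)
    """
    # Soft Dice per label, clipped so the logarithm stays finite and <= 0.
    inter = (probs * onehot).sum(axis=0)
    dice = (2.0 * inter + eps) / (probs.sum(axis=0) + onehot.sum(axis=0) + eps)
    dice = np.clip(dice, eps, 1.0)
    l_dice = np.mean((-np.log(dice)) ** gamma)

    # Label weights from relative sizes: rarer labels get larger weights.
    freq = onehot.sum(axis=0)
    label_w = np.sqrt(freq.sum() / np.maximum(freq, 1.0))

    # Probability assigned to the true label at each voxel.
    p_true = np.clip((probs * onehot).sum(axis=1), eps, 1.0)
    voxel_w = onehot @ label_w
    l_cross = np.mean(voxel_w * (-np.log(p_true)) ** gamma)

    return float(w_dice * l_dice + w_cross * l_cross)

onehot = np.array([[1, 0], [1, 0], [1, 0], [0, 1]], dtype=float)
loss_perfect = exp_log_loss(onehot, onehot)
loss_uniform = exp_log_loss(np.full((4, 2), 0.5), onehot)
```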
CV · Aug 15, 2018
Building medical image classifiers with very limited data using segmentation networks
Ken C. L. Wong, Tanveer Syeda-Mahmood, Mehdi Moradi
Deep learning has shown promising results in medical image analysis; however, the lack of very large annotated datasets confines its full potential. Although transfer learning with ImageNet pre-trained classification models can alleviate the problem, constrained image sizes and model complexities can lead to an unnecessary increase in computational cost and a decrease in performance. As many common morphological features are usually shared by different classification tasks of an organ, it is greatly beneficial if we can extract such features to improve classification with limited samples. Therefore, inspired by the idea of curriculum learning, we propose a strategy for building medical image classifiers using features from segmentation networks. By using a segmentation network pre-trained on data similar to that of the classification task, the machine can first learn the simpler shape and structural concepts before tackling the actual classification problem, which usually involves more complicated concepts. Using our proposed framework on a 3D three-class brain tumor type classification problem, we achieved 82% accuracy on 191 testing samples with 91 training samples. When applied to a 2D nine-class cardiac semantic level classification problem, it achieved 86% accuracy on 263 testing samples with 108 training samples. Comparisons with ImageNet pre-trained classifiers and classifiers trained from scratch are presented.
CV · May 7, 2018
Building Disease Detection Algorithms with Very Small Numbers of Positive Samples
Ken C. L. Wong, Alexandros Karargyris, Tanveer Syeda-Mahmood et al.
Although deep learning can provide promising results in medical image analysis, the lack of very large annotated datasets confines its full potential. Furthermore, limited positive samples also create unbalanced datasets which limit the true positive rates of trained models. As unbalanced datasets are mostly unavoidable, it is greatly beneficial if we can extract useful knowledge from negative samples to improve classification accuracy on limited positive samples. To this end, we propose a new strategy for building medical image analysis pipelines that target disease detection. We train a discriminative segmentation model only on normal images to provide a source of knowledge to be transferred to a disease detection classifier. We show that using the feature maps of a trained segmentation network, deviations from normal anatomy can be learned by a two-class classification network on an extremely unbalanced training dataset with as little as one positive for 17 negative samples. We demonstrate that even though the segmentation network is only trained on normal cardiac computed tomography images, the resulting feature maps can be used to detect pericardial effusion and cardiac septal defects with two-class convolutional classification networks.