CV · Sep 30, 2024
Open-Source Periorbital Segmentation Dataset for Ophthalmic Applications
George R. Nahass, Emma Koehler, Nicholas Tomaras et al.
Periorbital segmentation and distance prediction using deep learning allow for the objective quantification of disease state, treatment monitoring, and remote medicine. However, there are currently no published segmentation datasets for training deep learning models with sub-millimeter accuracy on the regions around the eyes. All images (n=2842) had the iris, sclera, lid, caruncle, and brow segmented by five trained annotators. Here, we validate this dataset through intra- and intergrader reliability tests and show the utility of the data in training periorbital segmentation networks. All annotations are publicly available for free download. Access to segmentation datasets designed specifically for oculoplastic surgery will permit more rapid development of clinically useful segmentation networks, which can be leveraged for periorbital distance prediction and disease classification. In addition to the annotations, we provide an open-source toolkit for periorbital distance prediction from segmentation masks. The weights of all models are also publicly available for use by the community.
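The step from segmentation masks to a periorbital distance can be illustrated with a marginal reflex distance 1 (MRD1)-style measurement. MRD1 is conventionally the vertical distance from the corneal light reflex to the upper eyelid margin. This minimal sketch is not the released toolkit: it assumes that the iris centroid stands in for the light reflex, that the upper lid margin is the topmost iris-or-sclera pixel in the centroid's column, and that a horizontal iris diameter of 11.7 mm converts pixels to millimeters. The function names `mask_pixels` and `mrd1_mm` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask indexed as mask[row][col], row 0 at the top


def mask_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Return (row, col) coordinates of all foreground pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def mrd1_mm(iris: Mask, sclera: Mask, iris_diameter_mm: float = 11.7) -> float:
    """Hypothetical MRD1-style estimate in mm: vertical distance from the iris
    centroid up to the top of the visible globe (iris or sclera) in that column."""
    pixels = mask_pixels(iris)
    if not pixels:
        raise ValueError("empty iris mask")
    center_row = sum(r for r, _ in pixels) / len(pixels)
    center_col = round(sum(c for _, c in pixels) / len(pixels))

    # Upper lid margin approximated as the topmost visible-globe pixel in that column.
    top_row = min(
        r for r in range(len(iris)) if iris[r][center_col] or sclera[r][center_col]
    )

    # Pixel-to-mm scale from the iris's horizontal extent (assumed true diameter).
    cols = [c for _, c in pixels]
    width_px = max(cols) - min(cols) + 1
    mm_per_px = iris_diameter_mm / width_px

    return (center_row - top_row) * mm_per_px
```

For example, a 4-pixel-wide iris whose centroid sits 2.5 rows below the lid margin gives 2.5 × (12.0 / 4) = 7.5 mm when `iris_diameter_mm=12.0`. The fixed-diameter scaling is the weakest assumption here, since real iris diameters vary between patients.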
CV · Sep 27, 2024
State-of-the-Art Periorbital Distance Prediction and Disease Classification Using Periorbital Features
George R. Nahass, Sasha Hubschman, Jeffrey C. Peterson et al.
Periorbital distances are critical markers for diagnosing and monitoring a range of oculoplastic and craniofacial conditions. Manual measurement, however, is subjective and prone to intergrader variability. Automated methods have been developed but remain limited by standardized imaging requirements, small datasets, and a narrow focus on individual measurements. We developed a segmentation pipeline trained on a domain-specific dataset of healthy eyes and compared its performance against the Segment Anything Model (SAM) and the prior benchmark, PeriorbitAI. Segmentation accuracy was evaluated across multiple disease classes and imaging conditions. We further investigated the use of predicted periorbital distances as features for disease classification under in-distribution (ID) and out-of-distribution (OOD) settings, comparing shallow classifiers, CNNs, and fusion models. Our segmentation model achieved state-of-the-art accuracy across all datasets, with error rates within intergrader variability and superior performance relative to SAM and PeriorbitAI. In classification tasks, models trained on periorbital distances matched CNN performance on ID data (77-78% accuracy) and substantially outperformed CNNs under OOD conditions (63-68% vs. 14% accuracy). Fusion models achieved the highest ID accuracy (80%) but were sensitive to degraded CNN features under OOD shifts. Segmentation-derived periorbital distances provide robust, explainable features for disease classification and generalize better under domain shift than CNN image classifiers. These results establish a new benchmark for periorbital distance prediction and highlight the potential of anatomy-based AI pipelines for real-world deployment in oculoplastic and craniofacial care.
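To make "periorbital distances as features" concrete, here is a minimal nearest-centroid classifier over per-image distance vectors. It is a stand-in for the shallow classifiers mentioned above, not the authors' model; the feature layout (assumed here to be MRD1, MRD2, and brow height in mm), the class names, and the numbers in the test data are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Average the distance vectors (in mm) of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))
```

A centroid classifier is used here only because each decision can be read directly in millimeters ("this eye's MRD1 is closer to the ptosis average"), which echoes the explainability argument above; the study's actual classifiers may differ.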
AI · Nov 15, 2025
UpBench: A Dynamically Evolving Real-World Labor-Market Agentic Benchmark Framework Built for Human-Centric AI
Darvin Yi, Teng Liu, Mattie Terzolo et al.
As large language model (LLM) agents increasingly undertake digital work, reliable frameworks are needed to evaluate their real-world competence, adaptability, and capacity for human collaboration. Existing benchmarks remain largely static, synthetic, or domain-limited, providing limited insight into how agents perform in dynamic, economically meaningful environments. We introduce UpBench, a dynamically evolving benchmark grounded in real jobs drawn from the global Upwork labor marketplace. Each task corresponds to a verified client transaction, anchoring evaluation in genuine work activity and financial outcomes. UpBench employs a rubric-based evaluation framework in which expert freelancers decompose each job into detailed, verifiable acceptance criteria and assess AI submissions with per-criterion feedback. This structure enables fine-grained analysis of model strengths, weaknesses, and instruction-following fidelity beyond binary pass/fail metrics. Human expertise is integrated throughout the data pipeline (from job curation and rubric construction to evaluation), ensuring fidelity to real professional standards and supporting research on human-AI collaboration. By regularly refreshing tasks to reflect the evolving nature of online work, UpBench provides a scalable, human-centered foundation for evaluating agentic systems in authentic labor-market contexts and offers a path toward a collaborative framework in which AI amplifies human capability through partnership rather than replacement.
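The difference between rubric-based and binary pass/fail scoring can be shown with a small data structure. The `Criterion` type, its fields, and the two scoring functions below are hypothetical and do not describe UpBench's actual schema or scoring rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    description: str   # one verifiable acceptance criterion for a job
    passed: bool       # reviewer's judgment for this criterion
    feedback: str = "" # per-criterion reviewer comment


def binary_pass(rubric: List[Criterion]) -> bool:
    """Coarse view: the submission passes only if every criterion is met."""
    return all(c.passed for c in rubric)


def criterion_score(rubric: List[Criterion]) -> float:
    """Fine-grained view: fraction of acceptance criteria met."""
    return sum(c.passed for c in rubric) / len(rubric) if rubric else 0.0
```

A submission that meets two of three criteria fails under `binary_pass` but scores about 0.67 under `criterion_score`, and the `feedback` field records which criterion was missed.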
IV · Nov 7, 2024
Trends, Challenges, and Future Directions in Deep Learning for Glaucoma: A Systematic Review
Mahtab Faraji, Homa Rashidisabet, George R. Nahass et al.
Here, we examine the latest advances in glaucoma detection with deep learning (DL) algorithms, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. This study focuses on three aspects of DL-based glaucoma detection frameworks: input data modalities, processing strategies, and model architectures and applications. Moreover, we analyze how the use of each aspect has trended since DL was first applied in this field. Finally, we address current challenges and suggest future research directions.
IV · May 28, 2025
Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images
George R. Nahass, Zhu Wang, Homa Rashidisabet et al.
Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool for post-deployment model revision. Specifically, we focus on utilizing unlearning in clinical contexts where data shifts, device deprecation, and policy changes are common. To this end, we propose a bilevel optimization formulation of boundary-based unlearning that can be solved using iterative algorithms. We provide convergence guarantees when first-order algorithms are used to unlearn. Our method introduces a tunable loss design for controlling the forgetting-retention tradeoff and supports novel model composition strategies that merge the strengths of distinct unlearning runs. Across benchmark and real-world clinical imaging datasets, our approach outperforms baselines on both forgetting and retention metrics, including scenarios involving imaging devices and anatomical outliers. This work establishes machine unlearning as a modular, practical alternative to retraining for real-world model maintenance in clinical applications.
CV · Oct 29, 2021
CvS: Classification via Segmentation For Small Datasets
Nooshin Mojab, Philip S. Yu, Joelle A. Hallak et al.
Deep learning models have shown promising results in a wide range of computer vision applications across various domains. The success of deep learning methods relies heavily on the availability of large amounts of data, and deep neural networks are prone to overfitting when data is scarce. This problem is even more severe for neural networks with a classification head that have access to only a few data points. However, acquiring large-scale datasets is challenging, laborious, or even infeasible in some domains. Hence, developing classifiers that perform well in small-data regimes is crucial for applications with limited data. This paper presents CvS, a cost-effective classifier for small datasets that derives classification labels from predicted segmentation maps. We employ label propagation to obtain a fully segmented dataset from only a handful of manually segmented examples. We evaluate our framework on diverse problems and show that CvS achieves much better classification performance than previous methods when given only a handful of examples.
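One simple way to read a classification label off a predicted segmentation map is a majority vote over foreground pixels. This is a hypothetical decision rule for illustration only: the abstract does not specify how CvS maps segmentation maps to labels, and treating class index 0 as background is an assumption.

```python
from collections import Counter
from typing import List, Optional

BACKGROUND = 0  # assumed background class index


def label_from_segmentation(seg_map: List[List[int]]) -> Optional[int]:
    """Return the most frequent non-background class in a per-pixel label map,
    or None if the map contains only background."""
    counts = Counter(v for row in seg_map for v in row if v != BACKGROUND)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Because the label is derived from pixels, every manually or propagated segmentation provides many supervised signals per image, which is the intuition for why such a classifier can cope with few examples.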
IV · Jun 7, 2021
AutoPtosis
Abdullah Aleem, Manoj Prabhakar Nallabothula, Pete Setabutr et al.
Blepharoptosis, more commonly referred to as ptosis, is a condition in which the upper eyelid droops. Current diagnosis of ptosis involves cumbersome manual measurements that are time-consuming and prone to human error. In this paper, we present AutoPtosis, an artificial-intelligence-based system with interpretable results for rapid diagnosis of ptosis. We use a diverse dataset collected from the Illinois Ophthalmic Database Atlas (I-ODA) to develop a robust deep learning model for prediction, and we also develop a clinically inspired model that calculates the marginal reflex distance and iris ratio. AutoPtosis achieved 95.5% accuracy on class-balanced, physician-verified data. The proposed algorithm can help in the rapid and timely diagnosis of ptosis, significantly reduce the burden on the healthcare system, and save patients and clinics valuable resources.
IV · Mar 30, 2021
I-ODA, Real-World Multi-modal Longitudinal Data for Ophthalmic Applications
Nooshin Mojab, Vahid Noroozi, Abdullah Aleem et al.
Data from real-world clinical settings vary in quality, machine type, setting, and source. One of the primary goals of medical computer vision is to develop and validate artificial intelligence (AI)-based algorithms on real-world data, enabling clinical translation. However, despite the exponential growth in AI-based applications in healthcare, and specifically in ophthalmology, translation to clinical settings remains challenging. Limited access to adequate and diverse real-world data inhibits the development and validation of translatable algorithms. In this paper, we present a new multi-modal longitudinal ophthalmic imaging dataset, the Illinois Ophthalmic Database Atlas (I-ODA), with the goal of advancing state-of-the-art computer vision applications in ophthalmology and improving the translatability of AI-based applications across different clinical settings. We present the infrastructure employed to collect, annotate, and anonymize images from multiple sources, demonstrating the complexity of real-world retrospective data and its limitations. I-ODA includes 12 imaging modalities with a total of 3,668,649 ophthalmic images of 33,876 individuals from the Department of Ophthalmology and Visual Sciences at the Illinois Eye and Ear Infirmary of the University of Illinois Chicago (UIC), collected over 12 years.
CV · Jul 24, 2020
Real-World Multi-Domain Data Applications for Generalizations to Clinical Settings
Nooshin Mojab, Vahid Noroozi, Darvin Yi et al.
With the promising results of machine-learning-based models in computer vision, applications to medical imaging data have been increasing exponentially. However, generalization to complex real-world clinical data remains a persistent problem. Deep learning models perform well when trained on standardized datasets from artificial settings, such as clinical trials. Real-world data, however, is different, and translation to it has yielded mixed results. The complexity of real-world applications in healthcare can stem from a mixture of data distributions across multiple device domains, alongside the inevitable noise from varying image resolutions, human errors, and the lack of manual gradings. In addition, healthcare applications not only suffer from a scarcity of labeled data but also face limited access to unlabeled data due to HIPAA regulations, patient privacy, ambiguity in data ownership, and challenges in collecting data from different sources. These limitations pose additional challenges to applying deep learning algorithms in healthcare and to clinical translation. In this paper, we use self-supervised representation learning methods, formulated effectively in transfer learning settings, to address limited data availability. Our experiments verify the importance of diverse real-world data for generalization to clinical settings. We show that by employing a self-supervised approach with transfer learning on a multi-domain real-world dataset, we can achieve a 16% relative improvement on a standardized dataset over supervised baselines.
CV · Feb 23, 2020
Random Bundle: Brain Metastases Segmentation Ensembling through Annotation Randomization
Darvin Yi, Endre Grøvik, Michael Iv et al.
We introduce a novel ensembling method, Random Bundle (RB), that improves performance for brain metastases segmentation. We create our ensemble by training each network on our dataset with 50% of our annotated lesions censored out. We also apply a lopsided bootstrap loss to recover performance after inducing an in silico 50% false negative rate and to make our networks more sensitive. We improve our network's lesion-detection mAP by 39% and more than triple the sensitivity at 80% precision. We also show slight improvements in segmentation quality as measured by Dice score. Further, RB ensembling improves performance over baseline by a larger margin than a variety of popular ensembling strategies. Finally, we show that RB ensembling is computationally efficient by comparing its performance to a single network when both systems are constrained to the same compute budget.
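The censoring step behind Random Bundle can be sketched as drawing a different random half of the annotated lesions for each ensemble member. The sketch covers only which lesion IDs each network keeps; the function name `censored_subsets`, the seeding scheme, and the representation of lesions as integer IDs are assumptions, and training and the lopsided loss are omitted.

```python
import random
from typing import List, Sequence


def censored_subsets(lesion_ids: Sequence[int], n_members: int,
                     keep_fraction: float = 0.5, seed: int = 0) -> List[List[int]]:
    """For each ensemble member, keep a random subset of annotated lesions;
    the remaining lesions are treated as unlabeled (censored) for that member."""
    rng = random.Random(seed)
    n_keep = round(len(lesion_ids) * keep_fraction)
    return [sorted(rng.sample(list(lesion_ids), n_keep)) for _ in range(n_members)]
```

Pixels of a member's censored lesions would then be relabeled as background in that member's training masks, which is what induces the in silico 50% false negative rate; because each member censors a different half, the ensemble as a whole still sees every lesion.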
CV · Jan 26, 2020
Brain Metastasis Segmentation Network Trained with Robustness to Annotations with Multiple False Negatives
Darvin Yi, Endre Grøvik, Michael Iv et al.
Deep learning has proven to be an essential tool for medical image analysis. However, the need for accurately labeled input data, often requiring time- and labor-intensive annotation by experts, is a major limitation to the use of deep learning. One solution to this challenge is to allow the use of coarse or noisy labels, which could permit more efficient and scalable labeling of images. In this work, we develop a lopsided loss function based on entropy regularization that assumes the existence of a nontrivial false negative rate in the target annotations. Starting with a carefully annotated brain metastasis lesion dataset, we simulate data with false negatives by (1) randomly censoring the annotated lesions and (2) systematically censoring the smallest lesions. The latter better models true physician error because smaller lesions are harder to notice than larger ones. Even with a simulated false negative rate as high as 50%, applying our loss function to randomly censored data preserves maximum sensitivity at 97% of the baseline with uncensored training data, compared to just 10% for a standard loss function. For the size-based censorship, performance is restored from 17% with the current standard to 88% with our lopsided bootstrap loss. Our work will enable more efficient scaling of the image labeling process, in parallel with other approaches to creating more efficient user interfaces and tools for annotation.
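The abstract does not give the exact loss, so the following is only a sketch of the underlying idea. It assumes a per-pixel binary cross-entropy with soft bootstrapping (mixing the label with the model's own prediction, a technique closely related to entropy regularization), applied lopsidedly: only pixels labeled background are bootstrapped, because under false-negative annotations those are the labels that may be wrong. The names `bce`, `lopsided_bootstrap_loss`, and the mixing weight `beta` are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-7  # keeps log() finite


def bce(p: float, t: float) -> float:
    """Binary cross-entropy between prediction p and (possibly soft) target t."""
    p = min(max(p, EPS), 1 - EPS)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))


def lopsided_bootstrap_loss(probs: Sequence[float], labels: Sequence[int],
                            beta: float = 0.8) -> float:
    """Mean per-pixel loss. Lesion labels (1) are trusted as-is; background labels (0)
    get the soft target beta * 0 + (1 - beta) * p, so a background-labeled pixel
    predicted as lesion with p > 0.5 is penalized less than under plain BCE."""
    total = 0.0
    for p, y in zip(probs, labels):
        target = 1.0 if y == 1 else (1 - beta) * p
        total += bce(p, target)
    return total / len(probs)
```

With `beta=1.0` this reduces to ordinary BCE. In a real training loop the prediction inside the target is typically treated as a constant (no gradient flows through it), as in standard soft bootstrapping.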
IV · Dec 27, 2019
Handling Missing MRI Input Data in Deep Learning Segmentation of Brain Metastases: A Multi-Center Study
Endre Grøvik, Darvin Yi, Michael Iv et al.
The purpose was to assess the clinical value of a novel DropOut model for detecting and segmenting brain metastases, in which a neural network is trained on four distinct MRI sequences using an input dropout layer, thus simulating the scenario of missing MRI data by training on the full set and all possible subsets of the input data. This retrospective, multi-center study evaluated 165 patients with brain metastases. A deep-learning-based segmentation model for automatic segmentation of brain metastases, named DropOut, was trained on multi-sequence MRI from 100 patients and validated/tested on 10/55 patients. The segmentation results were compared with the performance of a state-of-the-art DeepLabV3 model. The MR sequences in the training set included pre- and post-gadolinium (Gd) T1-weighted 3D fast spin echo, post-Gd T1-weighted inversion recovery (IR) prepped fast spoiled gradient echo, and 3D fluid attenuated inversion recovery (FLAIR), whereas the test set did not include the IR-prepped image series. The ground truth was established by experienced neuroradiologists. The results were evaluated using precision, recall, Dice score, and receiver operating characteristic (ROC) curve statistics, while the Wilcoxon rank sum test was used to compare the performance of the two neural networks. The area under the ROC curve (AUC), averaged across all test cases, was 0.989 ± 0.029 for the DropOut model and 0.989 ± 0.023 for the DeepLabV3 model (p=0.62). The DropOut model showed a significantly higher Dice score than the DeepLabV3 model (0.795 ± 0.105 vs. 0.774 ± 0.104, p=0.017) and a significantly lower average false positive rate of 3.6/patient vs. 7.0/patient (p<0.001) using a 10 mm³ lesion-size limit. The DropOut model may facilitate accurate detection and segmentation of brain metastases on a multi-center basis, even when the test cohort is missing MRI input data.
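Training "on the full set and all possible subsets" of four sequences means the network can see 2^4 - 1 = 15 non-empty input combinations. The sketch below enumerates them and applies one as an input-dropout mask by zeroing the channels of the missing sequences. The sequence names, zero-filling of dropped channels, and uniform sampling over subsets are assumptions for illustration, not details taken from the study.

```python
import itertools
import random
from typing import Dict, List, Tuple

SEQUENCES = ("T1_pre", "T1_post", "T1_post_IR", "FLAIR")  # hypothetical channel names


def nonempty_subsets(names: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """All non-empty combinations of input sequences (2**n - 1 of them)."""
    return [combo for k in range(1, len(names) + 1)
            for combo in itertools.combinations(names, k)]


def input_dropout(channels: Dict[str, List[float]], rng: random.Random) -> Dict[str, List[float]]:
    """Keep a uniformly sampled non-empty subset of sequences and zero the rest,
    mimicking a scan where some sequences were never acquired."""
    keep = set(rng.choice(nonempty_subsets(tuple(channels))))
    return {name: (values if name in keep else [0.0] * len(values))
            for name, values in channels.items()}
```

Under these assumptions, a test cohort missing the IR-prepped series (as in this study) corresponds to feeding a zeroed `T1_post_IR` channel, a case the network has already seen during training.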
IV · Dec 18, 2019
MRI Pulse Sequence Integration for Deep-Learning Based Brain Metastasis Segmentation
Darvin Yi, Endre Grøvik, Michael Iv et al.
Magnetic resonance (MR) imaging is an essential diagnostic tool in clinical medicine. Recently, a variety of deep learning methods have been applied to segmentation tasks in medical images, with promising results for computer-aided diagnosis. For MR images, effectively integrating different pulse sequences is important to optimize performance. However, the best way to integrate different pulse sequences remains unclear. In this study, we evaluate multiple architectural features and characterize their effects in the task of metastasis segmentation. Specifically, we consider (1) different pulse sequence integration schemas, (2) different modes of weight sharing for parallel network branches, and (3) a new approach for enabling robustness to missing pulse sequences. We find that levels of integration and modes of weight sharing that favor low variance work best in our small-data regime (n = 100). By adding an input-level dropout layer, we could preserve the overall performance of these networks while allowing for inference on inputs with missing pulse sequences. We illustrate not only the generalizability of the network but also the utility of this robustness when applying the trained model to data from a different center, which does not use the same pulse sequences. Finally, we apply network visualization methods to better understand which input features are most important for network performance. Together, these results provide a framework for building networks with enhanced robustness to missing data while maintaining comparable performance in medical imaging applications.
CV · Apr 25, 2019
DeepPerimeter: Indoor Boundary Estimation from Posed Monocular Sequences
Ameya Phalak, Zhao Chen, Darvin Yi et al.
We present DeepPerimeter, a deep-learning-based pipeline for inferring a full indoor perimeter (i.e. exterior boundary map) from a sequence of posed RGB images. Our method relies on robust deep methods for depth estimation and wall segmentation to generate an exterior boundary point cloud, and then uses deep unsupervised clustering to fit wall planes to obtain a final boundary map of the room. We demonstrate that DeepPerimeter achieves excellent visual and quantitative performance on the popular ScanNet and FloorNet datasets and works for room shapes of various complexities as well as in multiroom scenarios. We also establish important baselines for future work on indoor perimeter estimation, a topic that will become increasingly important as application areas such as augmented reality and robotics grow.
IV · Mar 18, 2019
Deep Learning Enables Automatic Detection and Segmentation of Brain Metastases on Multi-Sequence MRI
Endre Grøvik, Darvin Yi, Michael Iv et al.
Detecting and segmenting brain metastases is a tedious and time-consuming task for many radiologists, particularly with the growing use of multi-sequence 3D imaging. This study demonstrates automated detection and segmentation of brain metastases on multi-sequence MRI using a deep learning approach based on a fully convolutional neural network (CNN). In this retrospective study, a total of 156 patients with brain metastases from several primary cancers were included. Pre-therapy MR images (1.5T and 3T) included pre- and post-gadolinium T1-weighted 3D fast spin echo, post-gadolinium T1-weighted 3D axial IR-prepped FSPGR, and 3D fluid attenuated inversion recovery. The ground truth was established by manual delineation by two experienced neuroradiologists. CNN training/development was performed using 100 and 5 patients, respectively, with a 2.5D network based on a GoogLeNet architecture. The results were evaluated in 51 patients, equally separated into those with few (1-3), multiple (4-10), and many (>10) lesions. Network performance was evaluated using precision, recall, Dice/F1 score, and ROC-curve statistics. For an optimal probability threshold, detection and segmentation performance was assessed on a per-metastasis basis. The area under the ROC curve (AUC), averaged across all patients, was 0.98. The AUC in the subgroups was 0.99, 0.97, and 0.97 for patients having 1-3, 4-10, and >10 metastases, respectively. Using an average optimal probability threshold determined by the development set, precision, recall, and Dice score were 0.79, 0.53, and 0.79, respectively. At the same probability threshold, the network showed an average false positive rate of 8.3/patient (no lesion-size limit) and 3.4/patient (10 mm³ lesion-size limit). In conclusion, a deep learning approach using multi-sequence MRI can aid in the detection and segmentation of brain metastases.
NE · Nov 27, 2018
CT organ segmentation using GPU data augmentation, unsupervised labels and IOU loss
Blaine Rister, Darvin Yi, Kaushik Shivakumar et al.
Fully convolutional neural networks have achieved superior performance in a variety of image segmentation tasks. However, their training requires laborious manual annotation of large datasets, as well as acceleration by parallel processors with high-bandwidth memory, such as GPUs. We show that simple models can achieve competitive accuracy for organ segmentation on CT images when trained with extensive data augmentation, which leverages existing graphics hardware to quickly apply geometric and photometric transformations to 3D image data. On 3 mm³ CT volumes, our GPU implementation is 2.6-8x faster than a widely used CPU version, including communication overhead. We also show how to automatically generate training labels using rudimentary morphological operations, which are efficiently computed by 3D Fourier transforms. We combined fully automatic labels for the lungs and bone with semi-automatic ones for the liver, kidneys and bladder to create a dataset of 130 labeled CT scans. To achieve the best results from data augmentation, our model uses the intersection-over-union (IOU) loss function, a close relative of the Dice loss. We discuss its mathematical properties and explain why it outperforms the usual weighted cross-entropy loss for unbalanced segmentation tasks. We conclude that there is no unique IOU loss function, as the naive one belongs to a broad family of functions with the same essential properties. When combining data augmentation with the IOU loss, our model achieves a Dice score of 78-92% for each organ. The trained model, code and dataset will be made publicly available to further medical imaging research.
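The IOU (Jaccard) loss and its relationship to Dice can be written in a few lines. This is the plain "soft" form on predicted probabilities, one member of the family the abstract mentions, and not necessarily the exact variant used in the paper; the small `eps` smoothing term is an assumption added to avoid division by zero.

```python
from typing import Sequence


def soft_iou_loss(probs: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """1 - soft Jaccard index: intersection uses products, union uses sum - product."""
    inter = sum(p * t for p, t in zip(probs, target))
    union = sum(probs) + sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)


def dice(probs: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Soft Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(p * t for p, t in zip(probs, target))
    return (2 * inter + eps) / (sum(probs) + sum(target) + eps)
```

For binary masks the two scores are linked by Dice = 2J / (1 + J), so they rank predictions identically. Because both are ratios of overlap to foreground, voxels that are background in both prediction and target do not change the score, which is the usual intuition for why this family copes well with unbalanced segmentation.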
CV · Sep 10, 2017
Institutionally Distributed Deep Learning Networks
Ken Chang, Niranjan Balachandar, Carson K Lam et al.
Deep learning has become a promising approach for automated medical diagnoses. When medical data samples are limited, collaboration among multiple institutions is necessary to achieve high algorithm performance. However, sharing patient data is often restricted by technical, legal, or ethical concerns. In such cases, sharing a deep learning model is a more attractive alternative. The best method of doing so, however, is unclear. In this study, we simulate distributing deep learning model training across four institutions using various heuristics and compare the results with a deep learning model trained on centrally hosted patient data. The heuristics investigated include ensembling single-institution models, single weight transfer, and cyclical weight transfer. We evaluated these approaches for image classification in three independent image collections (retinal fundus photos, mammography, and ImageNet). We find that cyclical weight transfer resulted in performance (testing accuracy = 77.3%) closest to that of centrally hosted patient data (testing accuracy = 78.7%). We also found that the performance of the cyclical weight transfer heuristic improves with more frequent weight transfer.
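Cyclical weight transfer can be sketched as passing one set of weights around the institutions in a fixed order, training briefly at each stop, and repeating. The toy below uses a one-parameter linear model and full-batch gradient descent on synthetic data; the model, the data, the learning rate, and the `steps_per_site` parameter (a knob for transfer frequency) are all assumptions. Only the weight `w` moves between sites, never the data.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_train(w: float, data: Dataset, steps: int, lr: float) -> float:
    """Full-batch gradient descent on mean squared error for y ≈ w * x."""
    for _ in range(steps):
        grad = 2 * sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def cyclical_weight_transfer(sites: List[Dataset], cycles: int,
                             steps_per_site: int, lr: float = 0.1) -> float:
    """Visit the sites in the same order every cycle; only the weight w is shared."""
    w = 0.0
    for _ in range(cycles):
        for data in sites:
            w = local_train(w, data, steps_per_site, lr)
    return w
```

With three sites whose data all follow y = 2x, 20 cycles of 5 local steps each drive `w` to within 1e-6 of 2. A smaller `steps_per_site` corresponds to more frequent transfer; the usual intuition for why that helps is that, with heterogeneous sites, long local training lets the weights drift toward whichever site trained last.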
CV · May 17, 2017
Optimizing and Visualizing Deep Learning for Benign/Malignant Classification in Breast Tumors
Darvin Yi, Rebecca Lynn Sawyer, David Cohn et al.
Breast cancer has the highest incidence and the second-highest mortality rate among women in the US. Our study aims to use deep learning for benign/malignant classification of mammogram tumors using a subset of cases from the Digital Database for Screening Mammography (DDSM). Although the dataset is small from a deep learning perspective (about 1000 patients), we show that current state-of-the-art deep learning architectures can find a robust signal, even when trained from scratch. Using convolutional neural networks (CNNs), we achieve an accuracy of 85% and an ROC AUC of 0.91, while leading hand-crafted-feature-based methods achieve an accuracy of only 71%. We investigate a combination of architectures and show that our best result is reached with an ensemble of lightweight GoogLeNets tasked with interpreting both the craniocaudal view and the mediolateral oblique view, simply averaging the probability scores of both views to make the final prediction. In addition, we have created a novel method to visualize the features the neural network detects for benign/malignant classification, and have correlated those features with well-known radiological features, such as spiculation. Our algorithm significantly improves on existing classification methods for mammography lesions and identifies features that correlate with established clinical markers.
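The final prediction rule, averaging the probability scores from the two views, is simple enough to state directly. The function name and the 0.5 decision threshold are assumptions; the abstract does not state the operating point.

```python
from statistics import mean
from typing import Sequence


def predict_malignant(view_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Average per-view malignancy probabilities (e.g. CC and MLO), then threshold."""
    return mean(view_probs) >= threshold
```

For instance, probabilities of 0.7 from one view and 0.4 from the other average to 0.55, which is classified as malignant at the assumed threshold even though one view alone would not be. The same function applies unchanged if each entry is instead the output of a different ensemble member.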
CV · Feb 18, 2017
The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI
Zhao Chen, Darvin Yi
We present a vision-only model for gaming AI that uses a late-integration deep convolutional network architecture trained in a purely supervised imitation learning context. Although state-of-the-art deep learning models for video game tasks generally rely on more complex methods such as deep-Q learning, we show that a supervised model requiring substantially fewer resources and less training time can already perform well at human reaction speeds on the N64 classic game Super Smash Bros. We frame our learning task as a 30-class classification problem, and our CNN model achieves 80% top-1 and 95% top-3 validation accuracy. With slight test-time fine-tuning, our model is also competitive during live simulation with the highest-level AI built into the game. We further show evidence through network visualizations that the network successfully leverages temporal information during inference to aid in decision making. Our work demonstrates that supervised CNN models can provide good performance in challenging policy prediction tasks while being significantly simpler and more lightweight than alternatives.
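Top-1 and top-3 accuracy for a 30-way action classifier can be computed as below. The function `top_k_accuracy` is a generic sketch, with per-example score lists standing in for the network's class scores; it is not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of examples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)
```

Top-3 accuracy is always at least top-1 accuracy, so the 80% / 95% pair above means that in roughly three quarters of the top-1 misses the correct action was still the second or third choice.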
CV · Nov 14, 2016
3-D Convolutional Neural Networks for Glioblastoma Segmentation
Darvin Yi, Mu Zhou, Zhao Chen et al.
Convolutional Neural Networks (CNNs) have emerged as powerful tools for learning discriminative image features. In this paper, we propose a framework of 3-D fully convolutional network models for glioblastoma segmentation from multi-modality MRI data. By generalizing CNN models to true 3-D convolutions in learning 3-D tumor MRI data, the proposed approach utilizes a unique network architecture to decouple image pixels. Specifically, we design a convolutional layer with pre-defined Difference-of-Gaussian (DoG) filters to perform true 3-D convolution incorporating local neighborhood information at each pixel. We then use three trained convolutional layers that act to decouple voxels from the initial 3-D convolution. The proposed framework allows identification of high-level tumor structures on MRI. We evaluate segmentation performance on the BRATS segmentation dataset with 274 tumor samples. Extensive experimental results demonstrate encouraging performance of the proposed approach compared with state-of-the-art methods. Our data-driven approach achieves a median Dice score of 89% in whole-tumor glioblastoma segmentation, suggesting that a generalizable, low-bias model can be learned from medium-sized MRI datasets.
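A pre-defined 3-D Difference-of-Gaussian kernel, of the kind used in the fixed first convolutional layer, can be built as the difference of two normalized Gaussians. The kernel size, the two σ values in the example, and the function names are illustrative assumptions, not the paper's settings.

```python
import math
from typing import List

Kernel3D = List[List[List[float]]]


def gaussian_3d(size: int, sigma: float) -> Kernel3D:
    """Isotropic 3-D Gaussian on a size^3 grid centered at the middle voxel, normalized to sum 1."""
    c = size // 2
    k = [[[math.exp(-((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2) / (2 * sigma ** 2))
           for z in range(size)] for y in range(size)] for x in range(size)]
    total = sum(v for plane in k for row in plane for v in row)
    return [[[v / total for v in row] for row in plane] for plane in k]


def dog_3d(size: int, sigma_narrow: float, sigma_wide: float) -> Kernel3D:
    """Narrow Gaussian minus wide Gaussian: a band-pass, blob-sensitive kernel."""
    g1, g2 = gaussian_3d(size, sigma_narrow), gaussian_3d(size, sigma_wide)
    return [[[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]
            for p1, p2 in zip(g1, g2)]
```

With `dog_3d(5, 1.0, 2.0)` the kernel sums to zero, is positive at the center and negative at the corners, so convolving a volume with it responds to local intensity blobs near the narrow scale and ignores uniform regions. Several such kernels at different scales could form the fixed layer's channels.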