Arie Kaufman

h-index54

17papers

196citations

Novelty49%

AI Score34

Ranked #112,256 of 194,257 authors (top 58%)#37,494 in CV (top 63%)

17 Papers

9.5IVJun 29, 2022Code

CLTS-GAN: Color-Lighting-Texture-Specular Reflection Augmentation for Colonoscopy

Shawn Mathew, Saad Nadeem, Arie Kaufman

Automated analysis of optical colonoscopy (OC) video frames (to assist endoscopists during OC) is challenging due to variations in color, lighting, texture, and specular reflections. Previous methods either remove some of these variations via preprocessing (making pipelines cumbersome) or add diverse training data with annotations (but expensive and time-consuming). We present CLTS-GAN, a new deep learning model that gives fine control over color, lighting, texture, and specular reflection synthesis for OC video frames. We show that adding these colonoscopy-specific augmentations to the training data can improve state-of-the-art polyp detection/segmentation methods as well as drive next generation of OC simulators for training medical students. The code and pre-trained models for CLTS-GAN are available on Computational Endoscopy Platform GitHub (https://github.com/nadeemlab/CEP).

1.5CVOct 2, 2023Code

RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches

Shawn Mathew, Saad Nadeem, Alvin C. Goh et al.

Fourteen million colonoscopies are performed annually just in the U.S. However, the videos from these colonoscopies are not saved due to storage constraints (each video from a high-definition colonoscope camera can be in tens of gigabytes). Instead, a few relevant individual frames are saved for documentation/reporting purposes and these are the frames on which most current colonoscopy AI models are trained on. While developing new unsupervised domain translation methods for colonoscopy (e.g. to translate between real optical and virtual/CT colonoscopy), it is thus typical to start with approaches that initially work for individual frames without temporal consistency. Once an individual-frame model has been finalized, additional contiguous frames are added with a modified deep learning architecture to train a new model from scratch for temporal consistency. This transition to temporally-consistent deep learning models, however, requires significantly more computational and memory resources for training. In this paper, we present a lightweight solution with a tunable temporal parameter, RT-GAN (Recurrent Temporal GAN), for adding temporal consistency to individual frame-based approaches that reduces training requirements by a factor of 5. We demonstrate the effectiveness of our approach on two challenging use cases in colonoscopy: haustral fold segmentation (indicative of missed surface) and realistic colonoscopy simulator video generation. We also release a first-of-its kind temporal dataset for colonoscopy for the above use cases. The datasets, accompanying code, and pretrained models will be made available on our Computational Endoscopy Platform GitHub (https://github.com/nadeemlab/CEP). The supplementary video is available at https://youtu.be/UMVP-uIXwWk.

16.4CVDec 21, 2023Code

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

Multi-view diffusion models, obtained by applying Supervised Finetuning (SFT) to text-to-image diffusion models, have driven recent breakthroughs in text-to-3D research. However, due to the limited size and quality of existing 3D datasets, they still suffer from multi-view inconsistencies and Neural Radiance Field (NeRF) reconstruction artifacts. We argue that multi-view diffusion models can benefit from further Reinforcement Learning Finetuning (RLFT), which allows models to learn from the data generated by themselves and improve beyond their dataset limitations during SFT. To this end, we introduce Carve3D, an improved RLFT algorithm coupled with a novel Multi-view Reconstruction Consistency (MRC) metric, to enhance the consistency of multi-view diffusion models. To measure the MRC metric on a set of multi-view images, we compare them with their corresponding NeRF renderings at the same camera viewpoints. The resulting model, which we denote as Carve3DM, demonstrates superior multi-view consistency and NeRF reconstruction quality than existing models. Our results suggest that pairing SFT with Carve3D's RLFT is essential for developing multi-view-consistent diffusion models, mirroring the standard Large Language Model (LLM) alignment pipeline. Our code, training and testing data, and video results are available at: https://desaixie.github.io/carve-3d.

6.1IVJun 23, 2021Code

FoldIt: Haustral Folds Detection and Segmentation in Colonoscopy Videos

Shawn Mathew, Saad Nadeem, Arie Kaufman

Haustral folds are colon wall protrusions implicated for high polyp miss rate during optical colonoscopy procedures. If segmented accurately, haustral folds can allow for better estimation of missed surface and can also serve as valuable landmarks for registering pre-treatment virtual (CT) and optical colonoscopies, to guide navigation towards the anomalies found in pre-treatment scans. We present a novel generative adversarial network, FoldIt, for feature-consistent image translation of optical colonoscopy videos to virtual colonoscopy renderings with haustral fold overlays. A new transitive loss is introduced in order to leverage ground truth information between haustral fold annotations and virtual colonoscopy renderings. We demonstrate the effectiveness of our model on real challenging optical colonoscopy videos as well as on textured virtual colonoscopy videos with clinician-verified haustral fold annotations. All code and scripts to reproduce the experiments of this paper will be made available via our Computational Endoscopy Platform at https://github.com/nadeemlab/CEP.

19.0HCJan 23, 2025Code

Explainable XR: Understanding User Behaviors of XR Environments using LLM-assisted Analytics Framework

Yoonsang Kim, Zainab Aamir, Mithilesh Singh et al.

We present Explainable XR, an end-to-end framework for analyzing user behavior in diverse eXtended Reality (XR) environments by leveraging Large Language Models (LLMs) for data interpretation assistance. Existing XR user analytics frameworks face challenges in handling cross-virtuality - AR, VR, MR - transitions, multi-user collaborative application scenarios, and the complexity of multimodal data. Explainable XR addresses these challenges by providing a virtuality-agnostic solution for the collection, analysis, and visualization of immersive sessions. We propose three main components in our framework: (1) A novel user data recording schema, called User Action Descriptor (UAD), that can capture the users' multimodal actions, along with their intents and the contexts; (2) a platform-agnostic XR session recorder, and (3) a visual analytics interface that offers LLM-assisted insights tailored to the analysts' perspectives, facilitating the exploration and analysis of the recorded XR session data. We demonstrate the versatility of Explainable XR by demonstrating five use-case scenarios, in both individual and collaborative XR applications across virtualities. Our technical evaluation and user studies show that Explainable XR provides a highly usable analytics solution for understanding user actions and delivering multifaceted, actionable insights into user behaviors in immersive environments.

21.5CVJun 13, 2024Code

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

Desai Xie, Sai Bi, Zhixin Shu et al.

We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D datasets (e.g., Objaverse) which are often captured or crafted by humans to approximate real 3D data, Zeroverse completely ignores realistic global semantics but is rich in complex geometric and texture details that are locally similar to or even more intricate than real objects. We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. We also analyze several critical design choices of Zeroverse that contribute to LRM-Zero's capability and training stability. Our work demonstrates that 3D reconstruction, one of the core tasks in 3D vision, can potentially be addressed without the semantics of real-world objects. The Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/.

2.3GRFeb 21, 2022

Geometry-Aware Planar Embedding of Treelike Structures

Ping Hu, Saeed Boorboor, Joseph Marino et al.

The growing complexity of spatial and structural information in 3D data makes data inspection and visualization a challenging task. We describe a method to create a planar embedding of 3D treelike structures using their skeleton representations. Our method maintains the original geometry, without overlaps, to the best extent possible, allowing exploration of the topology within a single view. We present a novel camera view generation method which maximizes the visible geometric attributes (segment shape and relative placement between segments). Camera views are created for individual segments and are used to determine local bending angles at each node by projecting them to 2D. The final embedding is generated by minimizing an energy function (the weights of which are user adjustable) based on branch length and the 2D angles, while avoiding intersections. The user can also interactively modify segment placement within the 2D embedding, and the overall embedding will update accordingly. A global to local interactive exploration is provided using hierarchical camera views that are created for subtrees within the structure. We evaluate our method both qualitatively and quantitatively and demonstrate our results by constructing planar visualizations of line data (traced neurons) and volume data (CT vascular and bronchial data

7.5IVAug 9, 2021

COVID-view: Diagnosis of COVID-19 using Chest CT

Shreeraj Jadhav, Gaofeng Deng, Marlene Zawin et al.

Significant work has been done towards deep learning (DL) models for automatic lung and lesion segmentation and classification of COVID-19 on chest CT data. However, comprehensive visualization systems focused on supporting the dual visual+DL diagnosis of COVID-19 are non-existent. We present COVID-view, a visualization application specially tailored for radiologists to diagnose COVID-19 from chest CT data. The system incorporates a complete pipeline of automatic lungs segmentation, localization/ isolation of lung abnormalities, followed by visualization, visual and DL analysis, and measurement/quantification tools. Our system combines the traditional 2D workflow of radiologists with newer 2D and 3D visualization techniques with DL support for a more comprehensive diagnosis. COVID-view incorporates a novel DL model for classifying the patients into positive/negative COVID-19 cases, which acts as a reading aid for the radiologist using COVID-view and provides the attention heatmap as an explainable DL for the model output. We designed and evaluated COVID-view through suggestions, close feedback and conducting case studies of real-world patient data by expert radiologists who have substantial experience diagnosing chest CT scans for COVID-19, pulmonary embolism, and other forms of lung infections. We present requirements and task analysis for the diagnosis of COVID-19 that motivate our design choices and results in a practical system which is capable of handling real-world patient cases.

0.9CVOct 21, 2018

Visualization Framework for Colonoscopy Videos

Saad Nadeem, Arie Kaufman

We present a visualization framework for annotating and comparing colonoscopy videos, where these annotations can then be used for semi-automatic report generation at the end of the procedure. Currently, there are approximately 14 million colonoscopies performed every year in the US. In this work, we create a visualization tool to deal with the deluge of colonoscopy videos in a more effective way. We present an interactive visualization framework for the annotation and tagging of colonoscopy videos in an easy and intuitive way. These annotations and tags can later be used for report generation for electronic medical records and for comparison at an individual as well as group level. We also present important use cases and medical expert feedback for our visualization framework.

2.5CVOct 20, 2018

Corresponding Supine and Prone Colon Visualization Using Eigenfunction Analysis and Fold Modeling

Saad Nadeem, Joseph Marino, Xianfeng Gu et al.

We present a method for registration and visualization of corresponding supine and prone virtual colonoscopy scans based on eigenfunction analysis and fold modeling. In virtual colonoscopy, CT scans are acquired with the patient in two positions, and their registration is desirable so that physicians can corroborate findings between scans. Our algorithm performs this registration efficiently through the use of Fiedler vector representation (the second eigenfunction of the Laplace-Beltrami operator). This representation is employed to first perform global registration of the two colon positions. The registration is then locally refined using the haustral folds, which are automatically segmented using the 3D level sets of the Fiedler vector. The use of Fiedler vectors and the segmented folds presents a precise way of visualizing corresponding regions across datasets and visual modalities. We present multiple methods of visualizing the results, including 2D flattened rendering and the corresponding 3D endoluminal views. The precise fold modeling is used to automatically find a suitable cut for the 2D flattening, which provides a less distorted visualization. Our approach is robust, and we demonstrate its efficiency and efficacy by showing matched views on both the 2D flattened colons and in the 3D endoluminal view. We analytically evaluate the results by measuring the distance between features on the registered colons, and we also assess our fold segmentation against 20 manually labeled datasets. We have compared our results analytically to previous methods, and have found our method to achieve superior results. We also prove the hot spots conjecture for modeling cylindrical topology using Fiedler vector representation, which allows our approach to be used for general cylindrical geometry modeling and feature extraction.

0.9CVOct 11, 2018

FeatureLego: Volume Exploration Using Exhaustive Clustering of Super-Voxels

Shreeraj Jadhav, Saad Nadeem, Arie Kaufman

We present a volume exploration framework, FeatureLego, that uses a novel voxel clustering approach for efficient selection of semantic features. We partition the input volume into a set of compact super-voxels that represent the finest selection granularity. We then perform an exhaustive clustering of these super-voxels using a graph-based clustering method. Unlike the prevalent brute-force parameter sampling approaches, we propose an efficient algorithm to perform this exhaustive clustering. By computing an exhaustive set of clusters, we aim to capture as many boundaries as possible and ensure that the user has sufficient options for efficiently selecting semantically relevant features. Furthermore, we merge all the computed clusters into a single tree of meta-clusters that can be used for hierarchical exploration. We implement an intuitive user-interface to interactively explore volumes using our clustering approach. Finally, we show the effectiveness of our framework on multiple real-world datasets of different modalities.

1.7CVSep 17, 2018

LMap: Shape-Preserving Local Mappings for Biomedical Visualization

Saad Nadeem, Xianfeng Gu, Arie Kaufman

Visualization of medical organs and biological structures is a challenging task because of their complex geometry and the resultant occlusions. Global spherical and planar mapping techniques simplify the complex geometry and resolve the occlusions to aid in visualization. However, while resolving the occlusions these techniques do not preserve the geometric context, making them less suitable for mission-critical biomedical visualization tasks. In this paper, we present a shape-preserving local mapping technique for resolving occlusions locally while preserving the overall geometric context. More specifically, we present a novel visualization algorithm, LMap, for conformally parameterizing and deforming a selected local region-of-interest (ROI) on an arbitrary surface. The resultant shape-preserving local mappings help to visualize complex surfaces while preserving the overall geometric context. The algorithm is based on the robust and efficient extrinsic Ricci flow technique, and uses the dynamic Ricci flow algorithm to guarantee the existence of a local map for a selected ROI on an arbitrary surface. We show the effectiveness and efficacy of our method in three challenging use cases: (1) multimodal brain visualization, (2) optimal coverage of virtual colonoscopy centerline flythrough, and (3) molecular surface visualization.

0.9CVSep 17, 2018

Radiative Transport Based Flame Volume Reconstruction from Videos

Liang Shen, Dengming Zhu, Saad Nadeem et al.

We introduce a novel approach for flame volume reconstruction from videos using inexpensive charge-coupled device (CCD) consumer cameras. The approach includes an economical data capture technique using inexpensive CCD cameras. Leveraging the smear feature of the CCD chip, we present a technique for synchronizing CCD cameras while capturing flame videos from different views. Our reconstruction is based on the radiative transport equation which enables complex phenomena such as emission, extinction, and scattering to be used in the rendering process. Both the color intensity and temperature reconstructions are implemented using the CUDA parallel computing framework, which provides real-time performance and allows visualization of reconstruction results after every iteration. We present the results of our approach using real captured data and physically-based simulated data. Finally, we also compare our approach against the other state-of-the-art flame volume reconstruction methods and demonstrate the efficacy and efficiency of our approach in four different applications: (1) rendering of reconstructed flames in virtual environments, (2) rendering of reconstructed flames in augmented reality, (3) flame stylization, and (4) reconstruction of other semitransparent phenomena.

2.5CVSep 17, 2018

Crowd-Assisted Polyp Annotation of Virtual Colonoscopy Videos

Ji Hwan Park, Saad Nadeem, Joseph Marino et al.

Virtual colonoscopy (VC) allows a radiologist to navigate through a 3D colon model reconstructed from a computed tomography scan of the abdomen, looking for polyps, the precursors of colon cancer. Polyps are seen as protrusions on the colon wall and haustral folds, visible in the VC fly-through videos. A complete review of the colon surface requires full navigation from the rectum to the cecum in antegrade and retrograde directions, which is a tedious task that takes an average of 30 minutes. Crowdsourcing is a technique for non-expert users to perform certain tasks, such as image or video annotation. In this work, we use crowdsourcing for the examination of complete VC fly-through videos for polyp annotation by non-experts. The motivation for this is to potentially help the radiologist reach a diagnosis in a shorter period of time, and provide a stronger confirmation of the eventual diagnosis. The crowdsourcing interface includes an interactive tool for the crowd to annotate suspected polyps in the video with an enclosing box. Using our workflow, we achieve an overall polyps-per-patient sensitivity of 87.88% (95.65% for polyps $\geq$5mm and 70% for polyps $<$5mm). We also demonstrate the efficacy and effectiveness of a non-expert user in detecting and annotating polyps and discuss their possibility in aiding radiologists in VC examinations.

3.9CVSep 17, 2018

Crowdsourcing Lung Nodules Detection and Annotation

Saeed Boorboor, Saad Nadeem, Ji Hwan Park et al.

We present crowdsourcing as an additional modality to aid radiologists in the diagnosis of lung cancer from clinical chest computed tomography (CT) scans. More specifically, a complete workflow is introduced which can help maximize the sensitivity of lung nodule detection by utilizing the collective intelligence of the crowd. We combine the concept of overlapping thin-slab maximum intensity projections (TS-MIPs) and cine viewing to render short videos that can be outsourced as an annotation task to the crowd. These videos are generated by linearly interpolating overlapping TS-MIPs of CT slices through the depth of each quadrant of a patient's lung. The resultant videos are outsourced to an online community of non-expert users who, after a brief tutorial, annotate suspected nodules in these video segments. Using our crowdsourcing workflow, we achieved a lung nodule detection sensitivity of over 90% for 20 patient CT datasets (containing 178 lung nodules with sizes between 1-30mm), and only 47 false positives from a total of 1021 annotations on nodules of all sizes (96% sensitivity for nodules$>$4mm). These results show that crowdsourcing can be a robust and scalable modality to aid radiologists in screening for lung cancer, directly or in combination with computer-aided detection (CAD) algorithms. For CAD algorithms, the presented workflow can provide highly accurate training data to overcome the high false-positive rate (per scan) problem. We also provide, for the first time, analysis on nodule size and position which can help improve CAD algorithms.

3.8CVSep 5, 2016

Depth Reconstruction and Computer-Aided Polyp Detection in Optical Colonoscopy Video Frames

Saad Nadeem, Arie Kaufman

We present a computer-aided detection algorithm for polyps in optical colonoscopy images. Polyps are the precursors to colon cancer. In the US alone, more than 14 million optical colonoscopies are performed every year, mostly to screen for polyps. Optical colonoscopy has been shown to have an approximately 25% polyp miss rate due to the convoluted folds and bends present in the colon. In this work, we present an automatic detection algorithm to detect these polyps in the optical colonoscopy images. We use a machine learning algorithm to infer a depth map for a given optical colonoscopy image and then use a detailed pre-built polyp profile to detect and delineate the boundaries of polyps in this given image. We have achieved the best recall of 84.0% and the best specificity value of 83.4%.

3.1CVJul 6, 2013

Anatomical Feature-guided Volumeric Registration of Multimodal Prostate MRI

Xin Zhao, Arie Kaufman

Radiological imaging of prostate is becoming more popular among researchers and clinicians in searching for diseases, primarily cancer. Scans might be acquired at different times, with patient movement between scans, or with different equipment, resulting in multiple datasets that need to be registered. For this issue, we introduce a registration method using anatomical feature-guided mutual information. Prostate scans of the same patient taken in three different orientations are first aligned for the accurate detection of anatomical features in 3D. Then, our pipeline allows for multiple modalities registration through the use of anatomical features, such as the interior urethra of prostate and gland utricle, in a bijective way. The novelty of this approach is the application of anatomical features as the pre-specified corresponding landmarks for prostate registration. We evaluate the registration results through both artificial and clinical datasets. Registration accuracy is evaluated by performing statistical analysis of local intensity differences or spatial differences of anatomical landmarks between various MR datasets. Evaluation results demonstrate that our method statistics-significantly improves the quality of registration. Although this strategy is tested for MRI-guided brachytherapy, the preliminary results from these experiments suggest that it can be also applied to other settings such as transrectal ultrasound-guided or CT-guided therapy, where the integration of preoperative MRI may have a significant impact upon treatment planning and guidance.