CVJul 5, 2023
The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CTNicholas Heller, Fabian Isensee, Dasha Trofimova et al.
This paper presents the challenge report for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) held in conjunction with the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI). KiTS21 is a sequel to its first edition in 2019, and it features a variety of innovations in how the challenge was designed, in addition to a larger dataset. A novel annotation method was used to collect three separate annotations for each region of interest, and these annotations were performed in a fully transparent setting using a web-based annotation tool. Further, the KiTS21 test set was collected from an outside institution, challenging participants to develop methods that generalize well to new populations. Nonetheless, the top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance. An in-depth meta-analysis is presented describing which methods were used and how they faired on the leaderboard, as well as the characteristics of which cases generally saw good performance, and which did not. Overall KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation, and provides useful insights that are applicable to the field of semantic segmentation as a whole.
ROSep 29, 2023
3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View PlanningAthanasios Bacharis, Konstantinos D. Polyzos, Henry J. Nelson et al.
3D reconstruction is a fundamental task in robotics that gained attention due to its major impact in a wide variety of practical settings, including agriculture, underwater, and urban environments. This task can be carried out via view planning (VP), which aims to optimally place a certain number of cameras in positions that maximize the visual information, improving the resulting 3D reconstruction. Nonetheless, in most real-world settings, existing environmental noise can significantly affect the performance of 3D reconstruction. To that end, this work advocates a novel geometric-based reconstruction quality function for VP, that accounts for the existing noise of the environment, without requiring its closed-form expression. With no analytic expression of the objective function, this work puts forth an adaptive Bayesian optimization algorithm for accurate 3D reconstruction in the presence of noise. Numerical tests on noisy agricultural environments showcase the merits of the proposed approach for 3D reconstruction with even a small number of available cameras.
IVDec 20, 2024Code
Efficient MedSAMs: Segment Anything in Medical Images on LaptopJun Ma, Feifei Li, Sumin Kim et al.
Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications.
CVJun 12, 2018Code
Convolutional Neural Networks for Aircraft Noise MonitoringNicholas Heller, Derek Anderson, Matt Baker et al.
Air travel is one of the fastest growing modes of transportation, however, the effects of aircraft noise on populations surrounding airports is hindering its growth. In an effort to study and ultimately mitigate the impact that this noise has, many airports continuously monitor the aircraft noise in their surrounding communities. Noise monitoring and analysis is complicated by the fact that aircraft are not the only source of noise. In this work, we show that a Convolutional Neural Network is well-suited for the task of identifying noise events which are not caused by aircraft. Our system achieves an accuracy of 0.970 when trained on 900 manually labeled noise events. Our training data and a TensorFlow implementation of our model are available at https://github.com/neheller/aircraftnoise.
ROFeb 4, 2025
From Human Hands to Robotic Limbs: A Study in Motor Skill Embodiment for TelemanipulationHaoyi Shi, Mingxi Su, Ted Morris et al.
This paper presents a teleoperation system for controlling a redundant degree of freedom robot manipulator using human arm gestures. We propose a GRU-based Variational Autoencoder to learn a latent representation of the manipulator's configuration space, capturing its complex joint kinematics. A fully connected neural network maps human arm configurations into this latent space, allowing the system to mimic and generate corresponding manipulator trajectories in real time through the VAE decoder. The proposed method shows promising results in teleoperating the manipulator, enabling the generation of novel manipulator configurations from human features that were not present during training.
ROSep 28, 2025
BOSfM: A View Planning Framework for Optimal 3D Reconstruction of Agricultural ScenesAthanasios Bacharis, Konstantinos D. Polyzos, Georgios B. Giannakis et al.
Active vision (AV) has been in the spotlight of robotics research due to its emergence in numerous applications including agricultural tasks such as precision crop monitoring and autonomous harvesting to list a few. A major AV problem that gained popularity is the 3D reconstruction of targeted environments using 2D images from diverse viewpoints. While collecting and processing a large number of arbitrarily captured 2D images can be arduous in many practical scenarios, a more efficient solution involves optimizing the placement of available cameras in 3D space to capture fewer, yet more informative, images that provide sufficient visual information for effective reconstruction of the environment of interest. This process termed as view planning (VP), can be markedly challenged (i) by noise emerging in the location of the cameras and/or in the extracted images, and (ii) by the need to generalize well in other unknown similar agricultural environments without need for re-optimizing or re-training. To cope with these challenges, the present work presents a novel VP framework that considers a reconstruction quality-based optimization formulation that relies on the notion of `structure-from-motion' to reconstruct the 3D structure of the sought environment from the selected 2D images. With no analytic expression of the optimization function and with costly function evaluations, a Bayesian optimization approach is proposed to efficiently carry out the VP process using only a few function evaluations, while accounting for different noise cases. Numerical tests on both simulated and real agricultural settings signify the benefits of the advocated VP approach in efficiently estimating the optimal camera placement to accurately reconstruct 3D environments of interest, and generalize well on similar unknown environments.
CVJun 29, 2024
AI Age Discrepancy: A Novel Parameter for Frailty Assessment in Kidney Tumor PatientsRikhil Seshadri, Jayant Siva, Angelica Bartholomew et al.
Kidney cancer is a global health concern, and accurate assessment of patient frailty is crucial for optimizing surgical outcomes. This paper introduces AI Age Discrepancy, a novel metric derived from machine learning analysis of preoperative abdominal CT scans, as a potential indicator of frailty and postoperative risk in kidney cancer patients. This retrospective study of 599 patients from the 2023 Kidney Tumor Segmentation (KiTS) challenge dataset found that a higher AI Age Discrepancy is significantly associated with longer hospital stays and lower overall survival rates, independent of established factors. This suggests that AI Age Discrepancy may provide valuable insights into patient frailty and could thus inform clinical decision-making in kidney cancer treatment.
CVJul 22, 2021
Pre-Clustering Point Clouds of Crop Fields Using Scalable MethodsHenry J. Nelson, Nikolaos Papanikolopoulos
In order to apply the recent successes of machine learning and automated plant phenotyping on a large scale using agricultural robotics, efficient and general algorithms must be designed to intelligently split crop fields into small, yet actionable, portions that can then be processed by more complex algorithms. In this paper, we notice a similarity between the current state-of-the-art for separating corn plants and a commonly used density-based clustering algorithm, Quickshift. Exploiting this similarity we propose a number of novel, application-specific algorithms with the goal of producing a general and scalable field segmentation algorithm. The novel algorithms proposed in this work are shown to produce quantitatively better results than the current state-of-the-art while being less sensitive to input parameters and maintaining the same algorithmic time complexity. When incorporated into field-scale phenotyping systems, the proposed algorithms should work as a drop-in replacement that can greatly improve the accuracy of results while ensuring that performance and scalability remain undiminished.
LGApr 13, 2021
Learning Log-Determinant Divergences for Positive Definite MatricesAnoop Cherian, Panagiotis Stanitsas, Jue Wang et al.
Representations in the form of Symmetric Positive Definite (SPD) matrices have been popularized in a variety of visual learning applications due to their demonstrated ability to capture rich second-order statistics of visual data. There exist several similarity measures for comparing SPD matrices with documented benefits. However, selecting an appropriate measure for a given problem remains a challenge and in most cases, is the result of a trial-and-error process. In this paper, we propose to learn similarity measures in a data-driven manner. To this end, we capitalize on the αβ-log-det divergence, which is a meta-divergence parametrized by scalars αand β, subsuming a wide family of popular information divergences on SPD matrices for distinct and discrete values of these parameters. Our key idea is to cast these parameters in a continuum and learn them from data. We systematically extend this idea to learn vector-valued parameters, thereby increasing the expressiveness of the underlying non-linear measure. We conjoin the divergence learning problem with several standard tasks in machine learning, including supervised discriminative dictionary learning and unsupervised SPD matrix clustering. We present Riemannian gradient descent schemes for optimizing our formulations efficiently, and show the usefulness of our method on eight standard computer vision tasks.
IVDec 2, 2019
The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 ChallengeNicholas Heller, Fabian Isensee, Klaus H. Maier-Hein et al.
There is a large body of literature linking anatomic and geometric characteristics of kidney tumors to perioperative and oncologic outcomes. Semantic segmentation of these tumors and their host kidneys is a promising tool for quantitatively characterizing these lesions, but its adoption is limited due to the manual effort required to produce high-quality 3D segmentations of these structures. Recently, methods based on deep learning have shown excellent results in automatic 3D segmentation, but they require large datasets for training, and there remains little consensus on which methods perform best. The 2019 Kidney and Kidney Tumor Segmentation challenge (KiTS19) was a competition held in conjunction with the 2019 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) which sought to address these issues and stimulate progress on this automatic segmentation problem. A training set of 210 cross sectional CT images with kidney tumors was publicly released with corresponding semantic segmentation masks. 106 teams from five continents used this data to develop automated systems to predict the true segmentation masks on a test set of 90 CT images for which the corresponding ground truth segmentations were kept private. These predictions were scored and ranked according to their average So rensen-Dice coefficient between the kidney and tumor across all 90 cases. The winning team achieved a Dice of 0.974 for kidney and 0.851 for tumor, approaching the inter-annotator performance on kidney (0.983) but falling short on tumor (0.923). This challenge has now entered an "open leaderboard" phase where it serves as a challenging benchmark in 3D semantic segmentation.
RONov 22, 2019
Design and Experiments with a Robot-Driven Underwater Holographic Microscope for Low-Cost In Situ Particle MeasurementsKevin Mallery, Dario Canelon, Jiarong Hong et al.
Microscopic analysis of micro particles in situ in diverse water environments is necessary for monitoring water quality and localizing contamination sources. Conventional sensors such as optical microscopes and fluorometers often require complex sample preparation, are restricted to small sample volumes, and are unable to simultaneously capture all pertinent details of a sample such as particle size, shape, concentration, and three dimensional motion. In this article we propose a novel and cost-effective robotic system for mobile microscopic analysis of particles in situ at various depths which are fully controlled by the robot system itself. A miniature underwater digital in-line holographic microscope (DIHM) performs high resolution imaging of microparticles (e.g., algae cells, plastic debris, sediments) while movement allows measurement of particle distributions covering a large area of water. The main contribution of this work is the creation of a low-cost, comprehensive, and small underwater robotic holographic microscope that can assist in a variety of tasks in environmental monitoring and overall assessment of water quality such as contaminant detection and localization. The resulting system provides some unique capabilities such as expanded and systematic coverage of large bodies of water at a low cost. Several challenges such as the trade-off between image quality and cost are addressed to satisfy the aforementioned goals.
LGAug 12, 2019
The Role of Publicly Available Data in MICCAI Papers from 2014 to 2018Nicholas Heller, Jack Rickman, Christopher Weight et al.
Widely-used public benchmarks are of huge importance to computer vision and machine learning research, especially with the computational resources required to reproduce state of the art results quickly becoming untenable. In medical image computing, the wide variety of image modalities and problem formulations yields a huge task-space for benchmarks to cover, and thus the widespread adoption of standard benchmarks has been slow, and barriers to releasing medical data exacerbate this issue. In this paper, we examine the role that publicly available data has played in MICCAI papers from the past five years. We find that more than half of these papers are based on private data alone, although this proportion seems to be decreasing over time. Additionally, we observed that after controlling for open access publication and the release of code, papers based on public data were cited over 60% more per year than their private-data counterparts. Further, we found that more than 20% of papers using public data did not provide a citation to the dataset or associated manuscript, highlighting the "second-rate" status that data contributions often take compared to theoretical ones. We conclude by making recommendations for MICCAI policies which could help to better incentivise data sharing and move the field toward more efficient and reproducible science.
CVJul 9, 2019
Estimating Pedestrian Moving State Based on Single 2D Body PoseZixing Wang, Nikolaos Papanikolopoulos
The Crossing or Not-Crossing (C/NC) problem is important to autonomous vehicles (AVs) for safe vehicle/pedestrian interactions. However, this problem setup often ignores pedestrians walking along the direction of the vehicles' movement (LONG). To enhance the AVs' awareness of pedestrians behavior, we make the first step towards extending the C/NC to the C/NC/LONG problem and recognize them based on single body pose. In contrast, previous C/NC state classifiers depend on multiple poses or contextual information. Our proposed shallow neural network classifier aims to recognize these three states swiftly. We tested it on the JAAD dataset and reported an average 81.23% accuracy. Furthermore, this model can be integrated with different sensors and algorithms that provide 2D pedestrian body pose so that it is able to function across multiple light and weather conditions.
QMMar 31, 2019
The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical OutcomesNicholas Heller, Niranjan Sathianathen, Arveen Kalapara et al.
The morphometry of a kidney tumor revealed by contrast-enhanced Computed Tomography (CT) imaging is an important factor in clinical decision making surrounding the lesion's diagnosis and treatment. Quantitative study of the relationship between kidney tumor morphology and clinical outcomes is difficult due to data scarcity and the laborious nature of manually quantifying imaging predictors. Automatic semantic segmentation of kidneys and kidney tumors is a promising tool towards automatically quantifying a wide array of morphometric features, but no sizeable annotated dataset is currently available to train models for this task. We present the KiTS19 challenge dataset: A collection of multi-phase CT imaging, segmentation masks, and comprehensive clinical outcomes for 300 patients who underwent nephrectomy for kidney tumors at our center between 2010 and 2018. 210 (70%) of these patients were selected at random as the training set for the 2019 MICCAI KiTS Kidney Tumor Segmentation Challenge and have been released publicly. With the presence of clinical context and surgical outcomes, this data can serve not only for benchmarking semantic segmentation models, but also for developing and studying biomarkers which make use of the imaging and semantic segmentation masks.
CVJun 12, 2018
Imperfect Segmentation Labels: How Much Do They Matter?Nicholas Heller, Joshua Dean, Nikolaos Papanikolopoulos
Labeled datasets for semantic segmentation are imperfect, especially in medical imaging where borders are often subtle or ill-defined. Little work has been done to analyze the effect that label errors have on the performance of segmentation methodologies. Here we present a large-scale study of model performance in the presence of varying types and degrees of error in training data. We trained U-Net, SegNet, and FCN32 several times for liver segmentation with 10 different modes of ground-truth perturbation. Our results show that for each architecture, performance steadily declines with boundary-localized errors, however, U-Net was significantly more robust to jagged boundary errors than the other architectures. We also found that each architecture was very robust to non-boundary-localized errors, suggesting that boundary-localized errors are fundamentally different and more challenging problem than random label errors in a classification setting.
CVAug 5, 2017
Learning Discriminative Alpha-Beta-divergence for Positive Definite Matrices (Extended Version)Anoop Cherian, Panagiotis Stanitsas, Mehrtash Harandi et al.
Symmetric positive definite (SPD) matrices are useful for capturing second-order statistics of visual data. To compare two SPD matrices, several measures are available, such as the affine-invariant Riemannian metric, Jeffreys divergence, Jensen-Bregman logdet divergence, etc.; however, their behaviors may be application dependent, raising the need of manual selection to achieve the best possible performance. Further and as a result of their overwhelming complexity for large-scale problems, computing pairwise similarities by clever embedding of SPD matrices is often preferred to direct use of the aforementioned measures. In this paper, we propose a discriminative metric learning framework, Information Divergence and Dictionary Learning (IDDL), that not only learns application specific measures on SPD matrices automatically, but also embeds them as vectors using a learned dictionary. To learn the similarity measures (which could potentially be distinct for every dictionary atom), we use the recently introduced alpha-beta-logdet divergence, which is known to unify the measures listed above. We propose a novel IDDL objective, that learns the parameters of the divergence and the dictionary atoms jointly in a discriminative setup and is solved efficiently using Riemannian optimization. We showcase extensive experiments on eight computer vision datasets, demonstrating state-of-the-art performances.
CVJun 23, 2015
Autonomous 3D Reconstruction Using a MAVAlexander Popov, Dimitrios Zermas, Nikolaos Papanikolopoulos
An approach is proposed for high resolution 3D reconstruction of an object using a Micro Air Vehicle (MAV). A system is described which autonomously captures images and performs a dense 3D reconstruction via structure from motion with no prior knowledge of the environment. Only the MAVs own sensors, the front facing camera and the Inertial Measurement Unit (IMU) are utilized. Precision agriculture is considered as an example application for the system.
CVMay 29, 2013
Video Human Segmentation using Fuzzy Object Models and its Application to Body Pose Estimation of Toddlers for Behavior StudiesThiago V. Spina, Mariano Tepper, Amy Esler et al.
Video object segmentation is a challenging problem due to the presence of deformable, connected, and articulated objects, intra- and inter-object occlusions, object motion, and poor lighting. Some of these challenges call for object models that can locate a desired object and separate it from its surrounding background, even when both share similar colors and textures. In this work, we extend a fuzzy object model, named cloud system model (CSM), to handle video segmentation, and evaluate it for body pose estimation of toddlers at risk of autism. CSM has been successfully used to model the parts of the brain (cerebrum, left and right brain hemispheres, and cerebellum) in order to automatically locate and separate them from each other, the connected brain stem, and the background in 3D MR-images. In our case, the objects are articulated parts (2D projections) of the human body, which can deform, cause self-occlusions, and move along the video. The proposed CSM extension handles articulation by connecting the individual clouds, body parts, of the system using a 2D stickman model. The stickman representation naturally allows us to extract 2D body pose measures of arm asymmetry patterns during unsupported gait of toddlers, a possible behavioral marker of autism. The results show that our method can provide insightful knowledge to assist the specialist's observations during real in-clinic assessments.
CVOct 25, 2012
Computer vision tools for the non-invasive assessment of autism-related behavioral markersJordan Hashemi, Thiago Vallin Spina, Mariano Tepper et al.
The early detection of developmental disorders is key to child outcome, allowing interventions to be initiated that promote development and improve prognosis. Research on autism spectrum disorder (ASD) suggests behavioral markers can be observed late in the first year of life. Many of these studies involved extensive frame-by-frame video observation and analysis of a child's natural behavior. Although non-intrusive, these methods are extremely time-intensive and require a high level of observer training; thus, they are impractical for clinical and large population research purposes. Diagnostic measures for ASD are available for infants but are only accurate when used by specialists experienced in early diagnosis. This work is a first milestone in a long-term multidisciplinary project that aims at helping clinicians and general practitioners accomplish this early detection/measurement task automatically. We focus on providing computer vision tools to measure and identify ASD behavioral markers based on components of the Autism Observation Scale for Infants (AOSI). In particular, we develop algorithms to measure three critical AOSI activities that assess visual attention. We augment these AOSI activities with an additional test that analyzes asymmetrical patterns in unsupported gait. The first set of algorithms involves assessing head motion by tracking facial features, while the gait analysis relies on joint foreground segmentation and 2D body pose estimation in video. We show results that provide insightful knowledge to augment the clinician's behavioral observations obtained from real in-clinic assessments.