14.5CVDec 16, 2022
Biomedical image analysis competitions: The state of current participation practiceMatthias Eisenmann, Annika Reinke, Vivienn Weru et al. · utoronto
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Segment Anything in Medical ImagesJun Ma, Yuting He, Feifei Li et al.
Medical image segmentation is a critical component in clinical practice, facilitating accurate diagnosis, treatment planning, and disease monitoring. However, existing methods, often tailored to specific modalities or disease types, lack generalizability across the diverse spectrum of medical image segmentation tasks. Here we present MedSAM, a foundation model designed for bridging this gap by enabling universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types. We conduct a comprehensive evaluation on 86 internal validation tasks and 60 external validation tasks, demonstrating better accuracy and robustness than modality-wise specialist models. By delivering accurate and efficient segmentation across a wide spectrum of tasks, MedSAM holds significant potential to expedite the evolution of diagnostic tools and the personalization of treatment plans.
27.2IVAug 10, 2023
The Multi-modality Cell Segmentation Challenge: Towards Universal SolutionsJun Ma, Ronald Xie, Shamini Ayyadhury et al.
Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.
Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 ChallengeJun Ma, Yao Zhang, Song Gu et al.
Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automatize this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations, we organized the FLARE 2022 Challenge, the largest abdominal organ analysis challenge to date, to benchmark fast, low-resource, accurate, annotation-efficient, and generalized AI algorithms. We constructed an intercontinental and multinational dataset from more than 50 medical groups, including Computed Tomography (CT) scans with different races, diseases, phases, and manufacturers. We independently validated that a set of AI algorithms achieved a median Dice Similarity Coefficient (DSC) of 90.0\% by using 50 labeled scans and 2000 unlabeled scans, which can significantly reduce annotation requirements. The best-performing algorithms successfully generalized to holdout external validation sets, achieving a median DSC of 89.5\%, 90.9\%, and 88.3\% on North American, European, and Asian cohorts, respectively. They also enabled automatic extraction of key organ biology features, which was labor-intensive with traditional manual measurements. This opens the potential to use unlabeled data to boost performance and alleviate annotation shortages for modern AI models.
55.2IVJan 9, 2024
U-Mamba: Enhancing Long-range Dependency for Biomedical Image SegmentationJun Ma, Feifei Li, Bo Wang
Convolutional Neural Networks (CNNs) and Transformers have been the most popular architectures for biomedical image segmentation, but both of them have limited ability to handle long-range dependencies because of inherent locality or computational complexity. To address this challenge, we introduce U-Mamba, a general-purpose network for biomedical image segmentation. Inspired by the State Space Sequence Models (SSMs), a new family of deep sequence models known for their strong capability in handling long sequences, we design a hybrid CNN-SSM block that integrates the local feature extraction power of convolutional layers with the abilities of SSMs for capturing the long-range dependency. Moreover, U-Mamba enjoys a self-configuring mechanism, allowing it to automatically adapt to various datasets without manual intervention. We conduct extensive experiments on four diverse tasks, including the 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results reveal that U-Mamba outperforms state-of-the-art CNN-based and Transformer-based segmentation networks across all tasks. This opens new avenues for efficient long-range dependency modeling in biomedical image analysis. The code, models, and data are publicly available at https://wanglab.ai/u-mamba.html.
3.7CVSep 30, 2024
Annotation-Free Curb Detection Leveraging Altitude Difference ImageFulong Ma, Peng Hou, Yuxuan Liu et al.
Road curbs are considered as one of the crucial and ubiquitous traffic features, which are essential for ensuring the safety of autonomous vehicles. Current methods for detecting curbs primarily rely on camera imagery or LiDAR point clouds. Image-based methods are vulnerable to fluctuations in lighting conditions and exhibit poor robustness, while methods based on point clouds circumvent the issues associated with lighting variations. However, it is the typical case that significant processing delays are encountered due to the voluminous amount of 3D points contained in each frame of the point cloud data. Furthermore, the inherently unstructured characteristics of point clouds poses challenges for integrating the latest deep learning advancements into point cloud data applications. To address these issues, this work proposes an annotation-free curb detection method leveraging Altitude Difference Image (ADI), which effectively mitigates the aforementioned challenges. Given that methods based on deep learning generally demand extensive, manually annotated datasets, which are both expensive and labor-intensive to create, we present an Automatic Curb Annotator (ACA) module. This module utilizes a deterministic curb detection algorithm to automatically generate a vast quantity of training data. Consequently, it facilitates the training of the curb detection model without necessitating any manual annotation of data. Finally, by incorporating a post-processing module, we manage to achieve state-of-the-art results on the KITTI 3D curb dataset with considerably reduced processing delays compared to existing methods, which underscores the effectiveness of our approach in curb detection tasks.
Rethinking the Residual Distribution of Locate-then-Editing Methods in Model EditingXiaopeng Li, Shanwen Wang, Shasha Li et al.
Model editing enables targeted updates to the knowledge of large language models (LLMs) with minimal retraining. Among existing approaches, locate-then-edit methods constitute a prominent paradigm: they first identify critical layers, then compute residuals at the final critical layer based on the target edit, and finally apply least-squares-based multi-layer updates via $\textbf{residual distribution}$. While empirically effective, we identify a counterintuitive failure mode: residual distribution, a core mechanism in these methods, introduces weight shift errors that undermine editing precision. Through theoretical and empirical analysis, we show that such errors increase with the distribution distance, batch size, and edit sequence length, ultimately leading to inaccurate or suboptimal edits. To address this, we propose the $\textbf{B}$oundary $\textbf{L}$ayer $\textbf{U}$pdat$\textbf{E (BLUE)}$ strategy to enhance locate-then-edit methods. Sequential batch editing experiments on three LLMs and two datasets demonstrate that BLUE not only delivers an average performance improvement of 35.59\%, significantly advancing the state of the art in model editing, but also enhances the preservation of LLMs' general capabilities. Our code is available at https://github.com/xpq-tech/BLUE.
QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking ResultsRaghav Mehta, Angelos Filos, Ujjwal Baid et al.
Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Developing scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 task on uncertainty quantification (QU-BraTS) and designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels at incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analyses. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at: https://github.com/RagMeh11/QU-BraTS.
Cutting-edge 3D Medical Image Segmentation Methods in 2020: Are Happy Families All Alike?Jun Ma
Segmentation is one of the most important and popular tasks in medical image analysis, which plays a critical role in disease diagnosis, surgical planning, and prognosis evaluation. During the past five years, on the one hand, thousands of medical image segmentation methods have been proposed for various organs and lesions in different medical images, which become more and more challenging to fairly compare different methods. On the other hand, international segmentation challenges can provide a transparent platform to fairly evaluate and compare different methods. In this paper, we present a comprehensive review of the top methods in ten 3D medical image segmentation challenges during 2020, covering a variety of tasks and datasets. We also identify the "happy-families" practices in the cutting-edge segmentation methods, which are useful for developing powerful segmentation approaches. Finally, we discuss open research problems that should be addressed in the future. We also maintain a list of cutting-edge segmentation methods at \url{https://github.com/JunMa11/SOTA-MedSeg}.
Loss Ensembles for Extremely Imbalanced SegmentationJun Ma
This short paper briefly presents our methodology details of automatic intracranial aneurysms segmentation from brain MR scans. We use ensembles of multiple models trained from different loss functions. Our method ranked first place in the ADAM challenge segmentation task. The code and trained models are publicly available at https://github.com/JunMa11/ADAM2020.
Exploring Large Context for Cerebral Aneurysm SegmentationJun Ma, Ziwei Nie
Automated segmentation of aneurysms from 3D CT is important for the diagnosis, monitoring, and treatment planning of the cerebral aneurysm disease. This short paper briefly presents the main technique details of the aneurysm segmentation method in the MICCAI 2020 CADA challenge. The main contribution is that we configure the 3D U-Net with a large patch size, which can obtain the large context. Our method ranked second on the MICCAI 2020 CADA testing dataset with an average Jaccard of 0.7593. Our code and trained models are publicly available at \url{https://github.com/JunMa11/CADA2020}.
Histogram Matching Augmentation for Domain Adaptation with Application to Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Image SegmentationJun Ma
Convolutional Neural Networks (CNNs) have achieved high accuracy for cardiac structure segmentation if training cases and testing cases are from the same distribution. However, the performance would be degraded if the testing cases are from a distinct domain (e.g., new MRI scanners, clinical centers). In this paper, we propose a histogram matching (HM) data augmentation method to eliminate the domain gap. Specifically, our method generates new training cases by using HM to transfer the intensity distribution of testing cases to existing training cases. The proposed method is quite simple and can be used in a plug-and-play way in many segmentation tasks. The method is evaluated on MICCAI 2020 M\&Ms challenge, and achieves average Dice scores of 0.9051, 0.8405, and 0.8749, and Hausdorff Distances of 9.996, 12.49, and 12.68 for the left ventricular, myocardium, and right ventricular, respectively. Our results rank the third place in MICCAI 2020 M\&Ms challenge. The code and trained models are publicly available at \url{https://github.com/JunMa11/HM_DataAug}.
AbdomenCT-1K: Is Abdominal Organ Segmentation A Solved Problem?Jun Ma, Yao Zhang, Song Gu et al.
With the unprecedented developments in deep learning, automatic segmentation of main abdominal organs seems to be a solved problem as state-of-the-art (SOTA) methods have achieved comparable results with inter-rater variability on many benchmark datasets. However, most of the existing abdominal datasets only contain single-center, single-phase, single-vendor, or single-disease cases, and it is unclear whether the excellent performance can generalize on diverse datasets. This paper presents a large and diverse abdominal CT organ segmentation dataset, termed AbdomenCT-1K, with more than 1000 (1K) CT scans from 12 medical centers, including multi-phase, multi-vendor, and multi-disease cases. Furthermore, we conduct a large-scale study for liver, kidney, spleen, and pancreas segmentation and reveal the unsolved segmentation problems of the SOTA methods, such as the limited generalization ability on distinct medical centers, phases, and unseen diseases. To advance the unsolved problems, we further build four organ segmentation benchmarks for fully supervised, semi-supervised, weakly supervised, and continual learning, which are currently challenging and active research topics. Accordingly, we develop a simple and effective method for each benchmark, which can be used as out-of-the-box methods and strong baselines. We believe the AbdomenCT-1K dataset will promote future in-depth research towards clinical applicable abdominal organ segmentation methods. The datasets, codes, and trained models are publicly available at https://github.com/JunMa11/AbdomenCT-1K.
Segmentation Loss OdysseyJun Ma
Loss functions are one of the crucial ingredients in deep learning-based medical image segmentation methods. Many loss functions have been proposed in existing literature, but are studied separately or only investigated with few other losses. In this paper, we present a systematic taxonomy to sort existing loss functions into four meaningful categories. This helps to reveal links and fundamental similarities between them. Moreover, we explore the relationship between the traditional region-based and the more recent boundary-based loss functions. The PyTorch implementations of these loss functions are publicly available at \url{https://github.com/JunMa11/SegLoss}.
20.6IVMar 19, 2024
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation ChallengeHongwei Bran Li, Fernando Navarro, Ivan Ezhov et al.
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions 2D-vs-3D. A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient 3D methods for uncertainty quantification methods in 3D segmentation tasks.
22.8IVJan 10, 2022
MyoPS: A Benchmark of Myocardial Pathology Segmentation Combining Three-Sequence Cardiac Magnetic Resonance ImagesLei Li, Fuping Wu, Sihan Wang et al.
Assessment of myocardial viability is essential in diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge, in conjunction with MICCAI 2020. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles and explore potential of solutions, as well as to provide a benchmark for future research. We conclude that while promising results have been reported, the research is still in the early stage, and more in-depth exploration is needed before a successful application to the clinics. Note that MyoPS data and evaluation tool continue to be publicly available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).
20.1IVAug 9, 2021
Deep Learning methods for automatic evaluation of delayed enhancement-MRI. The results of the EMIDEC challengeAlain Lalande, Zhihao Chen, Thibaut Pommier et al.
A key factor for assessing the state of the heart after myocardial infarction (MI) is to measure whether the myocardium segment is viable after reperfusion or revascularization therapy. Delayed enhancement-MRI or DE-MRI, which is performed several minutes after injection of the contrast agent, provides high contrast between viable and nonviable myocardium and is therefore a method of choice to evaluate the extent of MI. To automatically assess myocardial status, the results of the EMIDEC challenge that focused on this task are presented in this paper. The challenge's main objectives were twofold. First, to evaluate if deep learning methods can distinguish between normal and pathological cases. Second, to automatically calculate the extent of myocardial infarction. The publicly available database consists of 150 exams divided into 50 cases with normal MRI after injection of a contrast agent and 100 cases with myocardial infarction (and then with a hyperenhanced area on DE-MRI), whatever their inclusion in the cardiac emergency department. Along with MRI, clinical characteristics are also provided. The obtained results issued from several works show that the automatic classification of an exam is a reachable task (the best method providing an accuracy of 0.92), and the automatic segmentation of the myocardium is possible. However, the segmentation of the diseased area needs to be improved, mainly due to the small size of these areas and the lack of contrast with the surrounding structures.
6.5IVDec 29, 2020
Cascaded Framework for Automatic Evaluation of Myocardial Infarction from Delayed-Enhancement Cardiac MRIJun Ma
Automatic evaluation of myocardium and pathology plays an important role in the quantitative analysis of patients suffering from myocardial infarction. In this paper, we present a cascaded convolutional neural network framework for myocardial infarction segmentation and classification in delayed-enhancement cardiac MRI. Specifically, we first use a 2D U-Net to segment the whole heart, including the left ventricle and the myocardium. Then, we crop the whole heart as a region of interest (ROI). Finally, a new 2D U-Net is used to segment the infraction and no-reflow areas in the whole heart ROI. The segmentation method can be applied to the classification task where the segmentation results with the infraction or no-reflow areas are classified as pathological cases. Our method took second place in the MICCAI 2020 EMIDEC segmentation task with Dice scores of 86.28%, 62.24%, and 77.76% for myocardium, infraction, and no-reflow areas, respectively, and first place in the classification task with an accuracy of 92%.
7.6IVDec 28, 2020
Combining CNN and Hybrid Active Contours for Head and Neck Tumor Segmentation in CT and PET imagesJun Ma, Xiaoping Yang
Automatic segmentation of head and neck tumors plays an important role in radiomics analysis. In this short paper, we propose an automatic segmentation method for head and neck tumors from PET and CT images based on the combination of convolutional neural networks (CNNs) and hybrid active contours. Specifically, we first introduce a multi-channel 3D U-Net to segment the tumor with the concatenated PET and CT images. Then, we estimate the segmentation uncertainty by model ensembles and define a segmentation quality score to select the cases with high uncertainties. Finally, we develop a hybrid active contour model to refine the high uncertainty cases. Our method ranked second place in the MICCAI 2020 HECKTOR challenge with average Dice Similarity Coefficient, precision, and recall of 0.752, 0.838, and 0.717, respectively.
17.4IVJul 4, 2020
Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 ChallengeYue Sun, Kun Gao, Zhengwang Wu et al.
To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of major limitations is that the learning-based methods may suffer from the multi-site issue, that is, the models trained on a dataset from one site may not be applicable to the datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. By the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers on the multi-site issue.
5.8CVJul 1, 2020
A Characteristic Function-based Algorithm for Geodesic Active ContoursJun Ma, Dong Wang, Xiao-Ping Wang et al.
Active contour models have been widely used in image segmentation, and the level set method (LSM) is the most popular approach for solving the models, via implicitly representing the contour by a level set function. However, the LSM suffers from high computational burden and numerical instability, requiring additional regularization terms or re-initialization techniques. In this paper, we use characteristic functions to implicitly represent the contours, propose a new representation to the geodesic active contours and derive an efficient algorithm termed as the iterative convolution-thresholding method (ICTM). Compared to the LSM, the ICTM is simpler and much more efficient. In addition, the ICTM enjoys most desired features of the level set-based methods. Extensive experiments, on 2D synthetic, 2D ultrasound, 3D CT, and 3D MR images for nodule, organ and lesion segmentation, demonstrate that the proposed method not only obtains comparable or even better segmentation results (compared to the LSM) but also achieves significant acceleration.
The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 ChallengeNicholas Heller, Fabian Isensee, Klaus H. Maier-Hein et al.
There is a large body of literature linking anatomic and geometric characteristics of kidney tumors to perioperative and oncologic outcomes. Semantic segmentation of these tumors and their host kidneys is a promising tool for quantitatively characterizing these lesions, but its adoption is limited due to the manual effort required to produce high-quality 3D segmentations of these structures. Recently, methods based on deep learning have shown excellent results in automatic 3D segmentation, but they require large datasets for training, and there remains little consensus on which methods perform best. The 2019 Kidney and Kidney Tumor Segmentation challenge (KiTS19) was a competition held in conjunction with the 2019 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) which sought to address these issues and stimulate progress on this automatic segmentation problem. A training set of 210 cross sectional CT images with kidney tumors was publicly released with corresponding semantic segmentation masks. 106 teams from five continents used this data to develop automated systems to predict the true segmentation masks on a test set of 90 CT images for which the corresponding ground truth segmentations were kept private. These predictions were scored and ranked according to their average So rensen-Dice coefficient between the kidney and tumor across all 90 cases. The winning team achieved a Dice of 0.974 for kidney and 0.851 for tumor, approaching the inter-annotator performance on kidney (0.983) but falling short on tumor (0.923). This challenge has now entered an "open leaderboard" phase where it serves as a challenging benchmark in 3D semantic segmentation.
41.2CVJan 13, 2019
The Liver Tumor Segmentation Benchmark (LiTS)Patrick Bilic, Patrick Christ, Hongwei Bran Li et al.
In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that not a single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dices scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in \url{http://medicaldecathlon.com/}. In addition, both data and online evaluation are accessible via \url{www.lits-challenge.com}.