IVMar 23, 2023
OCELOT: Overlapped Cell on Tissue Dataset for HistopathologyJeongun Ryu, Aaron Valero Puche, JaeWoong Shin et al.
Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning both cell and tissue tasks simultaneously. When compared against a model trained only for the cell detection task, our proposed approaches improve cell detection performance on 3 datasets: proposed OCELOT, public TIGER, and internal CARP datasets. On the OCELOT test set in particular, we show up to 6.79 improvement in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset at https://lunit-io.github.io/research/publications/ocelot are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computation pathology.
CVJul 30, 2024
Generalizing AI-driven Assessment of Immunohistochemistry across Immunostains and Cancer Types: A Universal Immunohistochemistry AnalyzerBiagio Brattoli, Mohammad Mostafavi, Taebum Lee et al.
Despite advancements in methodologies, immunohistochemistry (IHC) remains the most utilized ancillary test for histopathologic and companion diagnostics in targeted therapies. However, objective IHC assessment poses challenges. Artificial intelligence (AI) has emerged as a potential solution, yet its development requires extensive training for each cancer and IHC type, limiting versatility. We developed a Universal IHC (UIHC) analyzer, an AI model for interpreting IHC images regardless of tumor or IHC types, using training datasets from various cancers stained for PD-L1 and/or HER2. This multi-cohort trained model outperforms conventional single-cohort models in interpreting unseen IHCs (Kappa score 0.578 vs. up to 0.509) and consistently shows superior performance across different positive staining cutoff values. Qualitative analysis reveals that UIHC effectively clusters patches based on expression levels. The UIHC model also quantitatively assesses c-MET expression with MET mutations, representing a significant advancement in AI application in the era of personalized medicine and accumulating novel biomarkers.
CVSep 11, 2025
OCELOT 2023: Cell Detection from Cell-Tissue Interaction ChallengeJaeWoong Shin, Jeongun Ryu, Aaron Valero Puche et al.
Pathologists routinely alternate between different magnifications when examining Whole-Slide Images, allowing them to evaluate both broad tissue morphology and intricate cellular details to form comprehensive diagnoses. However, existing deep learning-based cell detection models struggle to replicate these behaviors and learn the interdependent semantics between structures at different magnifications. A key barrier in the field is the lack of datasets with multi-scale overlapping cell and tissue annotations. The OCELOT 2023 challenge was initiated to gather insights from the community to validate the hypothesis that understanding cell and tissue (cell-tissue) interactions is crucial for achieving human-level performance, and to accelerate the research in this field. The challenge dataset includes overlapping cell detection and tissue segmentation annotations from six organs, comprising 673 pairs sourced from 306 The Cancer Genome Atlas (TCGA) Whole-Slide Images with hematoxylin and eosin staining, divided into training, validation, and test subsets. Participants presented models that significantly enhanced the understanding of cell-tissue relationships. Top entries achieved up to a 7.99 increase in F1-score on the test set compared to the baseline cell-only model that did not incorporate cell-tissue relationships. This is a substantial improvement in performance over traditional cell-only detection methods, demonstrating the need for incorporating multi-scale semantics into the models. This paper provides a comparative analysis of the methods used by participants, highlighting innovative strategies implemented in the OCELOT 2023 challenge.
CRJan 18, 2022
AESPA: Accuracy Preserving Low-degree Polynomial Activation for Fast Private InferenceJaiyoung Park, Michael Jaemin Kim, Wonkyung Jung et al.
Hybrid private inference (PI) protocol, which synergistically utilizes both multi-party computation (MPC) and homomorphic encryption, is one of the most prominent techniques for PI. However, even the state-of-the-art PI protocols are bottlenecked by the non-linear layers, especially the activation functions. Although a standard non-linear activation function can generate higher model accuracy, it must be processed via a costly garbled-circuit MPC primitive. A polynomial activation can be processed via Beaver's multiplication triples MPC primitive but has been incurring severe accuracy drops so far. In this paper, we propose an accuracy preserving low-degree polynomial activation function (AESPA) that exploits the Hermite expansion of the ReLU and basis-wise normalization. We apply AESPA to popular ML models, such as VGGNet, ResNet, and pre-activation ResNet, to show an inference accuracy comparable to those of the standard models with ReLU activation, achieving superior accuracy over prior low-degree polynomial studies. When applied to the all-RELU baseline on the state-of-the-art Delphi PI protocol, AESPA shows up to 42.1x and 28.3x lower online latency and communication cost.
CRDec 31, 2021
BTS: An Accelerator for Bootstrappable Fully Homomorphic EncryptionSangpyo Kim, Jongmin Kim, Michael Jaemin Kim et al.
Homomorphic encryption (HE) enables the secure offloading of computations to the cloud by providing computation on encrypted data (ciphertexts). HE is based on noisy encryption schemes in which noise accumulates as more computations are applied to the data. The limited number of operations applicable to the data prevents practical applications from exploiting HE. Bootstrapping enables an unlimited number of operations or fully HE (FHE) by refreshing the ciphertext. Unfortunately, bootstrapping requires a significant amount of additional computation and memory bandwidth as well. Prior works have proposed hardware accelerators for computation primitives of FHE. However, to the best of our knowledge, this is the first to propose a hardware FHE accelerator that supports bootstrapping as a first-class citizen. In particular, we propose BTS - Bootstrappable, Technologydriven, Secure accelerator architecture for FHE. We identify the challenges of supporting bootstrapping in the accelerator and analyze the off-chip memory bandwidth and computation required. In particular, given the limitations of modern memory technology, we identify the HE parameter sets that are efficient for FHE acceleration. Based on the insights gained from our analysis, we propose BTS, which effectively exploits the parallelism innate in HE operations by arranging a massive number of processing elements in a grid. We present the design and microarchitecture of BTS, including a network-on-chip design that exploits a deterministic communication pattern. BTS shows 5,556x and 1,306x improved execution time on ResNet-20 and logistic regression over a CPU, with a chip area of 373.6mm^2 and up to 163.2W of power.
CRDec 3, 2020
Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUsSangpyo Kim, Wonkyung Jung, Jaiyoung Park et al.
Homomorphic encryption (HE) draws huge attention as it provides a way of privacy-preserving computations on encrypted messages. Number Theoretic Transform (NTT), a specialized form of Discrete Fourier Transform (DFT) in the finite field of integers, is the key algorithm that enables fast computation on encrypted ciphertexts in HE. Prior works have accelerated NTT and its inverse transformation on a popular parallel processing platform, GPU, by leveraging DFT optimization techniques. However, these GPU-based studies lack a comprehensive analysis of the primary differences between NTT and DFT or only consider small HE parameters that have tight constraints in the number of arithmetic operations that can be performed without decryption. In this paper, we analyze the algorithmic characteristics of NTT and DFT and assess the performance of NTT when we apply the optimizations that are commonly applicable to both DFT and NTT on modern GPUs. From the analysis, we identify that NTT suffers from severe main-memory bandwidth bottleneck on large HE parameter sets. To tackle the main-memory bandwidth issue, we propose a novel NTT-specific on-the-fly root generation scheme dubbed on-the-fly twiddling (OT). Compared to the baseline radix-2 NTT implementation, after applying all the optimizations, including OT, we achieve 4.2x speedup on a modern GPU.
CVJul 4, 2018
Restructuring Batch Normalization to Accelerate CNN TrainingWonkyung Jung, Daejin Jung, and Byeongho Kim et al.
Batch Normalization (BN) has become a core design block of modern Convolutional Neural Networks (CNNs). A typical modern CNN has a large number of BN layers in its lean and deep architecture. BN requires mean and variance calculations over each mini-batch during training. Therefore, the existing memory access reduction techniques, such as fusing multiple CONV layers, are not effective for accelerating BN due to their inability to optimize mini-batch related calculations during training. To address this increasingly important problem, we propose to restructure BN layers by first splitting a BN layer into two sub-layers (fission) and then combining the first sub-layer with its preceding CONV layer and the second sub-layer with the following activation and CONV layers (fusion). The proposed solution can significantly reduce main-memory accesses while training the latest CNN models, and the experiments on a chip multiprocessor show that the proposed BN restructuring can improve the performance of DenseNet-121 by 25.7%.