CV · Sep 22, 2019

Semi-supervised estimation of event temporal length for cell event detection

Ha Tran Hong Phan, Ashnil Kumar, David Feng, Michael Fulham, Jinman Kim

arXiv:1909.09946v1 · 0.9

Originality: Incremental advance

AI Analysis

This work addresses the need to reduce annotation costs in cell event detection for biomedical research. It is an incremental advance, however, because it builds on existing LSTM models.

The paper tackles the problem of determining the optimal input sequence length for mitosis detection in cell videos. It proposes a semi-supervised method that achieves F1-scores of 0.880-0.907 with only 18 annotated training frames. These scores are about 10% higher than those of published methods trained on a full set of 110 annotated frames.
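The F1-score quoted above is the harmonic mean of precision and recall. The short Python sketch below computes it from detection counts. The function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: 9 correctly detected mitoses, 1 false alarm, 1 missed event.
p, r, f1 = precision_recall_f1(tp=9, fp=1, fn=1)
```

Equivalently, F1 = 2TP / (2TP + FP + FN). This form shows that false alarms and missed events are penalized equally.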

Cell event detection in cell videos is essential for monitoring cellular behavior over extended time periods. Deep learning methods have shown great success in detecting cell events because they capture more discriminative features of cellular processes than traditional methods do. In particular, convolutional long short-term memory (LSTM) models are the state of the art for mitosis detection in cell videos, because they exploit the changes in cell events that are observable across video sequences. However, these models have two limitations. First, the input sequence length is usually chosen empirically. Second, training requires a large annotated dataset, which is expensive to prepare. We propose a novel semi-supervised method that detects the optimal input sequence length for mitosis detection, with two key contributions. The first is an unsupervised step that learns the spatial and temporal locations of cells in their normal stage and approximates the distribution of the temporal lengths of cell events. The second is a step that uses that distribution to infer an optimal input sequence length and a minimal number of annotated frames for training an LSTM model on each particular video. We evaluated our method on detecting mitosis in densely packed stem cells in phase-contrast microscopy videos. Our experimental data demonstrate that increasing the input sequence length of an LSTM can decrease performance. Our results also show that, when the optimal input sequence length of the tested video is approximated, a model trained with only 18 annotated frames achieves F1-scores of 0.880-0.907. These scores are 10% higher than those of other published methods trained with a full set of 110 annotated frames.
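The abstract does not say how the inference step (ii) works in detail. As a rough illustration only, the Python sketch below assumes that the unsupervised step has already produced a list of estimated event lengths in frames. It then picks the input sequence length as an empirical quantile of that distribution. The `coverage` rule, the function names, and the example durations are all hypothetical and are not the authors' method. The sketch also shows how a video could then be cut into fixed-length, stride-1 windows for an LSTM.

```python
import math
from typing import Sequence, TypeVar

T = TypeVar("T")


def estimate_sequence_length(event_lengths: Sequence[int], coverage: float = 0.9) -> int:
    """Hypothetical rule: return the smallest length L (in frames) such that
    at least `coverage` of the estimated events last L frames or fewer."""
    if not event_lengths:
        raise ValueError("need at least one estimated event length")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(event_lengths)
    # Empirical quantile: index of the first value whose cumulative share >= coverage.
    k = math.ceil(coverage * len(ordered)) - 1
    return ordered[k]


def sliding_windows(frames: Sequence[T], length: int) -> list[list[T]]:
    """Split a frame sequence into overlapping windows of `length` frames (stride 1)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return [list(frames[i:i + length]) for i in range(len(frames) - length + 1)]


# Hypothetical durations (in frames) of mitosis events estimated without labels.
durations = [4, 5, 5, 6, 6, 7, 8, 12]
seq_len = estimate_sequence_length(durations, coverage=0.75)  # 7 frames
windows = sliding_windows(list(range(20)), seq_len)  # 14 windows of 7 frame indices
```

A higher `coverage` yields longer windows that contain more complete events. As the abstract notes, however, a longer input does not necessarily improve LSTM performance.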
