Temporal Performance Prediction for Deep Convolutional Long Short-Term Memory Networks
This work addresses uncertainty estimation for video-based semantic segmentation in autonomous driving, but it is incremental as it builds on existing convolutional LSTM networks with a new postprocessing approach.
The paper tackles the problem of quantifying predictive uncertainty for deep semantic segmentation networks in safety-critical tasks like autonomous driving, by introducing a temporal postprocessing method that estimates prediction performance using cell state-based metrics, achieving classification of intersection over union being zero or greater than zero.
Quantifying predictive uncertainty of deep semantic segmentation networks is essential in safety-critical tasks. In applications like autonomous driving, where video data is available, convolutional long short-term memory networks are capable of not only providing semantic segmentations but also predicting the segmentations of the next timesteps. These models use cell states to broadcast information from previous data by taking a time series of inputs to predict one or even further steps into the future. We present a temporal postprocessing method which estimates the prediction performance of convolutional long short-term memory networks by either predicting the intersection over union of predicted and ground truth segments or classifying between intersection over union being equal to zero or greater than zero. To this end, we create temporal cell state-based input metrics per segment and investigate different models for the estimation of the predictive quality based on these metrics. We further study the influence of the number of considered cell states for the proposed metrics.