Sparse Prototype Network for Explainable Pedestrian Behavior Prediction
This work addresses the need for explainable AI in autonomous driving and smart city applications, offering a novel method for multi-modal prediction with human-understandable features.
The paper tackles the challenge of predicting pedestrian behavior by introducing the Sparse Prototype Network (SPN), which simultaneously predicts actions, trajectories, and poses while explaining its predictions through a prototype bottleneck layer. It reports state-of-the-art performance on the TITAN and PIE datasets.
Predicting pedestrian behavior is challenging yet crucial for applications such as autonomous driving and smart cities. Recent deep learning models make accurate predictions, but they fail to explain their inner workings. One reason is that their inputs are multi-modal. To bridge this gap, we present Sparse Prototype Network (SPN), an explainable method designed to simultaneously predict a pedestrian's future action, trajectory, and pose. SPN leverages an intermediate prototype bottleneck layer to provide sample-based explanations for its predictions. The prototypes are modality-independent, meaning that they can correspond to any modality from the input. Therefore, SPN can extend to arbitrary combinations of modalities. Regularized by mono-semanticity and clustering constraints, the prototypes learn consistent and human-understandable features, and SPN achieves state-of-the-art performance on action, trajectory, and pose prediction on TITAN and PIE. Finally, we propose a metric named Top-K Mono-semanticity Scale to quantitatively evaluate explainability. Qualitative results show a positive correlation between sparsity and explainability. Code available at https://github.com/Equinoxxxxx/SPN.
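To make the idea of a sparse, modality-independent prototype bottleneck more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that every modality has already been embedded into a shared feature space, that prototype activations are cosine similarities, that a prototype attaches to whichever modality matches it best, and that sparsity is imposed by keeping only the top-k activations. All of these choices, and the names `prototype_bottleneck` and `linear_head`, are hypothetical. The mono-semanticity and clustering regularizers used during training are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prototype_bottleneck(modality_features, prototypes, k):
    """Hypothetical sparse prototype layer.

    modality_features: dict mapping a modality name to its feature vector
                       (assumed to live in one shared embedding space).
    prototypes:        list of prototype vectors in that same space.
    k:                 number of prototype activations to keep.

    Returns (sparse_activations, sources), where sources[i] names the
    modality that matched prototype i best.
    """
    activations, sources = [], []
    for proto in prototypes:
        # Modality-independent: the prototype may match any input modality.
        name, sim = max(
            ((n, cosine(f, proto)) for n, f in modality_features.items()),
            key=lambda pair: pair[1],
        )
        activations.append(max(sim, 0.0))  # clip negative similarities (assumption)
        sources.append(name)

    # Keep only the k strongest prototypes; zero out the rest.
    order = sorted(range(len(prototypes)), key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    sparse = [a if i in keep else 0.0 for i, a in enumerate(activations)]
    return sparse, sources


def linear_head(sparse_activations, weights):
    """One linear prediction head reading only the prototype activations."""
    return [sum(w * a for w, a in zip(row, sparse_activations)) for row in weights]


if __name__ == "__main__":
    features = {"trajectory": [1.0, 0.0], "pose": [0.0, 1.0]}
    prototypes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    sparse, sources = prototype_bottleneck(features, prototypes, k=2)
    print(sparse, sources)
    print(linear_head(sparse, [[1.0, 2.0, 3.0]]))
```

Because each head reads only the few nonzero prototype activations, a prediction can be traced back to a small set of prototypes and the input samples that activate them most strongly, which is the kind of sample-based explanation the abstract describes.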