Hierarchical Delay Attribution Classification using Unstructured Text in Train Management Systems
This work addresses the complex task of automating delay attribution in Swedish train management systems. Its contribution is incremental: it applies existing methods to a specific domain.
The paper addresses the manual assignment of train delay attribution codes by developing a machine learning-based decision support system that works from event descriptions. A hierarchical approach outperformed a flat one, but both performed worse than manual classification.
EU directives stipulate a systematic follow-up of train delays. In Sweden, the Swedish Transport Administration registers delays and assigns each an appropriate delay attribution code. However, this code is assigned manually, which is a complex task. This paper investigates a machine learning-based decision support system that assigns delay attribution codes based on event descriptions. The text is transformed using TF-IDF, and two models, Random Forest and Support Vector Machine, are evaluated against a random uniform classifier and against the classification performance of the Swedish Transport Administration. Further, the problem is modeled using both a hierarchical and a flat approach. The results indicate that the hierarchical approach performs better than the flat approach. Both approaches perform better than the random uniform classifier but worse than the manual classification.
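The difference between the flat and hierarchical formulations can be illustrated with a minimal, standard-library-only sketch. It is not the paper's implementation. The event descriptions and two-level codes are invented for illustration. A simple nearest-centroid classifier stands in for Random Forest and Support Vector Machine. The TF-IDF weighting uses one common smoothed variant, which may differ from the one used in the study.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical event descriptions with made-up (parent, leaf) codes.
TRAIN = [
    ("signal failure at station", ("INFRA", "SIGNAL")),
    ("signal fault red light", ("INFRA", "SIGNAL")),
    ("track damage broken rail", ("INFRA", "TRACK")),
    ("rail crack on track", ("INFRA", "TRACK")),
    ("locomotive engine breakdown", ("OPERATOR", "VEHICLE")),
    ("engine fault on vehicle", ("OPERATOR", "VEHICLE")),
    ("driver late staff shortage", ("OPERATOR", "STAFF")),
    ("staff missing no driver", ("OPERATOR", "STAFF")),
]

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency (an assumed variant).
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {tok: w / norm for tok, w in vec.items()}

def tfidf(doc, idf):
    # Tokens unseen during fitting are ignored.
    counts = Counter(tok for tok in doc if tok in idf)
    return normalize({tok: c * idf[tok] for tok, c in counts.items()})

class CentroidClassifier:
    """Stand-in model: predicts the class whose centroid is most similar."""

    def fit(self, vectors, labels):
        sums = defaultdict(lambda: defaultdict(float))
        for vec, label in zip(vectors, labels):
            for tok, w in vec.items():
                sums[label][tok] += w
        self.centroids = {label: normalize(c) for label, c in sums.items()}
        return self

    def predict(self, vec):
        def score(label):
            centroid = self.centroids[label]
            return sum(w * centroid.get(tok, 0.0) for tok, w in vec.items())
        return max(sorted(self.centroids), key=score)

docs = [tokenize(text) for text, _ in TRAIN]
codes = [code for _, code in TRAIN]
idf = fit_idf(docs)
X = [tfidf(doc, idf) for doc in docs]

# Flat approach: one classifier over all full codes.
flat = CentroidClassifier().fit(X, codes)

# Hierarchical approach: one classifier for the parent level,
# then one classifier per parent for the leaf level.
top = CentroidClassifier().fit(X, [parent for parent, _ in codes])
leaf = {}
for parent in {p for p, _ in codes}:
    rows = [(x, code[1]) for x, code in zip(X, codes) if code[0] == parent]
    leaf[parent] = CentroidClassifier().fit(
        [x for x, _ in rows], [label for _, label in rows]
    )

def predict_flat(text):
    return flat.predict(tfidf(tokenize(text), idf))

def predict_hierarchical(text):
    vec = tfidf(tokenize(text), idf)
    parent = top.predict(vec)
    return (parent, leaf[parent].predict(vec))

def predict_random(text, rng):
    # Random uniform baseline: ignores the text entirely.
    return rng.choice(sorted(set(codes)))

print(predict_flat("broken rail near station"))
print(predict_hierarchical("no driver available"))
print(predict_random("broken rail near station", random.Random(0)))
```

In the hierarchical variant, each leaf classifier only has to separate the codes under one parent, which is one plausible reason such a decomposition can help when the code set is large. A mistake at the parent level, however, cannot be corrected further down.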