Hierarchical classification at multiple operating points
This work addresses the need for better evaluation and methods in hierarchical classification, which is incremental but provides practical gains for domains with class hierarchies.
The paper tackles the problem of hierarchical classification by evaluating methods at multiple operating points, revealing that top-down classifiers are outperformed by a flat softmax baseline across all granularities, and proposes a soft structured hinge loss that significantly improves performance.
Many classification problems consider classes that form a hierarchy. Classifiers that are aware of this hierarchy may be able to make confident predictions at a coarse level despite being uncertain at the fine-grained level. While it is generally possible to vary the granularity of predictions using a threshold at inference time, most contemporary work considers only leaf-node prediction, and almost no prior work has compared methods at multiple operating points. We present an efficient algorithm to produce operating characteristic curves for any method that assigns a score to every class in the hierarchy. Applying this technique to evaluate existing methods reveals that top-down classifiers are dominated by a naive flat softmax classifier across the entire operating range. We further propose two novel loss functions and show that a soft variant of the structured hinge loss is able to significantly outperform the flat baseline. Finally, we investigate the poor accuracy of top-down classifiers and demonstrate that they perform relatively well on unseen classes. Code is available online at https://github.com/jvlmdr/hiercls.