Contrastive Object Detection Using Knowledge Graph Embeddings
This paper addresses object detection for computer vision applications, but its contribution is incremental: it adapts existing knowledge embeddings to standard detection architectures.
The paper tackles the problem that object detection treats classes as discrete and unrelated. It shows that knowledge-based class embeddings yield more semantically grounded misclassifications while performing on par with one-hot methods on the COCO and Cityscapes benchmarks.
Object recognition has, for the most part, been approached as a one-hot problem that treats classes as discrete and unrelated. Each image region has to be assigned to one member of a set of objects, including a background class, disregarding any similarities between the object types. In this work, we compare the error statistics of class embeddings learned with a one-hot approach against those of semantically structured embeddings from natural language processing or knowledge graphs, which are widely applied in open-world object detection. Extensive experimental results with multiple knowledge embeddings and distance metrics indicate that knowledge-based class representations result in more semantically grounded misclassifications while performing on par with one-hot methods on the challenging COCO and Cityscapes object detection benchmarks. We generalize our findings to multiple object detection architectures by proposing a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
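The idea of classifying a region by its distance to fixed class embeddings, rather than by a one-hot output, can be illustrated with a minimal sketch. This is not the paper's implementation: the cosine metric, the temperature value, the function names `cosine_logits` and `contrastive_loss`, and the three-dimensional toy embeddings are all assumptions made for illustration, and the background class is left out.

```python
import numpy as np

def cosine_logits(region_feats, class_embs, temperature=0.1):
    """Temperature-scaled cosine similarity between region features and class embeddings."""
    f = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    c = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    return (f @ c.T) / temperature

def contrastive_loss(logits, labels):
    """Mean cross-entropy over the similarity logits (numerically stable log-softmax)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Hypothetical toy embeddings: "car" and "truck" lie close together, "person" is far away.
class_names = ["car", "truck", "person"]
class_embs = np.array([[1.0, 0.1, 0.0],
                       [0.9, 0.3, 0.0],
                       [0.0, 0.0, 1.0]])

# Two made-up region features: one car-like, one person-like.
region_feats = np.array([[0.95, 0.15, 0.05],
                         [0.05, 0.00, 0.90]])
labels = np.array([0, 2])

logits = cosine_logits(region_feats, class_embs)
pred = logits.argmax(axis=1)               # nearest class embedding per region
ranking = logits[0].argsort()[::-1]        # classes ordered by similarity for region 0
loss = contrastive_loss(logits, labels)

print([class_names[i] for i in pred])
print([class_names[i] for i in ranking])
```

In this toy setup the runner-up for the car-like region is the neighbouring "truck" embedding rather than "person", which is the sense in which errors under a structured embedding space can be semantically grounded. Swapping the cosine similarity for another distance metric only requires changing `cosine_logits`.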