Joint Object and State Recognition using Language Knowledge
This work addresses object state identification for robotics applications, focusing on cooking objects. It is incremental, building on existing deep learning and knowledge graph techniques.
This paper tackles the problem of recognizing objects and their states in cooking-related images by jointly using object and state predictions to improve classification accuracy. The method combines a CNN with a language knowledge graph, resulting in enhanced classification performance on a cooking dataset.
The state of an object is an important piece of knowledge in robotics applications. States and objects are intertwined: object information can help recognize the state of an object in an image, and vice versa. This paper addresses the state identification problem in cooking-related images and uses state and object predictions together to improve the classification accuracy of both objects and their states from a single image. The pipeline presented in this paper consists of a CNN with a double classification layer and the ConceptNet language knowledge graph on top. The language knowledge graph provides a semantic likelihood between objects and states. The object and state confidences produced by the deep architecture are combined with object-state relatedness estimates from the knowledge graph to produce marginal probabilities for objects and states. The marginal probabilities and confidences of objects (or states) are then fused to improve the final object (or state) classification results. Experiments on a dataset of cooking objects show that using a language knowledge graph on top of a deep neural network effectively enhances object and state classification.
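To make the confidence, marginal, and fusion steps concrete, here is a minimal sketch in plain Python. It is not the paper's exact formulation, which the abstract does not spell out. It assumes the CNN's two classification heads output object and state logits, and that the knowledge graph yields a non-negative object-by-state relatedness matrix. Each marginal is taken as a relatedness-weighted sum of the other head's confidences, renormalized to sum to 1, and the fusion rule is a convex combination with weight `alpha`. All function names, the `alpha` parameter, the object/state labels, and the relatedness values are hypothetical toy data. The values are not actual ConceptNet scores.

```python
import math

def softmax(logits):
    """Numerically stable softmax: turns one head's raw scores into confidences."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

def state_marginals(obj_conf, relatedness):
    """Hypothetical rule: score(state j) = sum_i conf(object i) * R[i][j], renormalized."""
    n_states = len(relatedness[0])
    return normalize([sum(obj_conf[i] * relatedness[i][j] for i in range(len(obj_conf)))
                      for j in range(n_states)])

def object_marginals(state_conf, relatedness):
    """Hypothetical rule: score(object i) = sum_j conf(state j) * R[i][j], renormalized."""
    return normalize([sum(c * r for c, r in zip(state_conf, row)) for row in relatedness])

def fuse(conf, marginal, alpha=0.5):
    """Assumed fusion: convex combination of CNN confidence and language-based marginal."""
    return [alpha * c + (1 - alpha) * m for c, m in zip(conf, marginal)]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# Toy example: labels and relatedness values are made up for illustration.
objects = ["egg", "bread", "onion"]
states = ["whole", "sliced", "scrambled"]
relatedness = [
    [0.6, 0.1, 0.9],  # egg
    [0.5, 0.9, 0.0],  # bread
    [0.7, 0.8, 0.0],  # onion
]

# Stand-ins for the outputs of the two classification heads.
obj_conf = softmax([1.0, 1.2, 0.8])    # object head is unsure (slightly favors bread)
state_conf = softmax([0.2, 0.1, 2.0])  # state head is confident about "scrambled"

obj_final = fuse(obj_conf, object_marginals(state_conf, relatedness))
state_final = fuse(state_conf, state_marginals(obj_conf, relatedness))

print(objects[argmax(obj_conf)], "->", objects[argmax(obj_final)])     # bread -> egg
print(states[argmax(state_conf)], "->", states[argmax(state_final)])   # scrambled -> scrambled
```

In this toy case the confident "scrambled" state prediction, combined with its high relatedness to "egg", overturns the object head's weak preference for "bread". This illustrates how state evidence can correct an object prediction. A learned or validated `alpha`, or a product-of-experts fusion, would be equally plausible design choices.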