Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification
This work addresses the high labor and time costs of generating domain-specific knowledge for vision tasks, though the contribution is incremental, since it builds on existing components such as LLMs and Knowledge Graphs.
The study tackles the cost of generating domain-specific knowledge for vision tasks by integrating a Large Language Model (LLM) into a pipeline that uses Knowledge Graphs and pre-trained semantic embeddings for Zero-shot Object State Classification. Combining LLM-based embeddings with general-purpose pre-trained embeddings yields substantial performance improvements and state-of-the-art results.
Domain-specific knowledge can significantly contribute to addressing a wide variety of vision tasks. However, the generation of such knowledge entails considerable human labor and time costs. This study investigates the potential of Large Language Models (LLMs) in generating and providing domain-specific information through semantic embeddings. To achieve this, an LLM is integrated into a pipeline that utilizes Knowledge Graphs and pre-trained semantic vectors in the context of the Vision-based Zero-shot Object State Classification task. We thoroughly examine the behavior of the LLM through an extensive ablation study. Our findings reveal that the integration of LLM-based embeddings, in combination with general-purpose pre-trained embeddings, leads to substantial performance improvements. Drawing insights from this ablation study, we conduct a comparative analysis against competing models, thereby highlighting the state-of-the-art performance achieved by the proposed approach.
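To make the core idea concrete, the following is a minimal sketch of how LLM-based embeddings might be combined with general-purpose pre-trained embeddings and used to pick an object state without any labeled examples of that state. It is an illustration under stated assumptions, not the pipeline from the paper: the fusion by concatenation, the cosine-similarity classifier, the function names (fuse, classify_zero_shot), and all toy vectors are hypothetical, and the Knowledge Graph step is omitted entirely.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(general_vec, llm_vec):
    """Fuse two embeddings by concatenating their normalized forms.

    Concatenation is an assumption made for this sketch; the source only
    says the two embedding types are used in combination.
    """
    return l2_normalize(general_vec) + l2_normalize(llm_vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_zero_shot(visual_vec, state_embeddings):
    """Return the state whose fused embedding is most similar to the visual feature."""
    return max(state_embeddings, key=lambda s: cosine(visual_vec, state_embeddings[s]))


# Toy, made-up embeddings for two object states (hypothetical values).
general = {"open": [1.0, 0.0], "closed": [0.0, 1.0]}   # general-purpose vectors
llm = {"open": [0.9, 0.1], "closed": [0.2, 0.8]}       # LLM-derived vectors

state_embeddings = {state: fuse(general[state], llm[state]) for state in general}

# Hypothetical visual feature, assumed to be already projected into the fused space.
visual = [0.8, 0.1, 0.7, 0.2]

predicted_state = classify_zero_shot(visual, state_embeddings)
print(predicted_state)
```

In this sketch each state is represented only by its semantic vectors, so adding a new state requires new embeddings rather than new labeled images, which is the property that makes the setting zero-shot.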