Bridging Neural and Symbolic Representations with Transitional Dictionary Learning
This work addresses the challenge of bridging neural and symbolic representations in AI systems, particularly for compositional visual tasks, though it may appear incremental because it builds on existing dictionary learning and diffusion models.
The paper tackles the problem of learning symbolic knowledge, such as visual parts and relations, from compositional visual data without relying on pre-trained visual features. It introduces a Transitional Dictionary Learning framework that significantly outperforms state-of-the-art unsupervised part segmentation methods.
This paper introduces a novel Transitional Dictionary Learning (TDL) framework that implicitly learns symbolic knowledge, such as visual parts and relations, by reconstructing the input as a combination of parts with implicit relations. We propose a game-theoretic diffusion model that decomposes the input into visual parts using learned dictionaries; the dictionaries are in turn learned from the decomposition results by the Expectation Maximization (EM) algorithm, implemented as online prototype clustering. Additionally, we propose two metrics, clustering information gain and heuristic shape score, to evaluate the model. Experiments are conducted on three abstract compositional visual object datasets, which require the model to exploit the compositionality of the data rather than simply relying on visual features. We then carry out three tasks on these datasets, covering symbol grounding to predefined classes of parts and relations as well as transfer learning to unseen classes, followed by a human evaluation. The results show that the proposed method discovers compositional patterns and significantly outperforms state-of-the-art unsupervised part segmentation methods that rely on visual features from pre-trained backbones. Furthermore, the proposed metrics are consistent with human evaluations.
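The abstract states that the dictionaries are learned by EM implemented as online prototype clustering, but gives no further detail. As a rough illustration only, a generic online prototype clustering loop might look like the sketch below: each incoming sample is assigned to its nearest prototype (a hard E-step), and that prototype is then moved toward the sample with a running-mean step (an incremental M-step). The function names, the Euclidean distance, the 1/n step size, and the choice to count each initial prototype as one observation are all assumptions made for this example, not the paper's implementation.

```python
import math


def nearest(prototypes, x):
    """Hard E-step: return the index of the prototype closest to x."""
    return min(range(len(prototypes)), key=lambda k: math.dist(prototypes[k], x))


def online_prototype_clustering(stream, init_prototypes):
    """Generic online clustering sketch (hypothetical helper, not the TDL code).

    stream: iterable of equal-length numeric vectors, seen one at a time.
    init_prototypes: starting prototypes; each is counted as one observation.
    """
    prototypes = [list(p) for p in init_prototypes]
    counts = [1] * len(prototypes)
    for x in stream:
        k = nearest(prototypes, x)   # E-step: pick the winning prototype
        counts[k] += 1
        lr = 1.0 / counts[k]         # running-mean step size (an assumption)
        # Incremental M-step: move the winner toward the new sample.
        prototypes[k] = [p + lr * (xi - p) for p, xi in zip(prototypes[k], x)]
    return prototypes


# Toy usage: two well-separated groups of 2-D points.
samples = [[1, 0], [9, 10], [0, 1], [10, 9]]
prototypes = online_prototype_clustering(samples, [[0, 0], [10, 10]])
```

With the 1/n step size, each prototype ends up as the mean of its initial value and the samples assigned to it. In the paper's setting the samples would presumably be representations of decomposed visual parts rather than raw 2-D points.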