Symbol Correctness in Deep Neural Networks Containing Symbolic Layers
This work addresses the design and analysis of hybrid AI systems that integrate perception and reasoning. It offers a foundational framework for researchers in neurosymbolic AI, although the contribution is incremental in that it formalizes existing concepts.
The paper tackles the challenge of ensuring that the intermediate symbolic predictions in Neurosymbolic Deep Neural Networks (NS-DNNs), which combine neural and symbolic layers, are correct, and does so by formalizing the principle of symbol correctness. It shows that this property is necessary for explainability and transfer learning, and provides a framework for analyzing model behavior and training tradeoffs.
To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain, in addition to traditional neural layers, symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work and provide a framework to support further NS-DNN developments.
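As an informal illustration of the distinction between symbol correctness and output correctness, consider the toy sketch below. It is not taken from the paper: the digit-addition task, the function names (`symbolic_layer`, `symbol_correct`, `output_correct`), and the hard-coded predictions standing in for neural-layer outputs are all assumptions made for illustration. The sketch shows that a prediction can yield the correct final output while the intermediate symbols are wrong, which is one way to see why supervision on outputs alone cannot, in general, enforce symbol correctness.

```python
from typing import Tuple

Symbols = Tuple[int, ...]


def symbolic_layer(symbols: Symbols) -> int:
    """Hypothetical symbolic layer: adds the digits predicted by the neural layers."""
    return sum(symbols)


def symbol_correct(predicted: Symbols, ground_truth: Symbols) -> bool:
    """Symbol correctness: predicted symbols match the ground-truth symbols."""
    return predicted == ground_truth


def output_correct(predicted: Symbols, target: int) -> bool:
    """Output correctness: only the result of the symbolic layer is compared."""
    return symbolic_layer(predicted) == target


# Assumed ground-truth symbolic representation of the input (two digit images).
ground_truth: Symbols = (3, 5)
target = symbolic_layer(ground_truth)

# Stand-ins for neural-layer predictions (no real network is trained here).
good_prediction: Symbols = (3, 5)   # correct symbols
lucky_prediction: Symbols = (4, 4)  # wrong symbols, same sum

results = {
    name: (symbol_correct(pred, ground_truth), output_correct(pred, target))
    for name, pred in [("good", good_prediction), ("lucky", lucky_prediction)]
}

for name, (sym_ok, out_ok) in results.items():
    print(f"{name}: symbol_correct={sym_ok}, output_correct={out_ok}")
```

In this sketch the `lucky` prediction is output-correct but not symbol-correct, so any explanation read off its intermediate symbols, or any reuse of the neural layers with a different symbolic layer, would rest on wrong symbols.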