CLAI | Dec 18, 2025

GinSign: Grounding Natural Language Into System Signatures for Temporal Logic Translation

arXiv:2512.16770v1 | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating formal specifications for trustworthy autonomous systems without manual effort. It represents a strong gain specific to this domain.

The paper tackles the problem of translating natural language into temporal logic specifications for autonomous systems. Existing methods either assume accurate atom grounding or achieve low grounded translation accuracy. The proposed GinSign framework reaches a grounded logical-equivalence score of 95.5%, a 1.4× improvement over the state of the art.

Natural language (NL) to temporal logic (TL) translation lets engineers specify, verify, and enforce system behaviors without manually crafting formal specifications. This is an essential capability for building trustworthy autonomous systems. Existing NL-to-TL translation frameworks have shown encouraging initial results, but they either explicitly assume access to accurate atom grounding or suffer from low grounded translation accuracy. In this paper, we propose GinSign, a framework for Grounding Natural Language Into System Signatures for Temporal Logic translation. The framework introduces a grounding model that learns the abstract task of mapping NL spans onto a given system signature. Given a lifted NL specification and a system signature $\mathcal{S}$, this grounding classifier must assign each lifted atomic proposition to an element of the set of signature-defined atoms $\mathcal{P}$. We decompose the grounding task hierarchically: the model first predicts predicate labels, then selects the appropriately typed constant arguments. Recasting grounding from a free-form generation problem as a structured classification problem permits the use of smaller masked language models and eliminates the reliance on expensive LLMs. Experiments across multiple domains show that frameworks which omit grounding tend to produce lifted linear temporal logic (LTL) that is syntactically correct but semantically nonequivalent to the grounded target expressions. In contrast, our framework supports downstream model checking and achieves a grounded logical-equivalence score of $95.5\%$, a $1.4\times$ improvement over the state of the art (SOTA).
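The hierarchical grounding step described above can be sketched as a two-stage classification over a system signature. This is a minimal sketch under stated assumptions, not GinSign's implementation. The following are all hypothetical:

- the `Signature`/`Predicate` layout,
- the `gloss` hint strings,
- the `prop_N` placeholder names,
- the example signature itself.

A toy token-overlap scorer stands in for the masked language model the paper uses. What the sketch does show is the structure: a predicate is chosen first, and each argument is then chosen only among constants of the type that the predicate's slot requires. Every output is therefore a well-typed atom from the signature, which can be substituted into the lifted formula.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    name: str                   # predicate symbol, e.g. "at"
    arg_types: tuple[str, ...]  # type of each argument slot
    gloss: str                  # hypothetical NL hint used by the toy scorer


@dataclass
class Signature:
    predicates: list[Predicate]
    constants: dict[str, list[str]]  # type name -> constants of that type


def tokens(text: str) -> set[str]:
    # Lowercase alphanumeric runs; "_" is not matched, so "charging_dock" -> {"charging", "dock"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(span: str, label: str) -> int:
    # Toy stand-in for a masked-LM classifier score: count shared tokens.
    return len(tokens(span) & tokens(label))


def ground_span(span: str, sig: Signature) -> str:
    # Stage 1: classify the span into one of the signature's predicates.
    pred = max(sig.predicates, key=lambda p: overlap(span, p.name + " " + p.gloss))
    # Stage 2: fill each slot with a constant of the required type only.
    args = [max(sig.constants[t], key=lambda c: overlap(span, c)) for t in pred.arg_types]
    return f"{pred.name}({', '.join(args)})"


def ground_formula(lifted_ltl: str, spans: dict[str, str], sig: Signature) -> str:
    # Replace each lifted placeholder (prop_1, prop_2, ...) with its grounded atom.
    atoms = {name: ground_span(text, sig) for name, text in spans.items()}
    return re.sub(r"prop_\d+", lambda m: atoms[m.group(0)], lifted_ltl)


# Hypothetical signature and lifted specification, for illustration only.
sig = Signature(
    predicates=[
        Predicate("at", ("robot", "location"), "reaches arrives visits"),
        Predicate("holding", ("robot", "object"), "holds carries picks"),
    ],
    constants={
        "robot": ["drone", "rover"],
        "location": ["kitchen", "charging_dock"],
        "object": ["cup", "box"],
    },
)
spans = {
    "prop_1": "the rover picks up the cup",
    "prop_2": "the rover reaches the charging dock",
}
print(ground_formula("G (prop_1 -> F prop_2)", spans, sig))
# -> G (holding(rover, cup) -> F at(rover, charging_dock))
```

In this sketch, the second stage only ranks constants of the slot's declared type. A poorly scored argument can still be wrong, but it can never be ill-typed. That property is one reason a classification formulation is easier to constrain than free-form generation.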
