CL
Jul 8, 2025

A Semantic Parsing Framework for End-to-End Time Normalization

arXiv:2507.06450v1
h-index: 46
Originality: Incremental advance
AI Analysis

This work addresses limitations in traditional time normalization systems for applications like information retrieval and clinical decision-making, offering a practical and interpretable solution.

The paper recasts time normalization, the conversion of natural language temporal expressions into machine-readable representations, as a code generation task grounded in the SCATE framework. Small, locally deployable models trained on LLM-synthesized data achieve strong performance and outperform their LLM parents on this task.

Time normalization is the task of converting natural language temporal expressions into machine-readable representations. It underpins many downstream applications in information retrieval, question answering, and clinical decision-making. Traditional systems based on the ISO-TimeML schema limit expressivity and struggle with complex constructs such as compositional, event-relative, and multi-span time expressions. In this work, we introduce a novel formulation of time normalization as a code generation task grounded in the SCATE framework, which defines temporal semantics through symbolic and compositional operators. We implement a fully executable SCATE Python library and demonstrate that large language models (LLMs) can generate executable SCATE code. Leveraging this capability, we develop an automatic data augmentation pipeline using LLMs to synthesize large-scale annotated data with code-level validation. Our experiments show that small, locally deployable models trained on this augmented data can achieve strong performance, outperforming even their LLM parents and enabling practical, accurate, and interpretable time normalization.
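To make the idea of compositional, executable temporal operators more concrete, here is a minimal sketch in Python. It is not the SCATE library described in the abstract: the names `Interval`, `DayOfWeek`, `Last`, `Before`, `Period`, and `run_generated` are hypothetical stand-ins chosen for illustration, and the real operator set and API may differ. The sketch composes operators to resolve "two weeks before last Friday" against a document time, and shows one simple way that code-level validation could work: evaluate the generated expression and reject it if it fails to produce an interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Interval:
    """A half-open time interval [start, end)."""
    start: datetime
    end: datetime

    def resolve(self) -> "Interval":
        return self


@dataclass(frozen=True)
class DayOfWeek:
    """Hypothetical repeating interval: every day with this weekday."""
    weekday: int  # Monday=0 ... Sunday=6

    def last_before(self, t: datetime) -> Interval:
        # Walk back from the day preceding t until the weekday matches.
        day = datetime(t.year, t.month, t.day) - timedelta(days=1)
        while day.weekday() != self.weekday:
            day -= timedelta(days=1)
        return Interval(day, day + timedelta(days=1))


@dataclass(frozen=True)
class Period:
    """A duration expressed in weeks and days."""
    weeks: int = 0
    days: int = 0

    def delta(self) -> timedelta:
        return timedelta(weeks=self.weeks, days=self.days)


@dataclass(frozen=True)
class Last:
    """Most recent occurrence of a repeating interval before the anchor."""
    anchor: Interval
    repeating: DayOfWeek

    def resolve(self) -> Interval:
        return self.repeating.last_before(self.anchor.resolve().start)


@dataclass(frozen=True)
class Before:
    """Shift an interval earlier by a period."""
    interval: object  # anything with a resolve() method
    period: Period

    def resolve(self) -> Interval:
        base = self.interval.resolve()
        d = self.period.delta()
        return Interval(base.start - d, base.end - d)


def run_generated(code: str, doc_time: Interval):
    """Evaluate a generated expression; return an Interval, or None if invalid.

    Assumption: generated code is a single expression over the operators above.
    This is an illustration only, not a sandbox for untrusted model output.
    """
    scope = {"__builtins__": {}, "doc_time": doc_time, "DayOfWeek": DayOfWeek,
             "Period": Period, "Last": Last, "Before": Before}
    try:
        result = eval(code, scope).resolve()
    except Exception:
        return None
    return result if isinstance(result, Interval) else None


# "two weeks before last Friday", with Tuesday 8 July 2025 as document time
doc_time = Interval(datetime(2025, 7, 8), datetime(2025, 7, 9))
good = run_generated("Before(Last(doc_time, DayOfWeek(4)), Period(weeks=2))", doc_time)
bad = run_generated("Last(doc_time)", doc_time)  # missing argument, so rejected
print(good)
print(bad)
```

Because every operator exposes the same `resolve()` method, expressions nest freely, which is the property that lets a compositional representation cover event-relative and multi-step constructs. A generated expression that raises an error or returns something other than an interval is simply discarded, which is one plausible reading of validating synthesized annotations at the code level.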

