CL, Mar 22, 2012

Analysing Temporally Annotated Corpora with CAVaT

arXiv:1203.5051v1, 26 citations, Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for reliable analysis and validation tools among researchers working with temporally annotated natural language corpora. It is incremental in that it builds on the existing TimeML standard.

The authors developed CAVaT, a tool that provides statistical analysis and logical consistency checks for TimeML-annotated data. They demonstrated that it detects inconsistencies in the TimeBank corpus.

We present CAVaT, a tool that performs Corpus Analysis and Validation for TimeML. CAVaT is an open source, modular checking utility for statistical analysis of features specific to temporally-annotated natural language corpora. It provides reporting, highlights salient links between a variety of general and time-specific linguistic features, and also validates a temporal annotation to ensure that it is logically consistent and sufficiently annotated. Uniquely, CAVaT provides analysis specific to TimeML-annotated temporal information. TimeML is a standard for annotating temporal information in natural language text. In this paper, we present the reporting part of CAVaT, and then its error-checking ability, including the workings of several novel TimeML document verification methods. This is followed by the execution of some example tasks using the tool to show relations between times, events, signals and links. We also demonstrate inconsistencies in a TimeML corpus (TimeBank) that have been detected with CAVaT.
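
To make the idea of a logical consistency check concrete, here is a minimal sketch of one such check: detecting a cycle of BEFORE relations among TimeML TLINKs. This is not CAVaT's code or its verification method, only an illustration under stated assumptions. The function names `before_edges` and `has_before_cycle` are hypothetical, the input is assumed to be a well-formed TimeML document whose TLINK elements carry the standard `relType`, `eventInstanceID`/`timeID` and `relatedToEventInstance`/`relatedToTime` attributes, and only BEFORE and AFTER are considered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def before_edges(timeml_xml):
    """Collect directed 'X is before Y' edges from TLINKs (hypothetical helper).

    AFTER links are inverted so that every edge means 'source precedes target'.
    All other relation types are ignored in this sketch.
    """
    root = ET.fromstring(timeml_xml)
    edges = defaultdict(set)
    for tlink in root.iter("TLINK"):
        src = tlink.get("eventInstanceID") or tlink.get("timeID")
        dst = tlink.get("relatedToEventInstance") or tlink.get("relatedToTime")
        rel = tlink.get("relType")
        if src is None or dst is None:
            continue  # under-specified link; skipped here rather than reported
        if rel == "BEFORE":
            edges[src].add(dst)
        elif rel == "AFTER":
            edges[dst].add(src)
    return edges


def has_before_cycle(edges):
    """Return True if the BEFORE graph has a cycle, i.e. something precedes itself."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in edges.get(node, ()):
            if state[nxt] == IN_PROGRESS:
                return True  # back edge: a chain of BEFOREs loops round
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNSEEN and visit(n) for n in list(edges))


# Illustrative document: e1 < e2 < e3 < e1 cannot hold on a linear timeline.
inconsistent_doc = """<TimeML>
  <TLINK lid="l1" relType="BEFORE" eventInstanceID="ei1" relatedToEventInstance="ei2"/>
  <TLINK lid="l2" relType="BEFORE" eventInstanceID="ei2" relatedToEventInstance="ei3"/>
  <TLINK lid="l3" relType="BEFORE" eventInstanceID="ei3" relatedToEventInstance="ei1"/>
</TimeML>"""

print(has_before_cycle(before_edges(inconsistent_doc)))
```

A fuller validator would also have to reason over the remaining TimeML relation types (for example INCLUDES or SIMULTANEOUS), which this sketch deliberately leaves out.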

