CL · Mar 9, 2024

MaiBaam Annotation Guidelines

arXiv:2403.05902v3 · 1 citation · h-index: 10
AI Analysis

This work addresses the need for annotated linguistic resources for Bavarian. Its contribution is incremental, since it builds on the existing Universal Dependencies framework.

The authors created MaiBaam, a Bavarian corpus manually annotated with part-of-speech tags, syntactic dependencies, and German lemmas. Its annotations elaborate on the Universal Dependencies guidelines for Bavarian.

This document provides the annotation guidelines for MaiBaam, a Bavarian corpus manually annotated with part-of-speech (POS) tags, syntactic dependencies, and German lemmas. MaiBaam belongs to the Universal Dependencies (UD) project, and our annotations elaborate on the general and German UD version 2 guidelines. In this document, we detail how to preprocess and tokenize Bavarian data, provide an overview of the POS tags and dependencies we use, explain annotation decisions that would also apply to closely related languages like German, and lastly we introduce and motivate decisions that are specific to Bavarian grammar.

Code Implementations: 1 repo
