GrammarTagger: A Multilingual, Minimally-Supervised Grammar Profiler for Language Education
GrammarTagger gives language educators and learners a tool to analyze and search educational materials by grammatical features. The contribution is incremental, since it builds on existing grammar profiling methods.
The authors address the problem of building a grammar profiler for language education with GrammarTagger, which identifies grammatical features in text with minimal supervision and reaches an F1 score of approximately 0.6 using only a couple hundred sentences in both English and Chinese.
We present GrammarTagger, an open-source grammar profiler which, given an input text, identifies grammatical features useful for language education. The model architecture enables it to learn from a small number of texts annotated with spans and their labels, which 1) enables easier and more intuitive annotation, 2) supports overlapping spans, and 3) is less prone to error propagation than complex hand-crafted rules defined on constituency/dependency parses. We show that we can bootstrap a grammar profiler model with $F_1 \approx 0.6$ from only a couple hundred sentences in both English and Chinese, and that performance can be further boosted by training a multilingual model. With GrammarTagger, we also build Octanove Learn, a search engine for language learning materials indexed by their reading difficulty and grammatical features. The code and pretrained models are publicly available at \url{https://github.com/octanove/grammartagger}.
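
To make the span-and-label annotation format concrete, the following is a minimal sketch of how overlapping labeled spans might be represented and scored. It is not the GrammarTagger implementation: the `Span` type, the `span_f1` helper, the example sentence, and every label name are hypothetical and chosen only for illustration. The score is an exact-match F1 over (start, end, label) triples, which is one common convention; the abstract does not specify the matching criterion behind the reported $F_1$.

```python
from __future__ import annotations

from typing import NamedTuple


class Span(NamedTuple):
    """A labeled token span. Hypothetical type, not part of GrammarTagger."""
    start: int  # index of the first token in the span (inclusive)
    end: int    # index one past the last token in the span (exclusive)
    label: str  # grammatical feature name (illustrative labels only)


def span_f1(gold: set[Span], predicted: set[Span]) -> float:
    """Exact-match F1: a prediction counts only if start, end, and label all match."""
    if not gold or not predicted:
        return 0.0
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


tokens = ["I", "have", "been", "studying", "Chinese", "for", "two", "years"]

# Spans are stored independently, so two spans may cover the same tokens.
gold = {
    Span(1, 4, "present_perfect_continuous"),  # "have been studying"
    Span(1, 3, "present_perfect"),             # "have been", overlaps the span above
    Span(5, 8, "for_duration"),                # "for two years"
}
predicted = {
    Span(1, 4, "present_perfect_continuous"),
    Span(5, 8, "for_duration"),
    Span(0, 2, "subject_verb"),                # spurious prediction
}

score = span_f1(gold, predicted)
print(f"span-level F1 = {score:.3f}")
```

Because each span carries its own boundaries and label, overlapping features need no special handling, unlike a scheme that assigns a single tag to each token.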