ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus
This tool aids researchers in multilingual NLP by enabling typological analysis, though its contribution is incremental because it builds on existing corpus exploration methods.
The researchers address the challenge of multilingual NLP by developing ParCourE, an online tool for browsing a word-aligned parallel corpus covering 1334 languages, and provide evidence that it is useful for typological research.
With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential from both an academic and a commercial perspective. Researching typological properties of languages is fundamental for progress in multilingual NLP. Examples include assessing language similarity for effective transfer learning, injecting inductive biases into machine learning models, and creating resources such as dictionaries and inflection tables. We provide ParCourE, an online tool for browsing a word-aligned parallel corpus covering 1334 languages. We give evidence that this is useful for typological research. ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.