When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter
This work addresses accurate part-of-speech tagging for Italian social media text. The contribution is incremental: it adapts existing methods to a specific domain.
The authors bootstrap a state-of-the-art part-of-speech tagger for Italian Twitter data. Training on native Twitter data, supplemented with a small amount of gold data and additional silver-labelled data, gives better results than training on a large amount of annotated mixed-genre data.
We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task. We show that training the tagger on native Twitter data enriched with small amounts of specifically selected gold data and additional silver-labelled data scraped from Facebook yields better results than using large amounts of manually annotated data from a mix of genres.