0.8CLMay 12, 2018
Huge Automatically Extracted Training Sets for Multilingual Word Sense DisambiguationTommaso Pasini, Francesco Maria Elia, Roberto Navigli
We release to the community six large-scale sense-annotated datasets in multiple language to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are accessible. The data is available at trainomatic.org.
0.8CLApr 19, 2016
Syntactic and semantic classification of verb arguments using dependency-based and rich semantic featuresFrancesco Elia
Corpus Pattern Analysis (CPA) has been the topic of Semeval 2015 Task 15, aimed at producing a system that can aid lexicographers in their efforts to build a dictionary of meanings for English verbs using the CPA annotation process. CPA parsing is one of the subtasks which this annotation process is made of and it is the focus of this report. A supervised machine-learning approach has been implemented, in which syntactic features derived from parse trees and semantic features derived from WordNet and word embeddings are used. It is shown that this approach performs well, even with the data sparsity issues that characterize the dataset, and can obtain better results than other system by a margin of about 4% f-score.