George Mulcaire

3 papers

552 citations

Novelty 50%

AI Score 29

Ranked #150,484 of 201,326 authors (top 75%) · #26,406 in CL (top 81%)

3 Papers

CL · Feb 5, 2016 · Code

Massively Multilingual Word Embeddings

Waleed Ammar, George Mulcaire, Yulia Tsvetkov et al.

We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods.
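The abstract says multiCluster relies only on dictionaries and monolingual data. The following minimal sketch shows one way to read that idea, under stated assumptions. It treats each bilingual dictionary entry as an edge between `(language, word)` nodes, takes the connected components of that translation graph as multilingual clusters, and rewrites monolingual sentences so that every word becomes its cluster ID. An ordinary monolingual embedding model could then be trained on the rewritten text, which places translation-equivalent words in one shared space. The function names `build_multilingual_clusters` and `replace_with_clusters`, the `C<id>` token format, and the toy dictionary are all hypothetical. The embedding-training step is not shown.

```python
from collections import defaultdict


def build_multilingual_clusters(dictionary_pairs):
    """Hypothetical helper: group (language, word) nodes linked by bilingual
    dictionary entries into clusters, i.e. connected components of the
    translation graph, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in dictionary_pairs:
        union(a, b)

    components = defaultdict(set)
    for word in list(parent):
        components[find(word)].add(word)

    # Assign integer cluster IDs in a deterministic (sorted) order.
    cluster_of = {}
    ordered = sorted(components.values(), key=lambda members: sorted(members))
    for cid, members in enumerate(ordered):
        for word in members:
            cluster_of[word] = cid
    return cluster_of


def replace_with_clusters(tokens, language, cluster_of):
    """Hypothetical helper: map each word of a monolingual sentence to its
    cluster token; words absent from every dictionary are kept as-is."""
    return [
        f"C{cluster_of[(language, t)]}" if (language, t) in cluster_of else t
        for t in tokens
    ]


if __name__ == "__main__":
    # Toy dictionary entries (illustrative only).
    pairs = [
        (("en", "dog"), ("fr", "chien")),
        (("fr", "chien"), ("de", "Hund")),
        (("en", "cat"), ("fr", "chat")),
    ]
    clusters = build_multilingual_clusters(pairs)
    print(replace_with_clusters(["the", "dog"], "en", clusters))  # ['the', 'C0']
    print(replace_with_clusters(["le", "chien"], "fr", clusters))  # ['le', 'C0']
```

Because "dog", "chien", and "Hund" are linked through French, they fall into one cluster even though no English-German entry exists. Transitive grouping of this kind is what lets a graph-based clustering cover many languages from pairwise dictionaries.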

CL · Aug 10, 2016

Hierarchical Character-Word Models for Language Identification

Aaron Jaech, George Mulcaire, Shobhit Hathi et al.

Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.

CL · Feb 4, 2016

Many Languages, One Parser

Waleed Ammar, George Mulcaire, Miguel Ballesteros et al.

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser's performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
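The abstract lists three kinds of input the parser uses for each token. The following minimal sketch illustrates how such inputs could be combined, assuming simple concatenation into one input vector. It joins (i) a shared multilingual word vector, (ii) a vector for the token's language, and (iii) a vector for its language-specific fine-grained POS tag. The function name `token_representation`, the zero-vector fallback for unknown words, and the toy tables are hypothetical. The actual parser's architecture is not reproduced here.

```python
from typing import Dict, List

Vector = List[float]


def token_representation(
    word: str,
    language: str,
    fine_pos: str,
    word_vectors: Dict[str, Vector],
    language_vectors: Dict[str, Vector],
    pos_vectors: Dict[str, Vector],
    word_dim: int,
) -> Vector:
    """Hypothetical helper: concatenate (i) a shared multilingual word vector,
    (ii) a token-level language vector, and (iii) a fine-grained POS vector."""
    # Words missing from the shared embedding table fall back to zeros.
    word_part = word_vectors.get(word, [0.0] * word_dim)
    return list(word_part) + list(language_vectors[language]) + list(pos_vectors[fine_pos])


if __name__ == "__main__":
    # Toy lookup tables (illustrative values only).
    word_vectors = {"chien": [0.1, 0.2], "Hund": [0.1, 0.2]}
    language_vectors = {"fr": [1.0, 0.0], "de": [0.0, 1.0]}
    pos_vectors = {"NC": [0.5], "NN": [0.25]}
    print(token_representation("chien", "fr", "NC", word_vectors,
                               language_vectors, pos_vectors, word_dim=2))
    # [0.1, 0.2, 1.0, 0.0, 0.5]
```

In this toy setup, "chien" and "Hund" share the same word vector, so one set of parser weights can treat them alike. The language and POS parts still let the model separate language-specific behavior, which matches the abstract's point about generalizing across languages while keeping language-specific features.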