CL · Jun 10, 2018

Cross-Lingual Task-Specific Representation Learning for Text Classification in Resource Poor Languages

arXiv:1806.03590v1 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of limited annotated data for text classification in resource-poor languages, which benefits NLP applications in such contexts. The advance is incremental, as it builds on existing cross-lingual and representation learning methods.

The paper tackles text classification in resource-poor languages by leveraging resource-rich languages. It uses a twin Bi-LSTM network with shared parameters and a contrastive loss to learn cross-lingual representations, and it reports significantly outperforming state-of-the-art approaches on sentiment analysis and emoji prediction for Hindi and Telugu.

Neural network models have shown promising results for text classification. However, these solutions are limited by their dependence on the availability of annotated data. The prospect of leveraging resource-rich languages to enhance text classification in resource-poor languages is fascinating. Performance on resource-poor languages can improve significantly if the resource availability constraints can be offset. To this end, we present a twin Bidirectional Long Short-Term Memory (Bi-LSTM) network with shared parameters, consolidated by a contrastive loss function (based on a similarity metric). The model learns representations of resource-poor and resource-rich sentences in a common space by using the similarity between their assigned annotation tags. Hence, the model projects sentences with similar tags closer together and those with different tags farther apart. We evaluated our model on the classification tasks of sentiment analysis and emoji prediction for the resource-poor languages Hindi and Telugu and the resource-rich languages English and Spanish. Our model significantly outperforms the state-of-the-art approaches on both tasks across all metrics.
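
The abstract does not give the exact form of the contrastive loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the standard margin-based contrastive loss over Euclidean distance, and it assumes sentence embeddings have already been produced by the shared encoder. The names `contrastive_loss`, `pair_loss`, and `margin` are hypothetical and chosen for illustration.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_tag, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Assumption: Euclidean distance and a fixed margin; the paper's own
    similarity metric may differ.
    """
    d = euclidean_distance(u, v)
    if same_tag:
        # Same annotation tag: penalise any distance, pulling the pair together.
        return 0.5 * d ** 2
    # Different tags: penalise only pairs that sit closer than the margin.
    return 0.5 * max(0.0, margin - d) ** 2


def pair_loss(emb_a, tag_a, emb_b, tag_b, margin=1.0):
    """Derive the pair label from tag equality, then apply the loss.

    emb_a could come from a resource-poor sentence and emb_b from a
    resource-rich one, both encoded by the same (shared-parameter) network.
    """
    return contrastive_loss(emb_a, emb_b, tag_a == tag_b, margin)
```

Under these assumptions, a pair with matching tags contributes loss that grows with its distance, while a pair with different tags contributes nothing once it is at least `margin` apart. Minimising this over many cross-lingual pairs is what would push same-tag sentences closer and different-tag sentences farther apart in the common space.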
