Revisiting the Importance of Encoding Logic Rules in Sentiment Classification
This work addresses reproducible research in sentiment classification and shows that replacing explicit logic rules with contextualized ELMo embeddings yields significantly better performance.
The authors analyzed sentiment classification models on syntactically complex sentences, such as A-but-B sentences, and found that averaging over many random seeds is crucial for meaningful comparisons. With that averaging in place, they showed that a distillation model using explicit logic rules is ineffective, while ELMo embeddings perform significantly better and appear to learn logic rules implicitly.
We analyze the performance of different sentiment classification models on syntactically complex inputs such as A-but-B sentences. The first contribution of this analysis addresses reproducible research: to compare different models meaningfully, their accuracies must be averaged over far more random seeds than has traditionally been reported. With proper averaging in place, we find that the distillation model described in arXiv:1603.06318v4 [cs.LG], which incorporates explicit logic rules for sentiment classification, is ineffective. In contrast, using contextualized ELMo embeddings (arXiv:1802.05365v2 [cs.CL]) instead of logic rules yields significantly better performance. Additionally, we provide analysis and visualizations that demonstrate ELMo's ability to implicitly learn logic rules. Finally, a crowdsourced analysis reveals how ELMo outperforms baseline models even on sentences with ambiguous sentiment labels.
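
The seed-averaging point can be illustrated with a minimal sketch. It is not the authors' evaluation code: `summarize_over_seeds`, `SeedSummary`, and `toy_model` are hypothetical names, and the toy accuracies are arbitrary numbers chosen for illustration, not results from this work. The idea is only that each model is trained and evaluated once per seed, and models are compared on the mean and spread of those accuracies rather than on a single run.

```python
import random
import statistics
from typing import Callable, Iterable, NamedTuple


class SeedSummary(NamedTuple):
    mean: float  # mean accuracy across seeds
    std: float   # sample standard deviation across seeds
    n: int       # number of seeds evaluated


def summarize_over_seeds(train_and_eval: Callable[[int], float],
                         seeds: Iterable[int]) -> SeedSummary:
    """Run one training/evaluation per seed and summarize the accuracies.

    `train_and_eval` is a hypothetical callable that trains a model with
    the given seed and returns its test accuracy.
    """
    accuracies = [train_and_eval(seed) for seed in seeds]
    # The sample standard deviation is undefined for a single run.
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return SeedSummary(statistics.mean(accuracies), std, len(accuracies))


def toy_model(base_accuracy: float, noise: float) -> Callable[[int], float]:
    """Stand-in for a real training run (assumption, for illustration only):
    returns the base accuracy plus seed-dependent uniform noise."""
    def run(seed: int) -> float:
        rng = random.Random(seed)
        return base_accuracy + rng.uniform(-noise, noise)
    return run


if __name__ == "__main__":
    # Arbitrary illustrative numbers, not results from the paper.
    model_a = toy_model(base_accuracy=0.870, noise=0.01)
    model_b = toy_model(base_accuracy=0.875, noise=0.01)

    for n_seeds in (1, 5, 100):
        a = summarize_over_seeds(model_a, range(n_seeds))
        b = summarize_over_seeds(model_b, range(n_seeds))
        print(f"seeds={n_seeds:3d}  A={a.mean:.4f}±{a.std:.4f}  "
              f"B={b.mean:.4f}±{b.std:.4f}")
```

With only one or a few seeds, the seed-to-seed noise in this toy setup can be as large as the gap between the two models, so their ranking may flip from run to run. Averaging over many seeds makes the comparison more stable.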