Learning Feature Representations for Keyphrase Extraction
This work addresses the need to automate feature engineering for keyphrase extraction, reducing reliance on expert knowledge and aiming to improve generalization; it is, however, incremental, since it builds on existing supervised approaches.
The paper tackles keyphrase extraction by introducing SurfKE, a feature learning framework that represents a document as a graph and automatically discovers, from the text itself, the patterns that keyphrases exhibit. The authors report substantial performance improvements over strong baselines.
In supervised approaches to keyphrase extraction, a candidate phrase is encoded with a set of hand-crafted features, and machine learning algorithms are trained to discriminate keyphrases from non-keyphrases. Although manually designed features have been shown to work well in practice, feature engineering is a difficult process that requires expert knowledge and normally does not generalize well. In this paper, we present SurfKE, a feature learning framework that exploits the text itself to automatically discover patterns that keyphrases exhibit. Our model represents the document as a graph and automatically learns feature representations of phrases. The proposed model obtains remarkable improvements in performance over strong baselines.
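The abstract describes the pipeline only at a high level (document to graph to learned phrase features). The sketch below illustrates that general idea under stated assumptions and is not SurfKE's actual training procedure. It assumes a word co-occurrence graph built with a sliding window, weighted random walks over that graph, and, as a simple stand-in for a trained embedding model such as skip-gram, L2-normalised walk co-occurrence counts used as word vectors. A phrase is represented as the average of its word vectors. All function names and parameters (`build_cooccurrence_graph`, `random_walks`, `node_vectors`, `phrase_vector`, `window`, `walk_length`, `context`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected weighted graph (assumption): words are nodes, and an edge's weight
    counts how often two distinct words fall inside the same sliding window of
    `window` tokens (window=2 links adjacent words only)."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def random_walks(graph, walks_per_node=10, walk_length=8, seed=0):
    """Weighted random walks: the next node is sampled in proportion to edge weight."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = graph[walk[-1]]
                if not nbrs:
                    break
                nodes = sorted(nbrs)
                weights = [nbrs[n] for n in nodes]
                walk.append(rng.choices(nodes, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


def node_vectors(walks, context=2):
    """Stand-in for a learned embedding (assumption): count which nodes appear within
    `context` steps of each node along the walks, then L2-normalise each count vector."""
    vocab = sorted({n for walk in walks for n in walk})
    index = {n: i for i, n in enumerate(vocab)}
    vecs = {n: [0.0] * len(vocab) for n in vocab}
    for walk in walks:
        for i, node in enumerate(walk):
            lo, hi = max(0, i - context), min(len(walk), i + context + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[node][index[walk[j]]] += 1.0
    for n, v in vecs.items():
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs[n] = [x / norm for x in v]
    return vocab, vecs


def phrase_vector(phrase, vecs, dim):
    """Phrase feature vector: mean of the vectors of its in-graph words (zeros if none)."""
    words = [w for w in phrase.split() if w in vecs]
    if not words:
        return [0.0] * dim
    return [sum(vecs[w][k] for w in words) / len(words) for k in range(dim)]


# Usage on a toy token sequence.
tokens = "keyphrase extraction uses a word graph keyphrase graph features".split()
graph = build_cooccurrence_graph(tokens, window=2)
walks = random_walks(graph, walks_per_node=5, walk_length=8, seed=42)
vocab, vecs = node_vectors(walks, context=2)
kp = phrase_vector("keyphrase extraction", vecs, len(vocab))
```

In a supervised setting, vectors like `kp` would replace (or complement) hand-crafted features as the input to a classifier that separates keyphrases from non-keyphrases. Counting walk contexts keeps the sketch dependency-free; a real system would more likely train an embedding model on the walks.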