Topical Phrase Extraction from Clinical Reports by Incorporating both Local and Global Context
The work addresses the challenge of extracting meaningful phrases from clinical reports, which contain technical jargon and offer limited data. Its contribution is incremental, however, since it combines existing methods.
The paper tackles topical phrase extraction from clinical reports by incorporating both local and global context, and reports outperforming state-of-the-art approaches in both topic coherence and computational cost.
Making sense of words often requires simultaneously examining the surrounding context of a term and the global themes characterizing the overall corpus. Several topic models have already exploited word embeddings to recognize local context; however, this information has been only weakly combined with the global context during topic inference. This paper proposes to extract topical phrases by corroborating the word embedding information with the global context detected by Latent Semantic Analysis, and then combining the two by means of the Pólya urn model. To highlight the effectiveness of this combined approach, the model was assessed on clinical reports, a challenging scenario characterized by technical jargon and limited word statistics. Results show that it outperforms state-of-the-art approaches in terms of both topic coherence and computational cost.
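To make the combination of local and global context more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy four-word vocabulary, hand-written two-dimensional "embeddings" standing in for trained word vectors, and a hypothetical rule that two words count as related only when they are similar in both the embedding space and the LSA space. The function names, the threshold and the promotion weight are all illustrative assumptions. The last function applies one generalized Pólya urn step: assigning a word to a topic also adds a fractional count for each related word.

```python
import numpy as np


def lsa_word_vectors(term_doc, rank):
    """Global context: truncated SVD of a term-document count matrix."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :rank] * s[:rank]  # one row per vocabulary word


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def promotion_matrix(local_sim, global_sim, threshold=0.5, weight=0.3):
    """Assumed rule: promote a pair only if BOTH views agree it is similar."""
    related = (local_sim >= threshold) & (global_sim >= threshold)
    promo = np.where(related, weight, 0.0)
    np.fill_diagonal(promo, 1.0)  # the sampled word itself counts fully
    return promo


def urn_update(topic_word_counts, topic, word, promo):
    """Generalized Polya urn step: related words receive fractional counts."""
    topic_word_counts[topic] += promo[word]


vocab = ["fracture", "femur", "cough", "fever"]

# Toy term-document counts (rows: words, columns: documents).
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
], dtype=float)

# Toy stand-ins for pretrained word embeddings (local context).
embeddings = np.array([
    [1.0, 0.0],  # fracture
    [0.9, 0.1],  # femur
    [0.0, 1.0],  # cough
    [0.8, 0.2],  # fever
])

global_sim = cosine_similarity_matrix(lsa_word_vectors(term_doc, rank=2))
local_sim = cosine_similarity_matrix(embeddings)
promo = promotion_matrix(local_sim, global_sim)

# Two topics, all counts start at zero; assign "fracture" to topic 0.
counts = np.zeros((2, len(vocab)))
urn_update(counts, topic=0, word=vocab.index("fracture"), promo=promo)
print(dict(zip(vocab, counts[0])))
```

In this toy setup only "fracture" and "femur" agree in both views, so sampling "fracture" also nudges "femur" toward the same topic. "fever" is close to "fracture" in the toy embeddings but never co-occurs with it, so it receives no promotion. A full topic model would apply this update inside its sampling loop, which is outside the scope of this sketch.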