Change Summarization of Diachronic Scholarly Paper Collections by Semantic Evolution Analysis
The paper addresses the difficulty that newcomers and historians of science face in spotting trends in large scholarly domains, but the approach appears incremental because it builds on temporal summarization methods.
The paper tackles the problem of summarizing semantic changes in scholarly paper collections over time, with the goal of helping newcomers and historians of science understand trends and position research in context. It demonstrates the approach on the ACL Anthology Reference Corpus, which contains 22,878 articles published from 1979 to 2015.
The amount of scholarly data has increased dramatically in recent years. For newcomers to a particular scientific domain (e.g., IR, physics, NLP), it is often difficult to spot larger trends and to position the latest research in the context of prior scientific achievements and breakthroughs. Similarly, researchers in the history of science are interested in tools that allow them to analyze and visualize changes in particular scientific domains. Temporal summarization and related methods should therefore be useful for making sense of large volumes of scientific discourse data aggregated over time. We demonstrate a novel approach for analyzing collections of research papers published over long time periods, providing a high-level overview of the important semantic changes that occurred over time. Our approach is based on comparing word semantic representations over time and aims to help users better understand large domain-focused archives of scholarly publications. As an example dataset we use the ACL Anthology Reference Corpus, which spans 1979 to 2015 and contains 22,878 scholarly articles.
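The abstract does not say how word representations from different periods are compared. As a rough illustration only, the sketch below assumes one common setup: word embeddings trained separately for an early and a late period over a shared vocabulary, aligned with an orthogonal Procrustes rotation, and then compared per word by cosine distance. The function names, the toy vocabulary, and the vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np


def align_procrustes(source, target):
    """Rotate `source` onto `target` using the orthogonal Procrustes solution."""
    # The SVD of the cross-covariance matrix yields the optimal orthogonal map.
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def semantic_change(early, late):
    """Cosine distance between each word's aligned early vector and its late vector."""
    aligned = align_procrustes(early, late)
    dots = np.sum(aligned * late, axis=1)
    norms = np.linalg.norm(aligned, axis=1) * np.linalg.norm(late, axis=1)
    return 1.0 - dots / norms


# Hypothetical toy data: each row is a 2-d vector for one word of a shared vocabulary.
vocab = ["parsing", "grammar", "translation", "corpus", "neural"]
early = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [0.6, 0.8]])

# The later period lives in an arbitrarily rotated space, as can happen when
# embeddings are trained independently; only "neural" actually changes.
theta = np.pi / 5
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
late = early.copy()
late[4] = [-0.8, 0.6]
late = late @ rotation

# Rank words from most to least changed between the two periods.
change = semantic_change(early, late)
ranked = [vocab[i] for i in np.argsort(-change)]
print(ranked[0])
```

In this toy setting the word whose vector was deliberately moved ranks first, while the other words stay close to zero distance despite the rotation between the two spaces. A summary of a real collection would need many more words and time slices, and the paper's actual representation and comparison method may differ from this sketch.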