Method and Dataset Mining in Scientific Papers
This paper addresses the need for content-level literature analysis in machine learning by extracting the methods and datasets that papers use, which supports discipline analysis and algorithm recommendation.
The paper tackles the problem of extracting methods and datasets from machine learning papers. It proposes a novel entity recognition model called MDER and constructs datasets from PAKDD conference papers (2009-2019). The authors conduct preliminary experiments to assess extraction performance and visualize the mining results.
Literature analysis helps researchers better understand the development of science and technology. Conventional literature analysis focuses on topics, authors, abstracts, keywords, references, etc., and rarely pays attention to the content of papers. In the field of machine learning, the methods (M) and datasets (D) involved are key information in papers. The extraction and mining of M and D are useful for discipline analysis and algorithm recommendation. In this paper, we propose a novel entity recognition model, called MDER, and construct datasets from the papers of the PAKDD conferences (2009-2019). Some preliminary experiments are conducted to assess the extraction performance, and the mining results are visualized.
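To make the extraction task concrete, the following is a minimal sketch of how token-level entity tags could be turned into method (M) and dataset (D) mentions and then counted. It is not MDER itself: the abstract does not describe the model's output format, so the BIO tag scheme (`B-M`, `I-M`, `B-D`, `I-D`, `O`), the function name `decode_bio`, and the toy sentence are all assumptions made for illustration.

```python
from collections import Counter


def decode_bio(tokens, tags):
    """Collect (label, text) spans from BIO tags (assumed scheme: B-M, I-M, B-D, I-D, O)."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any span that is still open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent I- tag: close the open span, skip the token.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical tagged sentence, hand-labelled for illustration only.
tokens = ["We", "evaluate", "random", "forest", "on", "MNIST", "."]
tags = ["O", "O", "B-M", "I-M", "O", "B-D", "O"]

mentions = decode_bio(tokens, tags)
mention_counts = Counter(mentions)  # aggregate over many sentences in practice
```

Counting mentions per paper or per year in this way is one plausible starting point for the kind of discipline analysis the abstract refers to; the statistics actually computed by the authors are not specified here.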