AI CL IR LG · Sep 1, 2017

Learning what to read: Focused machine reading

Enrique Noriega-Atala, Marco A. Valenzuela-Escarcega, Clayton T. Morrison, Mihai Surdeanu

arXiv:1709.00149v1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of processing large-scale biomedical literature for bioinformatics applications and represents an incremental improvement in efficiency.

The paper tackles the problem of reading biomedical literature efficiently at scale. It introduces a focused reading approach that steers machine reading toward the documents needed to answer a query, and shows that a reinforcement learning method answers more queries than a baseline while reading fewer documents.

Recent efforts in bioinformatics have achieved tremendous progress in the machine reading of biomedical literature, and the assembly of the extracted biochemical interactions into large-scale models such as protein signaling pathways. However, batch machine reading of literature at today's scale (PubMed alone indexes over 1 million papers per year) is infeasible due to both cost and processing overhead. In this work, we introduce a focused reading approach that guides the machine reading of biomedical literature toward the literature that should be read to answer a biomedical query as efficiently as possible. We introduce a family of algorithms for focused reading, including an intuitive, strong baseline, and a second approach which uses a reinforcement learning (RL) framework that learns when to explore (widen the search) or exploit (narrow it). We demonstrate that the RL approach is capable of answering more queries than the baseline, while being more efficient, i.e., reading fewer documents.
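To make the explore/exploit distinction concrete, the following is a minimal Python sketch of a focused-reading loop on a toy corpus. It is a hypothetical illustration, not the authors' implementation. The corpus, the function names (`focused_read`, `baseline_policy`, `build_index`, `component`, `find_path`), and the query form ("is there a chain of interactions linking entity A to entity E?") are all assumptions made for the example. It also assumes that "exploit" means retrieving only documents that mention both queried entities (narrowing the search), while "explore" retrieves documents that mention either one (widening it). The loop reads the retrieved documents, adds the extracted interactions to a graph, and stops once a path links the two query entities or the step budget runs out.

```python
from collections import deque

# Hypothetical toy corpus: each document lists the interactions (entity pairs)
# that a machine reader would extract from it.
CORPUS = {
    "d1": [("A", "B")],
    "d2": [("B", "C"), ("X", "Y")],
    "d3": [("C", "D")],
    "d4": [("A", "X")],
    "d5": [("D", "E")],
    "d6": [("P", "Q")],
    "d7": [("Q", "R")],
}


def build_index(corpus):
    """Map each entity to the set of documents that mention it."""
    index = {}
    for doc_id, pairs in corpus.items():
        for a, b in pairs:
            index.setdefault(a, set()).add(doc_id)
            index.setdefault(b, set()).add(doc_id)
    return index


def component(graph, start):
    """All entities reachable from `start` in the graph read so far."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_path(graph, source, target):
    """Breadth-first search for a source-to-target path; None if absent."""
    parents, queue = {source: None}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def baseline_policy(a, b, index, read):
    """Hypothetical rule: exploit (narrow) if unread documents mention both
    entities, otherwise explore (widen)."""
    both = index.get(a, set()) & index.get(b, set())
    return "exploit" if both - read else "explore"


def focused_read(corpus, source, target, policy, max_steps=10):
    """Read documents until a path links source and target or the budget ends.
    Returns (path or None, number of documents read)."""
    index = build_index(corpus)
    graph, read, tried = {}, set(), set()
    for _ in range(max_steps):
        if find_path(graph, source, target):
            break
        # Candidate query pairs: one entity already linked to each endpoint.
        left = sorted(component(graph, source))
        right = sorted(component(graph, target))
        candidates = [(a, b) for a in left for b in right if (a, b) not in tried]
        if not candidates:
            break  # nothing new to ask the index
        a, b = candidates[0]
        tried.add((a, b))
        docs_a, docs_b = index.get(a, set()), index.get(b, set())
        if policy(a, b, index, read) == "exploit":
            hits = docs_a & docs_b  # AND query: narrow the search
        else:
            hits = docs_a | docs_b  # OR query: widen the search
        for doc_id in sorted(hits - read):
            read.add(doc_id)
            for x, y in corpus[doc_id]:
                graph.setdefault(x, set()).add(y)
                graph.setdefault(y, set()).add(x)
    return find_path(graph, source, target), len(read)


print(focused_read(CORPUS, "A", "E", baseline_policy))
# → (['A', 'B', 'C', 'D', 'E'], 5)
print(focused_read(CORPUS, "A", "E", lambda *args: "exploit"))
# → (None, 0)
```

On this toy corpus the rule-based policy answers the query after reading 5 of the 7 documents. A policy that only ever exploits finds no document mentioning both endpoints and gives up without an answer. In the paper's RL approach, the decision between widening and narrowing is learned rather than fixed. The sketch keeps that decision behind a plain `policy` function so that a learned policy could be swapped in without changing the reading loop.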

