PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data
This tool helps researchers in biomedical fields manage and analyze large volumes of literature more efficiently, though the contribution is incremental, since it applies existing text-mining methods to a specific domain.
The author tackles the challenge of keeping up with scientific literature by developing PubSqueezer, a web tool that transforms unstructured biomedical articles into structured data, enabling quick overviews and computational analyses such as machine learning and natural language processing (NLP).
The number of scientific papers published every day is daunting and constantly increasing, and keeping up with the literature is a challenge. When starting to explore a new topic, it is hard to get the big picture without reading many articles. Furthermore, as one reads through the literature, making mental connections is crucial for asking new questions that might lead to discoveries. In this work, I present a web tool that uses a text-mining strategy to transform large collections of unstructured biomedical articles into structured data. The generated results give a quick overview of complex topics and may suggest information that is not explicitly reported. In particular, I show two data science analyses. First, I present a literature-based rare disease network built using this tool, in the hope that it will help clarify some aspects of these less well-known pathologies. Second, I show how a literature-based analysis conducted with PubSqueezer results makes it possible to describe known facts about SARS-CoV-2. In short, data generated with PubSqueezer make it easy to use scientific literature in any computational analysis, such as machine learning or natural language processing. Availability: http://www.pubsqueezer.com