A Fast Deep Learning Model for Textual Relevance in Biomedical Information Retrieval
The work addresses the challenge that large technical vocabularies and semantic variation in biomedical publications pose for researchers and practitioners searching the literature. The contribution is incremental in that it builds on existing deep learning methods.
The authors address relevance in biomedical literature search with a deep learning model that uses pre-trained word embeddings and a convolutional network to compute relevance scores. They report that the resulting model is fast and outperforms comparable state-of-the-art deep learning approaches.
Publications in the life sciences are characterized by a large technical vocabulary, with many lexical and semantic variations for expressing the same concept. Towards addressing the problem of relevance in biomedical literature search, we introduce a deep learning model for the relevance of a document's text to a keyword-style query. Limited by a relatively small amount of training data, the model uses pre-trained word embeddings. With these, the model first computes a variable-length Delta matrix between the query and document, representing a difference between the two texts, which is then passed through a deep convolution stage followed by a deep feed-forward network to compute a relevance score. This results in a fast model suitable for use in an online search engine. The model is robust and outperforms comparable state-of-the-art deep learning approaches.
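The pipeline described in the abstract (Delta matrix, then convolution, then a feed-forward scorer) can be illustrated with a minimal NumPy sketch. The abstract does not define the Delta matrix precisely, so the construction here is an assumption: each document token's embedding minus the embedding of its nearest query token. The function names `delta_matrix` and `relevance_score`, the single convolution layer, the max-pooling step, and the random weights are all hypothetical stand-ins for the paper's deeper, trained stages.

```python
import numpy as np


def delta_matrix(query_vecs, doc_vecs):
    """Assumed Delta construction: one row per document token.

    query_vecs: (q_len, dim) embeddings of the query terms.
    doc_vecs:   (d_len, dim) embeddings of the document tokens.
    Returns a (d_len, dim) matrix, so its length varies with the document.
    """
    # Pairwise Euclidean distances between tokens, shape (d_len, q_len).
    diffs = doc_vecs[:, None, :] - query_vecs[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Index of the closest query term for every document token.
    nearest = dists.argmin(axis=1)
    # Difference vector to that closest query term (zero row = exact match).
    return doc_vecs - query_vecs[nearest]


def relevance_score(delta, conv_filters, w_out, b_out):
    """Toy scorer: 1-D convolution over tokens, ReLU, max-pool, sigmoid.

    conv_filters: (n_filters, k, dim) filters spanning k consecutive tokens.
    w_out, b_out: weights and bias of a single output unit (hypothetical).
    """
    n_filters, k, dim = conv_filters.shape
    if delta.shape[0] < k:
        raise ValueError("document is shorter than the convolution window")
    # Sliding windows of k consecutive rows, shape (n_windows, k, dim).
    windows = np.stack([delta[i:i + k] for i in range(delta.shape[0] - k + 1)])
    # Apply every filter to every window, shape (n_windows, n_filters).
    feats = np.maximum(np.einsum("wkd,fkd->wf", windows, conv_filters), 0.0)
    # Max-pooling over positions turns the variable length into a fixed one.
    pooled = feats.max(axis=0)
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))


# Usage with random stand-ins for pre-trained embeddings and trained weights.
rng = np.random.default_rng(0)
query = rng.normal(size=(3, 8))    # 3 query terms, 8-dimensional embeddings
doc = rng.normal(size=(20, 8))     # 20 document tokens
delta = delta_matrix(query, doc)   # shape (20, 8)
score = relevance_score(delta, rng.normal(size=(4, 3, 8)), rng.normal(size=4), 0.0)
print(delta.shape, float(score))
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how a variable-length query-document difference can be reduced to a single relevance value in one forward pass, which is consistent with the abstract's emphasis on speed.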