CL · May 15, 2018

Generating Continuous Representations of Medical Texts

arXiv:1805.05691v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of using medical texts, which are full of syntactic and domain-specific shorthands, in machine learning applications. Its contribution is incremental, since it builds on existing autoencoder methods.

The authors tackled the problem of generating medical texts and learning continuous representations from discrete, high-dimensional textual inputs, achieving a lower model perplexity than a traditional LSTM generator.
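For reference, perplexity is the exponentiated average negative log-likelihood a model assigns to each token, so lower is better. The sketch below shows that standard definition. The page does not describe the paper's exact evaluation protocol (tokenization, log base, held-out split), and the function name `perplexity` is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities log p(w_i | w_<i).

    Hypothetical helper illustrating the standard definition; it is not
    taken from the paper's code.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    # Average negative log-likelihood per token (cross-entropy in nats).
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    # Perplexity is the exponentiated average negative log-likelihood.
    return math.exp(avg_nll)


# A model that gives each of four tokens probability 0.25 is as uncertain
# as a uniform choice among four options, so its perplexity is about 4.
print(perplexity([math.log(0.25)] * 4))
```

Under this definition, "lower perplexity than an LSTM generator" means the model assigns higher average probability to the held-out captions.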

We present an architecture that generates medical texts while learning an informative, continuous representation with discriminative features. During training the input to the system is a dataset of captions for medical X-Rays. The acquired continuous representations are of particular interest for use in many machine learning techniques where the discrete and high-dimensional nature of textual input is an obstacle. We use an Adversarially Regularized Autoencoder to create realistic text in both an unconditional and conditional setting. We show that this technique is applicable to medical texts which often contain syntactic and domain-specific shorthands. A quantitative evaluation shows that we achieve a lower model perplexity than a traditional LSTM generator.
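The abstract does not write out the training objective. As a hedged sketch, the general Adversarially Regularized Autoencoder formulation (Zhao et al., 2018) pairs an encoder $\mathrm{enc}_\phi$ and decoder $p_\psi$ with a generator $g_\theta$ that maps noise $z$ to a code. A critic $f_w$, constrained to be (approximately) 1-Lipschitz, estimates the Wasserstein distance between encoded captions and generated codes. The symbols below are our notation, and the paper's exact losses, weighting $\lambda$, and conditioning scheme may differ.

$$
\mathcal{L}_{\text{rec}}(\phi,\psi) = -\,\mathbb{E}_{x \sim P_\star}\!\left[\log p_\psi\!\left(x \mid \mathrm{enc}_\phi(x)\right)\right]
$$

$$
W(\phi,\theta) \approx \max_{w \in \mathcal{W}} \; \mathbb{E}_{x \sim P_\star}\!\left[f_w\!\left(\mathrm{enc}_\phi(x)\right)\right] - \mathbb{E}_{z \sim \mathcal{N}(0, I)}\!\left[f_w\!\left(g_\theta(z)\right)\right]
$$

$$
\min_{\phi,\psi,\theta} \; \mathcal{L}_{\text{rec}}(\phi,\psi) + \lambda\, W(\phi,\theta)
$$

Training alternates between these terms. A reconstruction step updates $\phi,\psi$. A critic step maximizes over $w$. An adversarial step updates $\theta$ and $\phi$ so that generated codes and encoded captions become hard to tell apart. At sampling time, new text is produced by decoding $g_\theta(z)$ with $p_\psi$. The continuous code $\mathrm{enc}_\phi(x)$ is the representation the abstract refers to.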

