Automated Extraction of Number of Subjects in Randomised Controlled Trials
This incremental work is intended to support medical text processing tasks such as summarisation and question answering, for the benefit of researchers and practitioners.
The paper tackles the problem of automatically extracting the number of subjects from the abstracts of randomised controlled trials (RCTs). It reports 88% accuracy by combining rule-based candidate extraction with supervised classification, trained on a small annotated set of 201 RCTs.
We present a simple approach for automatically extracting the number of subjects involved in randomised controlled trials (RCTs). Our approach first applies a set of rule-based techniques to extract candidate study sizes from the abstracts of the articles. Supervised classification is then performed over the candidates with support vector machines, using a small set of lexical, structural, and contextual features. With only a small annotated training set of 201 RCTs, we obtained an accuracy of 88%. We believe that this system will aid complex medical text processing tasks such as summarisation and question answering.
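
The two-stage idea described in the abstract (rule-based candidate extraction, then feature-based classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the regular expression, the cue lists `SUBJECT_NOUNS` and `TOTAL_CUES`, the feature names, and the function `extract_candidates` are hypothetical and are not the rules or features used in the paper. The sketch covers only the first stage and the feature computation; in the described system, feature vectors of this kind would then be passed to a support vector machine.

```python
import re

# Hypothetical cue lists; the paper's actual rules are not specified in the abstract.
SUBJECT_NOUNS = {"patients", "participants", "subjects", "volunteers",
                 "women", "men", "children", "adults"}
TOTAL_CUES = ("total of", "(n =", "(n=")

# Integers, optionally with thousands separators, that are not part of a
# decimal number, a longer token, or a percentage.
INTEGER_RE = re.compile(r"(?<![\w.,])(\d{1,3}(?:,\d{3})+|\d+)(?![\w%]|[.,]\d)")


def extract_candidates(abstract):
    """Return candidate study sizes with a few illustrative features."""
    candidates = []
    for match in INTEGER_RE.finditer(abstract):
        start, end = match.span()
        # Contextual feature: the first three words after the number.
        following = re.findall(r"[A-Za-z]+", abstract[end:end + 40])[:3]
        # Lexical feature: a short window of text before the number.
        preceding = abstract[max(0, start - 15):start].lower()
        candidates.append({
            "value": int(match.group(1).replace(",", "")),
            # Structural feature: where in the abstract the number occurs.
            "relative_position": start / len(abstract),
            "followed_by_subject_noun": any(
                word.lower() in SUBJECT_NOUNS for word in following),
            "preceded_by_total_cue": any(
                cue in preceding for cue in TOTAL_CUES),
        })
    return candidates


if __name__ == "__main__":
    # Invented example text, not taken from any real trial.
    sample = ("A total of 1,250 patients were randomised; 88% completed "
              "follow-up at 12 months. Mean age was 54.3 years.")
    for candidate in extract_candidates(sample):
        print(candidate)
```

In this sketch, percentages and decimals are filtered out by the pattern itself, so the classifier would only need to separate true study sizes from other integers such as durations. A real system would need broader rules, for example numbers written out in words.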