AI CL LG, Oct 24, 2013

Durkheim Project Data Analysis Report

arXiv:1310.6775v1

Originality: Synthesis-oriented

AI Analysis

This work addresses suicidality prediction for veterans, but it is incremental: it applies existing genetic programming methods to a new dataset and reports limited improvements.

The study predicts suicidality from unstructured text in clinician notes on veterans. It reports 98% classification fidelity between cohorts, cross-validation accuracy of 50% to 69% on five rotating folds, and ensemble averages of 58% to 67%.

This report describes the suicidality prediction models created under the DARPA DCAPS program in association with the Durkheim Project [http://durkheimproject.org/]. The models were built primarily from unstructured text (free-format clinician notes) for several hundred patient records obtained from the Veterans Health Administration (VHA). The models were constructed using a genetic programming algorithm applied to bag-of-words and bag-of-phrases datasets. The influence of additional structured data was explored but was found to be minor. Given the small dataset size, classification between cohorts had high fidelity (98%). Cross-validation suggests these models are reasonably predictive, with an accuracy of 50% to 69% on five rotating folds and ensemble averages of 58% to 67%. One particularly noteworthy result is that word-pairs can dramatically improve classification accuracy, but only when one of the words in the pair is already known to have a high predictive value. By contrast, the set of all possible word-pairs does not improve on a simple bag-of-words model.
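
The difference between a bag-of-words, the set of all word-pairs, and word-pairs restricted to those containing a known predictive word can be illustrated with a minimal Python sketch. It is not the report's implementation. The function names, the whitespace tokenizer, the choice of adjacent words as "pairs", and the example note and seed word are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real tokenization.
    return text.lower().split()


def bag_of_words(text):
    # Count how often each word occurs, ignoring word order.
    return Counter(tokenize(text))


def all_pairs(text):
    # Count every pair of adjacent words (assumed definition of a word-pair).
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


def restricted_pairs(text, seed_words):
    # Keep only adjacent pairs in which at least one word is a seed word,
    # i.e. a word already believed to have high predictive value.
    tokens = tokenize(text)
    return Counter(
        (a, b)
        for a, b in zip(tokens, tokens[1:])
        if a in seed_words or b in seed_words
    )


# Hypothetical example note and seed word, invented for illustration only.
note = "patient reports pain and reports poor sleep"
seed_words = {"pain"}

words = bag_of_words(note)
pairs = all_pairs(note)
seeded = restricted_pairs(note, seed_words)
```

In this sketch the restricted feature set is much smaller than the set of all pairs, which is one plausible reason a seeded pair set could help a classifier where the full pair set does not: far fewer sparse features compete for a few hundred records.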

