Learning Rich Geographical Representations: Predicting Colorectal Cancer Survival in the State of Iowa
This work predicts survival curves for colorectal cancer patients in the state of Iowa and shows that geographical features improve predictive performance.
The study predicts colorectal cancer survival curves for patients in Iowa from 1989 to 2012. Geographical features improve predictive performance, and spectral clustering-based representations yield the best results, although predictive performance deviates at the five-year survival mark.
Neural networks are capable of learning rich, nonlinear feature representations that have been shown to be beneficial in many predictive tasks. In this work, we use these models to explore the use of geographical features in predicting colorectal cancer survival curves for patients in the state of Iowa, spanning the years 1989 to 2012. Specifically, we compare model performance using a newly defined metric, area between the curves (ABC), to assess (a) whether survival curves can be reasonably predicted for colorectal cancer patients in the state of Iowa, (b) whether geographical features improve predictive performance, and (c) whether a simple binary representation or a richer, spectral clustering-based representation performs better. Our findings suggest that survival curves can be reasonably estimated on average, with predictive performance deviating at the five-year survival mark. We also find that geographical features improve predictive performance, and that the best performance is obtained using richer, spectral analysis-elicited features.
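The abstract names the ABC metric but does not give its formula. As a rough illustration only, the sketch below assumes ABC is the area enclosed between a predicted and an observed survival curve, both sampled on a shared time grid, and approximates it with the trapezoidal rule on the absolute gap. The function name `area_between_curves`, the sampling grid, and the use of the absolute difference are assumptions made for this example, not the paper's definition.

```python
from typing import Sequence


def area_between_curves(
    times: Sequence[float],
    predicted: Sequence[float],
    observed: Sequence[float],
) -> float:
    """Approximate the area between two survival curves.

    Assumption: both curves are sampled at the same strictly increasing
    time points. The absolute gap at each point is integrated with the
    trapezoidal rule, so crossings between grid points are only
    approximated.
    """
    if not (len(times) == len(predicted) == len(observed)):
        raise ValueError("times, predicted, and observed must have equal length")
    if len(times) < 2:
        raise ValueError("at least two time points are required")

    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        gap_left = abs(predicted[i - 1] - observed[i - 1])
        gap_right = abs(predicted[i] - observed[i])
        # Trapezoid over one interval of the absolute difference.
        area += 0.5 * width * (gap_left + gap_right)
    return area


if __name__ == "__main__":
    # Hypothetical yearly survival probabilities, for illustration only.
    years = [0, 1, 2, 3, 4, 5]
    observed_curve = [1.0, 0.9, 0.8, 0.7, 0.65, 0.6]
    predicted_curve = [1.0, 0.88, 0.8, 0.72, 0.66, 0.55]
    print(area_between_curves(years, predicted_curve, observed_curve))
```

Under this reading, a smaller value means the predicted curve tracks the observed one more closely, and a value of zero means the two curves coincide at every sampled time point.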