First Experiences with the Identification of People at Risk for Diabetes in Argentina using Machine Learning Techniques
This work addresses the need for population-specific diabetes risk identification tools in Argentina, representing an incremental step as it applies existing methods to new data.
The researchers tackled the challenge of detecting Type 2 Diabetes and Prediabetes in Argentina by developing machine learning predictive models, with Random Forest, Decision Tree, and Artificial Neural Network classifiers achieving very good performance on two of the three datasets built for the study.
Detecting Type 2 Diabetes (T2D) and Prediabetes (PD) is a real challenge for medicine due to the absence of pathogenic symptoms and the lack of known associated risk factors. Although several machine learning models have been proposed to identify people at risk, the nature of the condition means that a model suitable for one population is not necessarily suitable for another. This article discusses the development and assessment of predictive models to identify people at risk for T2D and PD specifically in Argentina. First, the database was thoroughly preprocessed and three datasets were generated, each reflecting a compromise between the number of records and the number of available variables. Five classification models were then applied, and some of them performed very well on two of the datasets. In particular, Random Forest (RF), Decision Tree (DT), and Artificial Neural Network (ANN) models demonstrated great classification power, with good values for the metrics under consideration. Given the lack of this type of tool in Argentina, this work represents a first step towards the development of more sophisticated models.
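The compromise between the number of records and the number of variables can be illustrated with a minimal sketch. It assumes, purely for illustration, that a record is usable only when every selected variable is present (complete-case filtering); the abstract does not state the procedure actually used. The variable names, the toy values, and the helper functions `complete_cases` and `tradeoff_table` are all hypothetical and are not taken from the study's database.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy records, not data from the study; None marks a missing value.
Record = Dict[str, Optional[float]]

RECORDS: List[Record] = [
    {"age": 54, "bmi": 31.2, "waist_cm": 102, "fasting_glucose": 108},
    {"age": 47, "bmi": 27.5, "waist_cm": None, "fasting_glucose": 95},
    {"age": 61, "bmi": None, "waist_cm": 110, "fasting_glucose": None},
    {"age": 39, "bmi": 24.1, "waist_cm": 88, "fasting_glucose": None},
    {"age": 58, "bmi": 29.8, "waist_cm": None, "fasting_glucose": None},
]

# Hypothetical candidate variable sets, from narrowest to widest.
CANDIDATES: List[List[str]] = [
    ["age"],
    ["age", "bmi"],
    ["age", "bmi", "waist_cm"],
    ["age", "bmi", "waist_cm", "fasting_glucose"],
]


def complete_cases(records: List[Record], variables: List[str]) -> List[Record]:
    """Keep only the records that have a value for every selected variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


def tradeoff_table(
    records: List[Record], variable_sets: List[List[str]]
) -> List[Tuple[int, int]]:
    """Return (number of variables, number of usable records) per variable set."""
    return [(len(vs), len(complete_cases(records, vs))) for vs in variable_sets]


if __name__ == "__main__":
    for n_vars, n_records in tradeoff_table(RECORDS, CANDIDATES):
        print(f"{n_vars} variables -> {n_records} complete records")
```

On this toy data, each variable added to the selection leaves fewer complete records, which is the kind of tension that generating several datasets is meant to balance.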