LG AI ML · Oct 4, 2019

Risks of Using Non-verified Open Data: A case study on using Machine Learning techniques for predicting Pregnancy Outcomes in India

Anusua Trivedi, Sumit Mukherjee, Edmund Tse, Anne Ewing, Juan Lavista Ferres

arXiv:1910.02136v21.87 citations

Originality Synthesis-oriented

AI Analysis

This paper addresses data quality issues in AI for public health in developing countries, but it is incremental: it focuses on a case study without introducing new methods.

The paper tackles the problem of using non-verified open data for predicting pregnancy outcomes in India, highlighting that AI applications without proper data understanding can lead to erroneous conclusions.

Artificial intelligence (AI) has evolved considerably in the last few years. While applications of AI are now becoming more common in fields like retail and marketing, the application of AI to problems facing developing countries is still an emerging topic. In particular, AI applications in resource-poor settings remain relatively nascent. There is considerable scope for AI to be used in such settings. For example, researchers have started exploring AI applications to reduce poverty and deliver a broad range of critical public services. However, despite many promising use cases, such projects must overcome many dataset-related challenges. These challenges often take the form of missing data, incorrectly collected data, and improperly labeled variables, among other factors. As a result, we can often end up using data that is not representative of the problem we are trying to solve. In this case study, we explore the challenges of using one such open dataset from India to predict an important health outcome. We highlight how the use of AI without proper understanding of reporting metrics can lead to erroneous conclusions.
