Machine Learning in Epidemiology
The chapter offers a practical guide for epidemiologists who want to use machine learning tools, but its contribution is incremental because it focuses on established methods and applications.
The chapter provides methodological foundations for applying machine learning in epidemiology to analyze complex and high-dimensional data, using code examples in R with a heart disease dataset.
In the age of digital epidemiology, epidemiologists are faced with increasing amounts of data of growing complexity and dimensionality. Machine learning provides a set of powerful tools that can help analyze such large volumes of data. This chapter lays the methodological foundations for successfully applying machine learning in epidemiology. It covers the principles of supervised and unsupervised learning and discusses the most important machine learning methods. Strategies for model evaluation and hyperparameter optimization are developed, and interpretable machine learning is introduced. All theoretical parts are accompanied by code examples in R, which use an example dataset on heart disease throughout the chapter.