Water and Sediment Analysis Using Predictive Models
This work addresses marine pollution monitoring for environmental scientists by automating labor-intensive tests, though it is incremental as it focuses on data imputation for existing datasets.
The paper tackles the problem of water quality assessment by developing a predictive model using machine learning to infer pollution levels from water and sediment samples, achieving 75% accuracy even with 57% missing data.
The increasing prevalence of marine pollution over the past few decades has motivated recent research aimed at easing the situation. Typical water quality assessment requires continuous monitoring of water and sediments at remote locations, with labour-intensive laboratory tests to determine the degree of pollution. We propose an automated framework in which we formalise a predictive model using Machine Learning to infer water quality and the level of pollution from collected water and sediment samples. One commonly encountered difficulty in performing statistical analysis on water and sediment data is the limited number of samples and the incompleteness of the datasets, caused by the sparsity of sample collection locations. To this end, we performed an extensive investigation of the performance of various data imputation methods on water and sediment datasets with various missing-data rates. Empirically, we show that our best model achieves an accuracy of 75% after accounting for 57% of missing data. Our experiments further show that the model can assist water quality screening based on possibly incomplete real-world data.
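As a rough illustration of how an imputation method can be evaluated at different missing-data rates, the sketch below masks cells of a complete table at random, fills them back in, and measures the error on the masked cells only. It is a minimal sketch under stated assumptions, not the method used in this work: the abstract does not name the imputation methods that were compared, so column-mean imputation stands in as the simplest baseline, the data are synthetic, and all function names and the three example columns are hypothetical.

```python
import random
import statistics


def mask_at_random(rows, missing_rate, seed=0):
    """Return a copy of rows with roughly missing_rate of the cells set to None."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in row]
            for row in rows]


def impute_column_mean(rows):
    """Fill each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values.
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def imputation_rmse(truth, masked, imputed):
    """Root-mean-square error computed over the masked cells only."""
    errors = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return (sum(errors) / len(errors)) ** 0.5 if errors else 0.0


if __name__ == "__main__":
    # Synthetic stand-in for a measurement table; the three columns are
    # hypothetical (e.g. pH, temperature, a metal concentration).
    rng = random.Random(42)
    data = [[rng.gauss(7.0, 0.5), rng.gauss(20.0, 5.0), rng.gauss(1.0, 0.2)]
            for _ in range(200)]

    # 0.57 mirrors the 57% missing rate quoted in the abstract.
    for rate in (0.10, 0.30, 0.57):
        masked = mask_at_random(data, rate, seed=1)
        imputed = impute_column_mean(masked)
        rmse = imputation_rmse(data, masked, imputed)
        print(f"missing rate {rate:.2f}: RMSE on masked cells = {rmse:.3f}")
```

The same harness can wrap any other imputation function with the same input and output shape, which allows methods to be compared at each missing rate before a downstream classifier is trained on the imputed table.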