Scalable Preprocessing of High-Volume Bird Acoustic Data
This work makes large-scale bird acoustic analyses more feasible for researchers by significantly reducing execution times, although the contribution is incremental because it combines existing methods.
The paper tackles the problem of efficiently preprocessing high-volume bird acoustic data by developing a distributed computing pipeline, achieving a 21.76-fold speedup with 32 cores across 8 virtual machines compared with serial processing.
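To put the reported speedup in context, it can be read through two standard metrics: parallel efficiency E = S / p, and the Karp-Flatt experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p), where S is the speedup and p the number of cores. The sketch below is a back-of-envelope calculation under Amdahl's-law assumptions, not an analysis reported in the paper; the function names are our own.

```python
def efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Karp-Flatt metric: experimentally determined serial fraction.

    Under Amdahl's model this lumps together truly serial work and
    parallel overhead (e.g. communication between machines).
    """
    return (1 / speedup - 1 / cores) / (1 - 1 / cores)


# Figures taken from the summary above: 21.76x speedup on 32 cores.
S, P = 21.76, 32
print(f"efficiency      = {efficiency(S, P):.2f}")
print(f"serial fraction = {karp_flatt(S, P):.4f}")
```

With these inputs the efficiency is about 0.68 and the implied serial-plus-overhead fraction is roughly 1.5%, which is one way to interpret how close to linear the scaling is.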
In this work, we examine the problem of efficiently preprocessing high-volume bird acoustic data. We combine several existing preprocessing steps, including noise reduction approaches, into a single efficient pipeline by examining each process individually. We then utilise a distributed computing architecture to reduce execution time. Using a master-slave model with data parallelisation, we developed an automated, scalable system with near-linear speedup, capable of preprocessing bird acoustic recordings 21.76 times faster with 32 cores across 8 virtual machines than a serial process. This work contributes to the research area of bioacoustic analysis, which is currently very active because of its potential to monitor animals quickly and at low cost. Overcoming noise interference is a significant challenge in many bioacoustic studies, and the volume of data in these studies is increasing. Our work makes large-scale bird acoustic analyses more feasible by parallelising important bird acoustic processing tasks, significantly reducing execution times.
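The master-slave model with data parallelisation can be illustrated on a single machine as follows. The master hands whole recordings to a pool of workers, each worker runs the full preprocessing pipeline on its recording, and the master gathers results in input order. This is a minimal sketch, not the authors' implementation: the pipeline steps (`remove_dc_offset`, a crude `noise_gate` standing in for noise reduction, and `peak_normalise`) and the `run_master` helper are hypothetical, and recordings are modelled as plain lists of samples.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from statistics import median
from typing import Callable, List, Sequence

Recording = List[float]  # hypothetical: one recording as a list of audio samples


def remove_dc_offset(samples: Sequence[float]) -> Recording:
    """Subtract the mean so the signal is centred on zero (assumes non-empty input)."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def noise_gate(samples: Sequence[float], k: float = 2.0) -> Recording:
    """Hypothetical stand-in for noise reduction: zero every sample whose
    magnitude is below k times the median absolute amplitude (a rough noise floor)."""
    threshold = k * median(abs(s) for s in samples)
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def peak_normalise(samples: Sequence[float]) -> Recording:
    """Scale so the largest absolute sample is 1.0; silent input is returned unchanged."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak > 0 else list(samples)


def preprocess(samples: Sequence[float]) -> Recording:
    """Per-recording pipeline: every step runs in sequence on one worker."""
    return peak_normalise(noise_gate(remove_dc_offset(samples)))


def run_master(
    recordings: Sequence[Sequence[float]],
    n_workers: int,
    executor_factory: Callable[..., Executor] = ProcessPoolExecutor,
) -> List[Recording]:
    """Master: partition recordings across n_workers and collect results in order."""
    # Send several recordings per task to cut dispatch overhead
    # (ProcessPoolExecutor uses chunksize; ThreadPoolExecutor ignores it).
    chunksize = max(1, len(recordings) // (n_workers * 4))
    with executor_factory(max_workers=n_workers) as pool:
        return list(pool.map(preprocess, recordings, chunksize=chunksize))
```

For CPU-bound work in CPython, call `run_master(recordings, n_workers=4)` from inside an `if __name__ == "__main__":` guard so that process-based workers can start safely. Passing `ThreadPoolExecutor` keeps the same master-worker structure but, because of the GIL, will not give a real speedup for pure-Python processing. Scaling across several virtual machines, as in the paper, would require a distributed framework in place of the local executor; the partition, dispatch, and gather structure stays the same.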