CV · LG · Oct 8, 2021

COVID-19 Monitoring System using Social Distancing and Face Mask Detection on Surveillance video datasets

arXiv:2110.03905v3 · 41 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient, automated monitoring of COVID-19 protocols in public spaces, though its contribution is incremental because it combines existing methods.

The paper tackles the problem of automating COVID-19 safety monitoring by developing a system that detects social distancing violations and face mask usage in surveillance videos, achieving 91.2% accuracy and 90.79% F1 score with an average prediction time of 7.12 seconds per 78 frames.
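To make the reported figures easier to interpret, the sketch below shows how accuracy and F1 are commonly computed for binary labels, and converts the reported 7.12 seconds per 78 frames into a per-frame latency. The function name `accuracy_and_f1` and the toy labels are hypothetical and purely illustrative; the source does not say whether its F1 score is binary or averaged across classes, so the binary form here is an assumption.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and binary F1 for labels where 1 = violation, 0 = no violation."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall, which simplifies to
    # 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)

# Reported timing from the summary: 7.12 seconds for 78 frames.
seconds_per_frame = 7.12 / 78   # roughly 0.091 s per frame
frames_per_second = 78 / 7.12   # roughly 11 frames per second
```

At roughly 11 frames per second, the reported prediction time is below the 25 to 30 frames per second typical of video, so a deployment along these lines would likely need to sample frames rather than process every one.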

The fear and danger of the COVID-19 virus still loom large. Manual monitoring of social distancing norms is impractical when a large population is on the move and the task force and resources available to enforce them are insufficient. There is a need for a lightweight, robust, 24x7 video-monitoring system that automates this process. This paper proposes a comprehensive and effective solution that performs person detection, social distancing violation detection, face detection and face mask classification using object detection, clustering and a Convolutional Neural Network (CNN) based binary classifier. For this, YOLOv3, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Dual Shot Face Detector (DSFD) and a MobileNetV2-based binary classifier are employed on surveillance video datasets. The paper also provides a comparative study of different face detection and face mask classification models. Finally, a video dataset labelling method is proposed, along with a labelled video dataset that compensates for the lack of such datasets in the community and is used to evaluate the system. System performance is evaluated in terms of accuracy, F1 score and prediction time, which has to be low for practical applicability. The system achieves an accuracy of 91.2% and an F1 score of 90.79% on the labelled video dataset, with an average prediction time of 7.12 seconds for 78 frames of video.
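The clustering step in the abstract can be illustrated with a minimal sketch. It assumes that each detected person is reduced to a 2D centroid (for example, the centre of a YOLOv3 bounding box) and that anyone DBSCAN places in a cluster is within the minimum distance of at least one other person. The pure-Python `dbscan` below is a compact stand-in for a library implementation, and the function names, the pixel-based `min_distance`, and the sample coordinates are all hypothetical; the abstract does not specify the paper's actual parameters or distance calibration.

```python
from math import dist  # Python 3.8+


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point, -1 meaning noise."""
    n = len(points)
    # Each point's eps-neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point, so keep expanding
    return labels


def distancing_violations(centroids, min_distance):
    """Flag each person who falls in a cluster of two or more people."""
    labels = dbscan(centroids, eps=min_distance, min_samples=2)
    return [label != -1 for label in labels]


# Hypothetical person centroids in pixel coordinates.
centroids = [(100, 200), (130, 210), (400, 220), (700, 50), (720, 60)]
flags = distancing_violations(centroids, min_distance=50)
```

With `min_samples=2`, a person is flagged exactly when at least one other person lies within `min_distance`, so isolated individuals come out as noise. A real deployment would also need to map pixel distances to physical distances, which this sketch does not attempt.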

