Camera clustering for scalable stream-based active distillation
This work addresses scalability in video surveillance and other multi-camera systems by reducing the number of models that must be trained while improving their accuracy.
The paper tackles the problem of efficiently training lightweight video object detection models. It proposes a camera clustering method that reduces the number of models needed and enlarges the distillation dataset, improving accuracy over both per-camera and universal model approaches.
We present a scalable framework for building efficient lightweight models for video object detection using self-training and knowledge distillation. We study how best to select training images from video streams and how effectively models can be shared across many cameras. We propose clustering cameras to reduce the number of models that must be trained while enlarging the distillation dataset. Our results show that proper camera clustering notably improves the accuracy of distilled models, outperforming both the use of a distinct model for each camera and a universal model trained on the aggregate data from all cameras.
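
To make the clustering idea concrete, the following is a minimal sketch, not the method used in this work. It assumes each camera is summarized by a hypothetical fixed-length descriptor (for example, an average image embedding) and groups cameras with a simple greedy threshold rule. The frames selected from all cameras in a group are then pooled into one distillation dataset, so that one student model is trained per cluster rather than per camera. The function names, descriptors, threshold, and frame identifiers are all illustrative assumptions.

```python
from math import dist


def cluster_cameras(descriptors, threshold):
    """Greedy leader clustering (an illustrative stand-in, not the paper's method).

    Each camera joins the first cluster whose leader descriptor lies within
    `threshold` (Euclidean distance); otherwise it starts a new cluster.
    """
    clusters = []  # list of (leader_descriptor, [camera_ids])
    for cam_id, desc in sorted(descriptors.items()):
        for leader, members in clusters:
            if dist(leader, desc) <= threshold:
                members.append(cam_id)
                break
        else:
            clusters.append((desc, [cam_id]))
    return [members for _, members in clusters]


def pool_datasets(clusters, frames_by_camera):
    """Merge the selected frames of every camera in a cluster into one dataset."""
    return [
        [frame for cam in members for frame in frames_by_camera.get(cam, [])]
        for members in clusters
    ]


# Hypothetical per-camera descriptors and teacher-labeled frame identifiers.
descriptors = {
    "cam_a": (0.10, 0.20),
    "cam_b": (0.15, 0.25),
    "cam_c": (0.90, 0.80),
    "cam_d": (0.85, 0.90),
}
frames_by_camera = {
    "cam_a": ["a1", "a2"],
    "cam_b": ["b1"],
    "cam_c": ["c1"],
    "cam_d": ["d1", "d2"],
}

clusters = cluster_cameras(descriptors, threshold=0.3)
pooled = pool_datasets(clusters, frames_by_camera)

# One student model would be trained per entry of `pooled`.
for members, dataset in zip(clusters, pooled):
    print(members, "->", len(dataset), "training frames")
```

In this toy setting four cameras yield two clusters, so two student models are trained instead of four, and each is trained on more frames than any single camera provides. A very large threshold collapses everything into one cluster (the universal model), while a threshold of zero recovers the per-camera setting.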