CVApr 24, 2019
LFFD: A Light and Fast Face Detector for Edge DevicesYonghao He, Dezhong Xu, Lifang Wu et al.
Face detection, as a fundamental technology for various applications, is always deployed on edge devices which have limited memory storage and low computing power. This paper introduces a Light and Fast Face Detector (LFFD) for edge devices. The proposed method is anchor-free and belongs to the one-stage category. Specifically, we rethink the importance of receptive field (RF) and effective receptive field (ERF) in the background of face detection. Essentially, the RFs of neurons in a certain layer are distributed regularly in the input image and theses RFs are natural "anchors". Combining RF "anchors" and appropriate RF strides, the proposed method can detect a large range of continuous face scales with 100% coverage in theory. The insightful understanding of relations between ERF and face scales motivates an efficient backbone for one-stage detection. The backbone is characterized by eight detection branches and common layers, resulting in efficient computation. Comprehensive and extensive experiments on popular benchmarks: WIDER FACE and FDDB are conducted. A new evaluation schema is proposed for application-oriented scenarios. Under the new schema, the proposed method can achieve superior accuracy (WIDER FACE Val/Test -- Easy: 0.910/0.896, Medium: 0.881/0.865, Hard: 0.780/0.770; FDDB -- discontinuous: 0.973, continuous: 0.724). Multiple hardware platforms are introduced to evaluate the running efficiency. The proposed method can obtain fast inference speed (NVIDIA TITAN Xp: 131.45 FPS at 640x480; NVIDIA TX2: 136.99 PFS at 160x120; Raspberry Pi 3 Model B+: 8.44 FPS at 160x120) with model size of 9 MB.
CVMar 16, 2019
Ontology Based Global and Collective Motion Patterns for Event Classification in Basketball VideosLifang Wu, Zhou Yang, Jiaoyu He et al.
In multi-person videos, especially team sport videos, a semantic event is usually represented as a confrontation between two teams of players, which can be represented as collective motion. In broadcast basketball videos, specific camera motions are used to present specific events. Therefore, a semantic event in broadcast basketball videos is closely related to both the global motion (camera motion) and the collective motion. A semantic event in basketball videos can be generally divided into three stages: pre-event, event occurrence (event-occ), and post-event. In this paper, we propose an ontology-based global and collective motion pattern (On_GCMP) algorithm for basketball event classification. First, a two-stage GCMP based event classification scheme is proposed. The GCMP is extracted using optical flow. The two-stage scheme progressively combines a five-class event classification algorithm on event-occs and a two-class event classification algorithm on pre-events. Both algorithms utilize sequential convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to extract the spatial and temporal features of GCMP for event classification. Second, we utilize post-event segments to predict success/failure using deep features of images in the video frames (RGB_DF_VF) based algorithms. Finally the event classification results and success/failure classification results are integrated to obtain the final results. To evaluate the proposed scheme, we collected a new dataset called NCAA+, which is automatically obtained from the NCAA dataset by extending the fixed length of video clips forward and backward of the corresponding semantic events. The experimental results demonstrate that the proposed scheme achieves the mean average precision of 58.10% on NCAA+. It is higher by 6.50% than state-of-the-art on NCAA.