CV Jan 5, 2023
Flying Bird Object Detection Algorithm in Surveillance Video Based on Motion Information
Ziwei Sun, Zexi Hua, Hengcao Li et al.
A Flying Bird Object Detection algorithm Based on Motion Information (FBOD-BMI) is proposed for surveillance video, where the features of a flying bird are not obvious in a single frame and the object is small, giving a low Signal-to-Noise Ratio (SNR). Firstly, a ConvLSTM-PAN model structure is designed to capture suspicious flying bird objects: a Convolutional Long Short-Term Memory (ConvLSTM) network aggregates the spatio-temporal features of the flying bird object across adjacent frames at the input of the model, and a Path Aggregation Network (PAN) locates the suspicious flying bird objects. Then, an object tracking algorithm tracks the suspicious flying bird objects and calculates their Motion Range (MR). The size of the MR is adjusted adaptively according to the object's speed: if the bird moves slowly, its MR is expanded according to that speed, so that the environmental information needed to detect the flying bird object is retained. Adaptive Spatio-temporal Cubes (ASt-Cubes) of the flying bird objects are then generated, which improves the SNR of the flying bird objects while adaptively retaining the necessary environmental information. Finally, a LightWeight U-Shape Net (LW-USN) operating on ASt-Cubes is designed to detect flying bird objects; it rejects false detections among the suspicious objects and returns the positions of the real flying bird objects. Surveillance video containing flying birds, collected at an unattended traction substation, serves as the experimental dataset for verifying the algorithm. The experimental results show that the proposed motion-information-based method can effectively detect flying bird objects in surveillance video.
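The abstract does not give the rule used to adapt the Motion Range, so the following is only a minimal sketch of the idea, assuming the MR is the union of a track's bounding boxes and that slow tracks are padded out to a larger square. The names `motion_range`, `mean_speed`, `adaptive_motion_range`, and the parameters `min_side` and `slow_speed` are hypothetical, as is the linear scaling rule.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def motion_range(track: List[Box]) -> Box:
    """Union of all boxes of one tracked object over consecutive frames."""
    x1s, y1s, x2s, y2s = zip(*track)
    return (min(x1s), min(y1s), max(x2s), max(y2s))


def mean_speed(track: List[Box]) -> float:
    """Mean centre displacement per frame, in pixels."""
    centres = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in track]
    steps = [math.hypot(bx - ax, by - ay)
             for (ax, ay), (bx, by) in zip(centres, centres[1:])]
    return sum(steps) / len(steps) if steps else 0.0


def adaptive_motion_range(track: List[Box],
                          min_side: float = 64.0,
                          slow_speed: float = 4.0) -> Box:
    """Pad the motion range so slow objects keep more surrounding context.

    Assumed rule (not from the paper): below `slow_speed` the minimum side
    grows linearly, up to 2 * min_side for a stationary object.
    """
    x1, y1, x2, y2 = motion_range(track)
    speed = mean_speed(track)
    target = min_side
    if speed < slow_speed:
        target = min_side * (1.0 + (slow_speed - speed) / slow_speed)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w = max(x2 - x1, target)
    h = max(y2 - y1, target)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Cropping the same padded region from each of the consecutive frames would then give a spatio-temporal cube for that object; clipping the region to the image bounds is left out of the sketch.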
CV Jan 8, 2024
A Flying Bird Object Detection Method for Surveillance Video
Ziwei Sun, Zexi Hua, Hengchao Li et al.
Aiming at the specific characteristics of flying bird objects in surveillance video, such as the typically non-obvious features in single-frame images, small size in most instances, and asymmetric shapes, this paper proposes a Flying Bird Object Detection method for Surveillance Video (FBOD-SV). Firstly, a new feature aggregation module, the Correlation Attention Feature Aggregation (Co-Attention-FA) module, is designed to aggregate the features of the flying bird object according to the bird object's correlation on multiple consecutive frames of images. Secondly, a Flying Bird Object Detection Network (FBOD-Net) with down-sampling followed by up-sampling is designed, which utilizes a large feature layer that fuses fine spatial information and large receptive field information to detect special multi-scale (mostly small-scale) bird objects. Finally, the SimOTA dynamic label allocation method is applied to One-Category object detection, and the SimOTA-OC dynamic label strategy is proposed to solve the difficult problem of label allocation caused by irregular flying bird objects. In this paper, the performance of the FBOD-SV is validated using experimental datasets of flying bird objects in traction substation surveillance videos. The experimental results show that the FBOD-SV effectively improves the detection performance of flying bird objects in surveillance video.
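For context on the label-allocation step, here is a minimal sketch of the dynamic-k matching used by the standard SimOTA strategy, which SimOTA-OC builds on. It is not the SimOTA-OC variant itself, whose cost function the abstract does not describe; the function name `dynamic_k_assign` and the choice of `top_q=10` are assumptions, and the cost matrix is taken as given.

```python
from typing import List


def dynamic_k_assign(cost: List[List[float]],
                     iou: List[List[float]],
                     top_q: int = 10) -> List[List[bool]]:
    """Assign candidate predictions to ground-truth boxes.

    cost[g][c] and iou[g][c] describe ground truth g and candidate c.
    Returns assign[g][c] == True where candidate c is a positive for g.
    """
    num_gt = len(cost)
    num_cand = len(cost[0]) if num_gt else 0
    assign = [[False] * num_cand for _ in range(num_gt)]

    for g in range(num_gt):
        # Dynamic k: the sum of the top-q IoUs, truncated, but at least 1.
        top_ious = sorted(iou[g], reverse=True)[:top_q]
        k = max(int(sum(top_ious)), 1)
        # Take the k lowest-cost candidates as positives for this ground truth.
        ranked = sorted(range(num_cand), key=lambda c: cost[g][c])
        for c in ranked[:k]:
            assign[g][c] = True

    # A candidate claimed by several ground truths keeps only the cheapest one.
    for c in range(num_cand):
        owners = [g for g in range(num_gt) if assign[g][c]]
        if len(owners) > 1:
            best = min(owners, key=lambda g: cost[g][c])
            for g in owners:
                assign[g][c] = (g == best)
    return assign
```

Because k depends on how well the candidates overlap each ground truth, a small or irregular object with few good overlaps receives fewer positives than a large, well-covered one.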
CV Mar 5, 2025
DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms
Xiaojun Bi, Shuo Li, Junyao Xing et al.
Dongba pictographs are the only pictographic script still in use in the world. Their pictorial, ideographic features carry rich cultural and contextual information. However, due to the lack of relevant datasets, research on semantic understanding of Dongba hieroglyphs has progressed slowly. To this end, we constructed DongbaMIE, the first dataset focusing on multimodal information extraction of Dongba pictographs. The dataset consists of images of Dongba hieroglyphic characters and their corresponding semantic annotations in Chinese. It contains 23,530 sentence-level and 2,539 paragraph-level high-quality text-image pairs. The annotations cover four semantic dimensions: object, action, relation, and attribute. Systematic evaluation of mainstream multimodal large language models shows that the models struggle to perform information extraction on Dongba hieroglyphs efficiently under zero-shot and few-shot learning. Although supervised fine-tuning improves performance, accurate extraction of complex semantics remains a major challenge.