CVLGIVNov 9, 2019

A Proposed Artificial intelligence Model for Real-Time Human Action Localization and Tracking

arXiv:1911.04469v1
Originality Synthesis-oriented
AI Analysis

This work addresses the challenge of high-accuracy real-time human action detection for applications in smart cities and IoT systems, but it is incremental as it combines existing methods.

The paper tackled real-time human action localization and tracking by integrating YOLO, motion vectors, and the Coyote Optimization Algorithm, resulting in a system that reduces computational resource consumption compared to methods like optical flows, making it suitable for limited-resource environments like IoT systems.

In recent years, artificial intelligence (AI) based on deep learning (DL) has sparked tremendous global interest. DL is widely used today and has expanded into various interesting areas. It is becoming more popular in cross-subject research, such as studies of smart city systems, which combine computer science with engineering applications. Human action detection is one of these areas. Human action detection is an interesting challenge due to its stringent requirements in terms of computing speed and accuracy. High-accuracy real-time object tracking is also considered a significant challenge. This paper integrates the YOLO detection network, which is considered a state-of-the-art tool for real-time object detection, with motion vectors and the Coyote Optimization Algorithm (COA) to construct a real-time human action localization and tracking system. The proposed system starts with the extraction of motion information from a compressed video stream and the extraction of appearance information from RGB frames using an object detector. Then, a fusion step between the two streams is performed, and the results are fed into the proposed action tracking model. The COA is used in object tracking due to its accuracy and fast convergence. The basic foundation of the proposed model is the utilization of motion vectors, which already exist in a compressed video bit stream and provide sufficient information to improve the localization of the target action without requiring high consumption of computational resources compared with other popular methods of extracting motion information, such as optical flows. This advantage allows the proposed approach to be implemented in challenging environments where the computational resources are limited, such as Internet of Things (IoT) systems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes