Deep Multimodality Learning for UAV Video Aesthetic Quality Assessment
This work addresses the lack of aesthetic assessment tools for aerial videos; such tools can help both professionals and amateurs improve their UAV photography. Its contribution is incremental, however, in that it applies deep learning to a new domain.
The authors assess the aesthetic quality of UAV videos with a deep multimodality learning method that exploits spatial appearance, drone camera motion, and scene structure, and they report results that outperform existing video classification and SVM-based methods.
Despite the growing number of unmanned aerial vehicles (UAVs) and aerial videos, few studies focus on the aesthetics of aerial videos, even though such studies could provide valuable information for improving the aesthetic quality of aerial photography. In this article, we present a deep multimodality learning method for UAV video aesthetic quality assessment. More specifically, we design a multistream framework that exploits aesthetic attributes from multiple modalities, including spatial appearance, drone camera motion, and scene structure. We propose a novel motion stream network designed specifically for this multistream framework. We construct a dataset of 6,000 UAV video shots captured by drone cameras. Our model judges whether a UAV video was shot by a professional photographer or an amateur and also classifies the scene type. The experimental results show that our method outperforms existing video classification methods and traditional SVM-based methods for video aesthetics. In addition, we present three application examples that use the proposed method: UAV video grading, professional segment detection, and aesthetic-based UAV path planning.
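To make the multistream idea concrete, the following minimal sketch shows one common way to combine per-stream predictions into a single professional-versus-amateur decision. It is an illustration only: the abstract does not state how the streams are fused, so the weighted late fusion of softmax scores used here is an assumption, and the names `fuse_streams`, `classify`, and `LABELS`, as well as the example logits, are hypothetical.

```python
import math

# Hypothetical label order for the two-class aesthetic decision.
LABELS = ("amateur", "professional")


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits, weights=None):
    """Weighted late fusion of per-stream class probabilities.

    stream_logits maps a stream name (e.g. "appearance", "motion",
    "structure") to that stream's raw class scores. This fusion rule is
    an assumption for illustration, not the method from the paper.
    """
    names = sorted(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weighting by default
    weight_sum = sum(weights[name] for name in names)
    n_classes = len(stream_logits[names[0]])
    fused = [0.0] * n_classes
    for name in names:
        probs = softmax(stream_logits[name])
        for i, p in enumerate(probs):
            fused[i] += (weights[name] / weight_sum) * p
    return fused


def classify(stream_logits, weights=None):
    """Return the predicted label and the fused class probabilities."""
    fused = fuse_streams(stream_logits, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best], fused


if __name__ == "__main__":
    # Made-up logits standing in for the outputs of three trained streams.
    example = {
        "appearance": [0.0, 2.0],
        "motion": [0.0, 1.0],
        "structure": [1.0, 0.0],
    }
    label, probs = classify(example)
    print(label, [round(p, 3) for p in probs])
```

Late fusion keeps each stream independent until the final decision, which makes it easy to weight or drop a modality; a learned fusion layer over concatenated stream features would be an equally plausible alternative.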