Yanyun Zhao

5papers

1,125citations

Novelty37%

AI Score27

Ranked #163,858 of 205,806 authors (top 80%)#49,958 in CV (top 85%)

5 Papers

CVNov 24, 2022

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Benjamin Kiefer, Matej Kristan, Janez Perš et al.

The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.

CVMar 8, 2022

PAMI-AD: An Activity Detector Exploiting Part-attention and Motion Information in Surveillance Videos

Yunhao Du, Zhihang Tong, Junfeng Wan et al.

Activity detection in surveillance videos is a challenging task caused by small objects, complex activity categories, its untrimmed nature, etc. Existing methods are generally limited in performance due to inaccurate proposals, poor classifiers or inadequate post-processing method. In this work, we propose a comprehensive and effective activity detection system in untrimmed surveillance videos for person-centered and vehicle-centered activities. It consists of four modules, i.e., object localizer, proposal filter, activity classifier and activity refiner. For person-centered activities, a novel part-attention mechanism is proposed to explore detailed features in different body parts. As for vehicle-centered activities, we propose a localization masking method to jointly encode motion and foreground attention features. We conduct experiments on the large-scale activity detection datasets VIRAT, and achieve the best results for both groups of activities. Furthermore, our team won the 1st place in the TRECVID 2021 ActEV challenge.

CVFeb 28, 2022Code

StrongSORT: Make DeepSORT Great Again

Yunhao Du, Zhicheng Zhao, Yang Song et al.

Recently, Multi-Object Tracking (MOT) has attracted rising attention, and accordingly, remarkable progresses have been achieved. However, the existing methods tend to use various basic models (e.g, detector and embedding model), and different training or inference tricks, etc. As a result, the construction of a good baseline for a fair comparison is essential. In this paper, a classic tracker, i.e., DeepSORT, is first revisited, and then is significantly improved from multiple perspectives such as object detection, feature embedding, and trajectory association. The proposed tracker, named StrongSORT, contributes a strong and fair baseline for the MOT community. Moreover, two lightweight and plug-and-play algorithms are proposed to address two inherent "missing" problems of MOT: missing association and missing detection. Specifically, unlike most methods, which associate short tracklets into complete trajectories at high computation complexity, we propose an appearance-free link model (AFLink) to perform global association without appearance information, and achieve a good balance between speed and accuracy. Furthermore, we propose a Gaussian-smoothed interpolation (GSI) based on Gaussian process regression to relieve the missing detection. AFLink and GSI can be easily plugged into various trackers with a negligible extra computational cost (1.7 ms and 7.1 ms per image, respectively, on MOT17). Finally, by fusing StrongSORT with AFLink and GSI, the final tracker (StrongSORT++) achieves state-of-the-art results on multiple public benchmarks, i.e., MOT17, MOT20, DanceTrack and KITTI. Codes are available at https://github.com/dyhBUPT/StrongSORT and https://github.com/open-mmlab/mmtracking.

CVFeb 24, 2022

GIAOTracker: A comprehensive framework for MCMOT with global information and optimizing strategies in VisDrone 2021

Yunhao Du, Junfeng Wan, Yanyun Zhao et al.

In recent years, algorithms for multiple object tracking tasks have benefited from great progresses in deep models and video quality. However, in challenging scenarios like drone videos, they still suffer from problems, such as small objects, camera movements and view changes. In this paper, we propose a new multiple object tracker, which employs Global Information And some Optimizing strategies, named GIAOTracker. It consists of three stages, i.e., online tracking, global link and post-processing. Given detections in every frame, the first stage generates reliable tracklets using information of camera motion, object motion and object appearance. Then they are associated into trajectories by exploiting global clues and refined through four post-processing methods. With the effectiveness of the three stages, GIAOTracker achieves state-of-the-art performance on the VisDrone MOT dataset and wins the 3rd place in the VisDrone2021 MOT Challenge.

CVJul 19, 2021

VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Dawei Du, Longyin Wen, Pengfei Zhu et al.

Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint. However, there are few algorithms focusing on crowd counting on the drone-captured data due to the lack of comprehensive datasets. To this end, we collect a large-scale dataset and organize the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2020) in conjunction with the 16th European Conference on Computer Vision (ECCV 2020) to promote the developments in the related fields. The collected dataset is formed by $3,360$ images, including $2,460$ images for training, and $900$ images for testing. Specifically, we manually annotate persons with points in each video frame. There are $14$ algorithms from $15$ institutes submitted to the VisDrone-CC2020 Challenge. We provide a detailed analysis of the evaluation results and conclude the challenge. More information can be found at the website: \url{http://www.aiskyeye.com/}.