CV Mar 8, 2024
Beyond MOT: Semantic Multi-Object Tracking
Yunhao Li, Qin Li, Hao Wang et al.
Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos. Yet knowing merely "where" is insufficient in many crucial applications. In comparison, semantic understanding from videos, such as fine-grained behaviors, interactions, and overall summarized captions (i.e., "what"), associated with "where", is highly desired for comprehensive video analysis. Thus motivated, we introduce Semantic Multi-Object Tracking (SMOT), which aims to estimate object trajectories while also understanding the semantic details of the associated trajectories, including instance captions, instance interactions, and overall video captions, integrating "where" and "what" for tracking. To foster the exploration of SMOT, we propose BenSMOT, a large-scale Benchmark for Semantic MOT. Specifically, BenSMOT comprises 3,292 videos with 151K frames, covering various scenarios for semantic tracking of humans. BenSMOT provides annotations for the trajectories of targets, along with associated instance captions in natural language, instance interactions, and an overall caption for each video sequence. To the best of our knowledge, BenSMOT is the first publicly available benchmark for SMOT. In addition, to encourage future research, we present a novel tracker named SMOTer, which is specially designed and trained end-to-end for SMOT and shows promising performance. By releasing BenSMOT, we expect to go beyond conventional MOT by predicting "where" and "what" for SMOT, opening up a new direction in tracking for video understanding. We will release BenSMOT and SMOTer at https://github.com/Nathan-Li123/SMOTer.
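To make the "where" plus "what" output concrete, the following is a minimal sketch of a container for the four annotation types the abstract lists (trajectories, instance captions, instance interactions, and an overall video caption). Every class name, field name, and sample value here is a hypothetical illustration; it is not BenSMOT's actual annotation format or SMOTer's output interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema: names are illustrative, not BenSMOT's real format.
Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


@dataclass
class Trajectory:
    track_id: int
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box ("where")
    caption: str = ""  # instance caption ("what" for this target)


@dataclass
class Interaction:
    subject_id: int  # track_id of the acting target
    object_id: int   # track_id of the target acted upon
    label: str       # interaction label, e.g. "greets"


@dataclass
class SemanticTrackingResult:
    video_caption: str  # overall caption for the whole sequence
    trajectories: List[Trajectory] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)

    def describe(self, frame: int) -> List[str]:
        """Pair each target visible in `frame` with its box and caption."""
        return [
            f"id={t.track_id} box={t.boxes[frame]} caption={t.caption!r}"
            for t in self.trajectories
            if frame in t.boxes
        ]


# Toy values invented for this example only.
result = SemanticTrackingResult(
    video_caption="Two people greet each other in a park.",
    trajectories=[
        Trajectory(1, {0: (10.0, 20.0, 50.0, 120.0)}, "a man waving"),
        Trajectory(2, {0: (200.0, 25.0, 48.0, 118.0),
                       1: (198.0, 25.0, 48.0, 118.0)}, "a woman walking"),
    ],
    interactions=[Interaction(1, 2, "greets")],
)

for line in result.describe(0):
    print(line)
```

Keeping the captions and interactions keyed by track identity is one way to express that the semantic output is attached to trajectories rather than to the video alone.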
SY Nov 12, 2023
TSViT: A Time Series Vision Transformer for Fault Diagnosis
Shouhua Zhang, Jiehan Zhou, Xue Ma et al.
Traditional fault diagnosis methods using Convolutional Neural Networks (CNNs) often struggle to capture the temporal dynamics of vibration signals. To overcome this, the application of Vision Transformer (ViT) methods to fault diagnosis is gaining traction. Nonetheless, these methods typically require extensive preprocessing, which increases computational complexity and potentially reduces the efficiency of the diagnosis process. Addressing this gap, this paper presents the Time Series Vision Transformer (TSViT), tailored for effective fault diagnosis. TSViT incorporates a convolutional layer to extract local features from vibration signals, alongside a transformer encoder to discern long-term temporal patterns. A thorough experimental comparison on three diverse datasets demonstrates TSViT's effectiveness and adaptability. Moreover, the paper examines the influence of hyperparameter tuning on the model's performance, computational demand, and parameter count. Remarkably, TSViT achieves an unprecedented 100% average accuracy on two test sets and 99.99% on another, showcasing its exceptional diagnostic capabilities.
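As a rough illustration of the first stage described above, the sketch below shows how a strided 1D convolution can turn a raw signal into a sequence of tokens that a transformer encoder could then attend over. The function name, kernel size, stride, and filter weights are all assumptions chosen for readability; in a trained model the filters would be learned, and this is not TSViT's actual configuration.

```python
from typing import List


def conv1d_tokens(signal: List[float],
                  filters: List[List[float]],
                  stride: int) -> List[List[float]]:
    """Slide every filter over the signal (cross-correlation, no padding).

    Each window position yields one token whose dimension equals the
    number of filters.
    """
    kernel = len(filters[0])
    tokens = []
    for start in range(0, len(signal) - kernel + 1, stride):
        window = signal[start:start + kernel]
        tokens.append([sum(w * x for w, x in zip(f, window)) for f in filters])
    return tokens


# Toy signal and hand-picked filters (assumptions, for illustration only).
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
filters = [
    [0.5, 0.5],   # local average
    [1.0, -1.0],  # local difference
]
tokens = conv1d_tokens(signal, filters, stride=2)
print(tokens)  # 4 tokens, each with 2 features
```

With a kernel of 2 and a stride of 2 the eight samples become four non-overlapping tokens; a smaller stride would produce overlapping windows and a longer token sequence.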
LG Jul 30, 2024
Robust Load Prediction of Power Network Clusters Based on Cloud-Model-Improved Transformer
Cheng Jiang, Gang Lu, Xue Ma et al.
Load data from power network clusters reflect economic development in each area and are crucial for predicting regional trends and guiding power enterprise decisions. The Transformer model, a leading method for load prediction, struggles to model historical data because of variables such as weather, events, and festivals, as well as data volatility. To tackle this, the cloud model's fuzzy feature is used to manage these uncertainties. The proposed Cloud Model Improved Transformer (CMIT) method integrates the Transformer model with the cloud model using the particle swarm optimization algorithm, aiming for robust and precise power load predictions. Comparative experiments on 31 real datasets within a power network cluster show that CMIT significantly surpasses the Transformer model in prediction accuracy, highlighting its effectiveness for load forecasting in power network clusters.
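The abstract does not say how CMIT applies the cloud model, so the following is background only. It sketches the standard backward normal cloud generator, which summarizes a set of samples by three numerical characteristics: expectation Ex, entropy En, and hyper-entropy He. The function name and the sample values are assumptions made for illustration and do not come from the paper.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def backward_cloud(samples: List[float]) -> Tuple[float, float, float]:
    """Estimate (Ex, En, He) of a normal cloud from samples.

    Ex: sample mean.
    En: sqrt(pi / 2) times the mean absolute deviation from Ex.
    He: sqrt(S^2 - En^2), with S^2 the sample variance; the difference
        is clamped at 0 so that noisy data never yields a negative root.
    """
    ex = mean(samples)
    en = math.sqrt(math.pi / 2) * mean(abs(x - ex) for x in samples)
    he = math.sqrt(max(variance(samples) - en ** 2, 0.0))
    return ex, en, he


# Toy values, not real load measurements.
ex, en, he = backward_cloud([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"Ex={ex:.3f} En={en:.3f} He={he:.3f}")
```

Here En captures the spread of the samples and He captures how uncertain that spread itself is, which is the sense in which the cloud model combines randomness with fuzziness.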