CV | Sep 26, 2022 | Code
TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance
Yajun Xu, Chuwen Huang, Yibing Nan et al.
Automatic traffic accident detection has attracted the machine vision community because of its implications for the development of autonomous intelligent transportation systems (ITS) and its importance to traffic safety. Most previous studies on the efficient analysis and prediction of traffic accidents, however, have used small-scale datasets with limited coverage, which limits their effectiveness and applicability. Existing traffic accident datasets are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes. Because accidents on freeways tend to cause serious damage and happen too quickly to be caught on the spot, an open-source dataset of freeway traffic accidents collected from surveillance cameras is greatly needed and of practical importance. To help the vision community address these shortcomings, we collect video data of real traffic accidents covering a wide range of scenes. After integration and annotation along various dimensions, we propose a large-scale traffic accident dataset named TAD. We conduct experiments on image classification, object detection, and video classification tasks, using mainstream public vision algorithms and frameworks, to demonstrate the performance of different methods. The proposed dataset and the experimental results are presented as a new benchmark to advance computer vision research, especially in ITS.
CV | Sep 10, 2025 | Code
MITS: A Large-Scale Multimodal Benchmark Dataset for Intelligent Traffic Surveillance
Kaikai Zhao, Zhaoxiang Liu, Peng Wang et al.
General-domain large multimodal models (LMMs) have achieved significant advances in various image-text tasks. However, their performance in the Intelligent Traffic Surveillance (ITS) domain remains limited due to the absence of dedicated multimodal datasets. To address this gap, we introduce MITS (Multimodal Intelligent Traffic Surveillance), the first large-scale multimodal benchmark dataset specifically designed for ITS. MITS includes 170,400 independently collected real-world ITS images sourced from traffic surveillance cameras, annotated with eight main categories and 24 subcategories of ITS-specific objects and events under diverse environmental conditions. Additionally, through a systematic data generation pipeline, we generate high-quality image captions and 5 million instruction-following visual question-answer pairs, addressing five critical ITS tasks: object and event recognition, object counting, object localization, background analysis, and event reasoning. To demonstrate MITS's effectiveness, we fine-tune mainstream LMMs on this dataset, enabling the development of ITS-specific applications. Experimental results show that MITS significantly improves LMM performance in ITS applications, increasing LLaVA-1.5's performance from 0.494 to 0.905 (+83.2%), LLaVA-1.6's from 0.678 to 0.921 (+35.8%), Qwen2-VL's from 0.584 to 0.926 (+58.6%), and Qwen2.5-VL's from 0.732 to 0.930 (+27.0%). We release the dataset, code, and models as open-source, providing high-value resources to advance both ITS and LMM research.
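The percentage gains quoted in the abstract above appear to be relative improvements, (after - before) / before. The short Python check below recomputes them from the reported scores; the function name `relative_gain` and the dictionary layout are illustrative choices, not part of the MITS release.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a percentage."""
    return (after - before) / before * 100.0


# Scores quoted in the abstract: model -> (before fine-tuning, after fine-tuning).
scores = {
    "LLaVA-1.5": (0.494, 0.905),
    "LLaVA-1.6": (0.678, 0.921),
    "Qwen2-VL": (0.584, 0.926),
    "Qwen2.5-VL": (0.732, 0.930),
}

# Round to one decimal place, matching the precision used in the abstract.
gains = {name: round(relative_gain(b, a), 1) for name, (b, a) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```

Under that reading, the recomputed values agree with the figures in the abstract (83.2, 35.8, 58.6, and 27.0 percent).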
CV | Dec 29, 2020 | Code
FPCC: Fast Point Cloud Clustering based Instance Segmentation for Industrial Bin-picking
Yajun Xu, Shogo Arai, Diyi Liu et al.
Instance segmentation is an important pre-processing task in numerous real-world applications, such as robotics, autonomous vehicles, and human-computer interaction. Compared with the rapid development of deep learning for two-dimensional (2D) image tasks, deep learning-based instance segmentation of 3D point clouds still has considerable room for development. In particular, distinguishing a large number of occluded objects of the same class is a highly challenging problem, as seen in robotic bin-picking. In a typical bin-picking scene, many identical objects are stacked together and the model of the objects is known. Thus, the semantic information can be ignored; instead, bin-picking focuses on the segmentation of instances. Based on this task requirement, we propose Fast Point Cloud Clustering (FPCC) for instance segmentation of bin-picking scenes. FPCC includes a network named FPCC-Net and a fast clustering algorithm. FPCC-Net has two subnets, one for inferring the geometric centers used for clustering and the other for describing the features of each point, so it extracts per-point features and infers the geometric center point of each instance simultaneously. After that, the proposed clustering algorithm assigns each remaining point to the closest geometric center in feature embedding space. Experiments show that FPCC surpasses existing works in bin-picking scenes and is more computationally efficient. Our code and data are available at https://github.com/xyjbaal/FPCC.
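The final clustering step described above, assigning points to the closest center in feature embedding space, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `assign_to_centers`, the use of Euclidean distance, the toy 2-D embeddings, and the assumption that center points are already given as indices are all hypothetical choices made for illustration, and the actual algorithm in the linked repository may include additional steps.

```python
import math


def assign_to_centers(embeddings, center_ids):
    """Label every point with the index of its closest center point.

    Distance is Euclidean distance between feature embeddings (an assumption;
    the abstract does not specify the metric).
    """
    centers = [embeddings[i] for i in center_ids]
    labels = []
    for emb in embeddings:
        distances = [math.dist(emb, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels


# Toy 2-D embeddings (hypothetical values): two well-separated groups.
embeddings = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9)]
# Indices of the points assumed to have been predicted as instance centers.
center_ids = [0, 3]

labels = assign_to_centers(embeddings, center_ids)
print(labels)  # one instance label per point
```

In this toy case the first three points are grouped with the first center and the last two with the second, which mirrors how identical stacked objects would be separated once their centers and per-point features are available.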