CV AIApr 22

PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging

Tianyi Qu, Songxiao Yang, Haolin Wang, Huadong Song, Xiaoting Guo, Wenguang Hu, Guanlin Liu, Honghe Chen, Yafei Ou

arXiv:2602.070441.6h-index: 7

AI Analysis

This provides a critical foundation for improving pipeline integrity assessment through reproducible research, though it is incremental as it primarily offers a new dataset rather than a novel method.

The authors tackled the lack of a large-scale public dataset for object detection in pipeline Magnetic Flux Leakage (MFL) imaging by introducing PipeMFL-240K, a dataset with 249,320 images and 200,020 annotations, and found that state-of-the-art detectors still struggle with its challenges, indicating significant room for improvement.

Pipeline integrity is critical to industrial safety and environmental protection, with Magnetic Flux Leakage (MFL) detection being a primary non-destructive testing technology. Despite the promise of deep learning for automating MFL interpretation, progress toward reliable models has been constrained by the absence of a large-scale public dataset and benchmark, making fair comparison and reproducible evaluation difficult. We introduce \textbf{PipeMFL-240K}, a large-scale, meticulously annotated dataset and benchmark for complex object detection in pipeline MFL pseudo-color images. PipeMFL-240K reflects real-world inspection complexity and poses several unique challenges: (i) an extremely long-tailed distribution over \textbf{12} categories, (ii) a high prevalence of tiny objects that often comprise only a handful of pixels and (iii) substantial intra-class variability. The dataset contains \textbf{249,320} images and \textbf{200,020} high-quality bounding-box annotations, collected from 12 pipelines spanning approximately \textbf{1,530} km. Extensive experiments are conducted with state-of-the-art object detectors to establish baselines. Results show that modern detectors still struggle with the intrinsic properties of MFL data, highlighting considerable headroom for improvement, while PipeMFL-240K provides a reliable and challenging testbed to drive future research. As the first public dataset and the first benchmark of this scale and scope for pipeline MFL inspection, it provides a critical foundation for efficient pipeline diagnostics as well as maintenance planning and is expected to accelerate algorithmic innovation and reproducible research in MFL-based pipeline integrity assessment.

View on arXiv PDF

Similar