CVDec 17, 2025
SemanticBridge - A Dataset for 3D Semantic Segmentation of Bridges and Domain Gap AnalysisMaximilian Kellner, Mariana Ferrandon Cervantes, Yuandong Pan et al.
We propose a novel dataset that has been specifically designed for 3D semantic segmentation of bridges and the domain gap analysis caused by varying sensors. This addresses a critical need in the field of infrastructure inspection and maintenance, which is essential for modern society. The dataset comprises high-resolution 3D scans of a diverse range of bridge structures from various countries, with detailed semantic labels provided for each. Our initial objective is to facilitate accurate and automated segmentation of bridge components, thereby advancing the structural health monitoring practice. To evaluate the effectiveness of existing 3D deep learning models on this novel dataset, we conduct a comprehensive analysis of three distinct state-of-the-art architectures. Furthermore, we present data acquired through diverse sensors to quantify the domain gap resulting from sensor variations. Our findings indicate that all architectures demonstrate robust performance on the specified task. However, the domain gap can potentially lead to a decline in the performance of up to 11.4% mIoU.
CLApr 14
LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin MachinesJiechao Gao, Rohan Kumar Yadav, Yuangang Li et al.
Pretrained language models (PLMs) like BERT provide strong semantic representations but are costly and opaque, while symbolic models such as the Tsetlin Machine (TM) offer transparency but lack semantic generalization. We propose a semantic bootstrapping framework that transfers LLM knowledge into symbolic form, combining interpretability with semantic capacity. Given a class label, an LLM generates sub-intents that guide synthetic data creation through a three-stage curriculum (seed, core, enriched), expanding semantic diversity. A Non-Negated TM (NTM) learns from these examples to extract high-confidence literals as interpretable semantic cues. Injecting these cues into real data enables a TM to align clause logic with LLM-inferred semantics. Our method requires no embeddings or runtime LLM calls, yet equips symbolic models with pretrained semantic priors. Across multiple text classification tasks, it improves interpretability and accuracy over vanilla TM, achieving performance comparable to BERT while remaining fully symbolic and efficient.
CVDec 26, 2024
Impact of color and mixing proportion of synthetic point clouds on semantic segmentationShaojie Zhou, Jia-Rui Lin, Peng Pan et al.
Deep learning (DL)-based point cloud segmentation is essential for understanding built environment. Despite synthetic point clouds (SPC) having the potential to compensate for data shortage, how synthetic color and mixing proportion impact DL-based segmentation remains a long-standing question. Therefore, this paper addresses this question with extensive experiments by introducing: 1) method to generate SPC with real colors and uniform colors from BIM, and 2) enhanced benchmarks for better performance evaluation. Experiments on DL models including PointNet, PointNet++, and DGCNN show that model performance on SPC with real colors outperforms that on SPC with uniform colors by 8.2 % + on both OA and mIoU. Furthermore, a higher than 70 % mixing proportion of SPC usually leads to better performance. And SPC can replace real ones to train a DL model for detecting large and flat building elements. Overall, this paper unveils the performance-improving mechanism of SPC and brings new insights to boost SPC's value (for building large models for point clouds).