Yongxuan Lyu

h-index3
2papers

2 Papers

97.6AIMay 7
MolRecBench-Wild: A Real-World Benchmark for Optical Chemical Structure Recognition

Haote Yang, Hui Wang, Chen Zhu et al.

Optical Chemical Structure Recognition (OCSR) aims to translate molecular diagrams in scientific literature into machine-readable formats, but current systems remain unreliable on real-world images due to substantial visual and chemical complexity. We introduce MOSAIC, a dual-dimensional difficulty framework with 37 fine-grained labels that jointly characterize visual interference and chemical semantic challenges in molecular diagrams. Based on this framework, we construct MolRecBench-Wild, a benchmark of 5,029 structures from 820 recent chemistry papers, covering the full difficulty spectrum observed in real publications. To enable faithful semantic evaluation beyond SMILES and MolFile, we propose CARBON, a representation language capable of expressing valence variations, icon-based groups, and other non-standard chemical semantics. We further adopt a dual-track evaluation protocol supporting both CARBON and SMILES outputs for broad model compatibility. Comprehensive experiments over 18 OCSR-capable models reveal severe performance degradation on MolRecBench-Wild, exposing a large gap between previous patent benchmarks and real-world academic scenarios.

CVOct 7, 2025
ALISE: Annotation-Free LiDAR Instance Segmentation for Autonomous Driving

Yongxuan Lyu, Guangfeng Jiang, Hongsi Liu et al.

The manual annotation of outdoor LiDAR point clouds for instance segmentation is extremely costly and time-consuming. Current methods attempt to reduce this burden but still rely on some form of human labeling. To completely eliminate this dependency, we introduce ALISE, a novel framework that performs LiDAR instance segmentation without any annotations. The central challenge is to generate high-quality pseudo-labels in a fully unsupervised manner. Our approach starts by employing Vision Foundation Models (VFMs), guided by text and images, to produce initial pseudo-labels. We then refine these labels through a dedicated spatio-temporal voting module, which combines 2D and 3D semantics for both offline and online optimization. To achieve superior feature learning, we further introduce two forms of semantic supervision: a set of 2D prior-based losses that inject visual knowledge into the 3D network, and a novel prototype-based contrastive loss that builds a discriminative feature space by exploiting 3D semantic consistency. This comprehensive design results in significant performance gains, establishing a new state-of-the-art for unsupervised 3D instance segmentation. Remarkably, our approach even outperforms MWSIS, a method that operates with supervision from ground-truth (GT) 2D bounding boxes by a margin of 2.53% in mAP (50.95% vs. 48.42%).