CVJun 3
UniCAD: A Unified Benchmark and Universal Model for Multi-Modal Multi-Task CADJingyuan Chen, Sheng Jin, Haopeng Sun et al.
Computer-Aided Design (CAD) underpins modern engineering and manufacturing by enabling the creation of precise, editable 3D models. However, CAD research typically studies tasks in isolation, and multi-modal, multi-task learning for CAD is hindered by the absence of a unified benchmark. To address this gap, we introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning that covers point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering across diverse input modalities. Alongside the benchmark, we present UniCAD-MLLM, a universal multi-modal large language model that ingests text, images, sketches, and point clouds and performs these heterogeneous tasks in an end-to-end fashion within a single framework. Extensive experiments on the UniCAD and Fusion360 benchmarks demonstrate that UniCAD-MLLM achieves state-of-the-art performance across all tasks, outperforming existing task-specific and multi-task baselines. We will release the dataset, code, and pretrained models to accelerate future research.
CVDec 13, 2024
Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging TransformerHaopeng Sun, Yingwei Zhang, Lumin Xu et al.
Segmentation of ultra-high resolution (UHR) images is a critical task with numerous applications, yet it poses significant challenges due to high spatial resolution and rich fine details. Recent approaches adopt a dual-branch architecture, where a global branch learns long-range contextual information and a local branch captures fine details. However, they struggle to handle the conflict between global and local information while adding significant extra computational cost. Inspired by the human visual system's ability to rapidly orient attention to important areas with fine details and filter out irrelevant information, we propose a novel UHR segmentation method called Boundary-enhanced Patch-merging Transformer (BPT). BPT consists of two key components: (1) Patch-Merging Transformer (PMT) for dynamically allocating tokens to informative regions to acquire global and local representations, and (2) Boundary-Enhanced Module (BEM) that leverages boundary information to enrich fine details. Extensive experiments on multiple UHR image segmentation benchmarks demonstrate that our BPT outperforms previous state-of-the-art methods without introducing extra computational overhead. Codes will be released to facilitate research.
CVDec 13, 2024
SPT: Sequence Prompt Transformer for Interactive Image SegmentationSenlin Cheng, Haopeng Sun
Interactive segmentation aims to extract objects of interest from an image based on user-provided clicks. In real-world applications, there is often a need to segment a series of images featuring the same target object. However, existing methods typically process one image at a time, failing to consider the sequential nature of the images. To overcome this limitation, we propose a novel method called Sequence Prompt Transformer (SPT), the first to utilize sequential image information for interactive segmentation. Our model comprises two key components: (1) Sequence Prompt Transformer (SPT) for acquiring information from sequence of images, clicks and masks to improve accurate. (2) Top-k Prompt Selection (TPS) selects precise prompts for SPT to further enhance the segmentation effect. Additionally, we create the ADE20K-Seq benchmark to better evaluate model performance. We evaluate our approach on multiple benchmark datasets and show that our model surpasses state-of-the-art methods across all datasets.
DMJun 9, 2025
HyColor: An Efficient Heuristic Algorithm for Graph ColoringEnqiang Zhu, Yu Zhang, Haopeng Sun et al.
The graph coloring problem (GCP) is a classic combinatorial optimization problem that aims to find the minimum number of colors assigned to vertices of a graph such that no two adjacent vertices receive the same color. GCP has been extensively studied by researchers from various fields, including mathematics, computer science, and biological science. Due to the NP-hard nature, many heuristic algorithms have been proposed to solve GCP. However, existing GCP algorithms focus on either small hard graphs or large-scale sparse graphs (with up to 10^7 vertices). This paper presents an efficient hybrid heuristic algorithm for GCP, named HyColor, which excels in handling large-scale sparse graphs while achieving impressive results on small dense graphs. The efficiency of HyColor comes from the following three aspects: a local decision strategy to improve the lower bound on the chromatic number; a graph-reduction strategy to reduce the working graph; and a k-core and mixed degree-based greedy heuristic for efficiently coloring graphs. HyColor is evaluated against three state-of-the-art GCP algorithms across four benchmarks, comprising three large-scale sparse graph benchmarks and one small dense graph benchmark, totaling 209 instances. The results demonstrate that HyColor consistently outperforms existing heuristic algorithms in both solution accuracy and computational efficiency for the majority of instances. Notably, HyColor achieved the best solutions in 194 instances (over 93%), with 34 of these solutions significantly surpassing those of other algorithms. Furthermore, HyColor successfully determined the chromatic number and achieved optimal coloring in 128 instances.