IVNov 30, 2024
Energy-Based Prior Latent Space Diffusion model for Reconstruction of Lumbar Vertebrae from Thick Slice MRIYanke Wang, Yolanne Y. R. Lee, Aurelio Dolfini et al.
Lumbar spine problems are ubiquitous, motivating research into targeted imaging for treatment planning and guided interventions. While high resolution and high contrast CT has been the modality of choice, MRI can capture both bone and soft tissue without the ionizing radiation of CT albeit longer acquisition time. The critical trade-off between contrast quality and acquisition time has motivated 'thick slice MRI', which prioritises faster imaging with high in-plane resolution but variable contrast and low through-plane resolution. We investigate a recently developed post-acquisition pipeline which segments vertebrae from thick-slice acquisitions and uses a variational autoencoder to enhance quality after an initial 3D reconstruction. We instead propose a latent space diffusion energy-based prior to leverage diffusion models, which exhibit high-quality image generation. Crucially, we mitigate their high computational cost and low sample efficiency by learning an energy-based latent representation to perform the diffusion processes. Our resulting method outperforms existing approaches across metrics including Dice and VS scores, and more faithfully captures 3D features.
LGSep 20, 2025
Hyperbolic OptimizationYanke Wang, Kyriakos Flouris
This work explores optimization methods on hyperbolic manifolds. Building on Riemannian optimization principles, we extend the Hyperbolic Stochastic Gradient Descent (a specialization of Riemannian SGD) to a Hyperbolic Adam optimizer. While these methods are particularly relevant for learning on the Poincaré ball, they may also provide benefits in Euclidean and other non-Euclidean settings, as the chosen optimization encourages the learning of Poincaré embeddings. This representation, in turn, accelerates convergence in the early stages of training, when parameters are far from the optimum. As a case study, we train diffusion models using the hyperbolic optimization methods with hyperbolic time-discretization of the Langevin dynamics, and show that they achieve faster convergence on certain datasets without sacrificing generative quality.
ROSep 20, 2025
ReSeFlow: Rectifying SE(3)-Equivariant Policy Learning FlowsZhitao Wang, Yanke Wang, Jiangtao Wen et al.
Robotic manipulation in unstructured environments requires the generation of robust and long-horizon trajectory-level policy with conditions of perceptual observations and benefits from the advantages of SE(3)-equivariant diffusion models that are data-efficient. However, these models suffer from the inference time costs. Inspired by the inference efficiency of rectified flows, we introduce the rectification to the SE(3)-diffusion models and propose the ReSeFlow, i.e., Rectifying SE(3)-Equivariant Policy Learning Flows, providing fast, geodesic-consistent, least-computational policy generation. Crucially, both components employ SE(3)-equivariant networks to preserve rotational and translational symmetry, enabling robust generalization under rigid-body motions. With the verification on the simulated benchmarks, we find that the proposed ReSeFlow with only one inference step can achieve better performance with lower geodesic distance than the baseline methods, achieving up to a 48.5% error reduction on the painting task and a 21.9% reduction on the rotating triangle task compared to the baseline's 100-step inference. This method takes advantages of both SE(3) equivariance and rectified flow and puts it forward for the real-world application of generative policy learning models with the data and inference efficiency.
ROAug 26, 2025
Hybrid Perception and Equivariant Diffusion for Robust Multi-Node Rebar TyingZhitao Wang, Yirong Xiong, Roberto Horowitz et al.
Rebar tying is a repetitive but critical task in reinforced concrete construction, typically performed manually at considerable ergonomic risk. Recent advances in robotic manipulation hold the potential to automate the tying process, yet face challenges in accurately estimating tying poses in congested rebar nodes. In this paper, we introduce a hybrid perception and motion planning approach that integrates geometry-based perception with Equivariant Denoising Diffusion on SE(3) (Diffusion-EDFs) to enable robust multi-node rebar tying with minimal training data. Our perception module utilizes density-based clustering (DBSCAN), geometry-based node feature extraction, and principal component analysis (PCA) to segment rebar bars, identify rebar nodes, and estimate orientation vectors for sequential ranking, even in complex, unstructured environments. The motion planner, based on Diffusion-EDFs, is trained on as few as 5-10 demonstrations to generate sequential end-effector poses that optimize collision avoidance and tying efficiency. The proposed system is validated on various rebar meshes, including single-layer, multi-layer, and cluttered configurations, demonstrating high success rates in node detection and accurate sequential tying. Compared with conventional approaches that rely on large datasets or extensive manual parameter tuning, our method achieves robust, efficient, and adaptable multi-node tying while significantly reducing data requirements. This result underscores the potential of hybrid perception and diffusion-driven planning to enhance automation in on-site construction tasks, improving both safety and labor efficiency.
ROAug 26, 2025
OpenTie: Open-vocabulary Sequential Rebar Tying SystemMingze Liu, Sai Fan, Haozhen Li et al.
Robotic practices on the construction site emerge as an attention-attracting manner owing to their capability of tackle complex challenges, especially in the rebar-involved scenarios. Most of existing products and research are mainly focused on flat rebar setting with model training demands. To fulfill this gap, we propose OpenTie, a 3D training-free rebar tying framework utilizing a RGB-to-point-cloud generation and an open-vocabulary detection. We implements the OpenTie via a robotic arm with a binocular camera and guarantees a high accuracy by applying the prompt-based object detection method on the image filtered by our propose post-processing procedure based a image to point cloud generation framework. The system is flexible for horizontal and vertical rebar tying tasks and the experiments on the real-world rebar setting verifies that the effectiveness of the system in practice.
CVJun 22, 2025
Bird's-eye view safety monitoring for the construction top under the tower craneYanke Wang, Yu Hin Ng, Haobo Liang et al.
The tower crane is involving more automated and intelligent operation procedure, and importantly, the application of automation technologies to the safety issues is imperative ahead of the utilization of any other advances. Among diverse risk management tasks on site, it is essential to protect the human workers on the workspace between the tower crane and constructed building top area (construction top) from the bird's-eye view, especially with Modular Integrated Construction (MiC) lifted. Also, the camera and Light Detection And Ranging (LiDAR) can capture abundant 3D information on site, which is however yet made the best use. Considering the safety protection for humans and tower cranes, we present an AI-based fully automated safety monitoring system for tower crane lifting from the bird's-eye view, surveilling to shield the human workers on the construction top and avoid cranes' collision by alarming the crane operator. The system achieved a 3D data fusion for localization of humans and MiCs by integrating the captured information from camera and LiDAR. The state-of-the-art methods were explored and implemented into our proposed software pipeline coupled with the hardware and display systems. Furthermore, we conducted an analysis of the components in the pipeline to verify the accuracy and effectiveness of the involved methods. The display and visualization on the real site proved that our system can serve as a valuable safety monitoring toolkit on site.