CEJul 8, 2024
Parsimonious Universal Function Approximator for Elastic and Elasto-Plastic Cavity Expansion ProblemsXiao-Xuan Chen, Pin Zhang, Hai-Sui Yu et al.
Cavity expansion is a canonical problem in geotechnics, which can be described by partial differential equations (PDEs) and ordinary differential equations (ODEs). This study explores the potential of using a new solver, a physics-informed neural network (PINN), to calculate the stress field in an expanded cavity in the elastic and elasto-plastic regimes. Whilst PINNs have emerged as an effective universal function approximator for deriving the solutions of a wide range of governing PDEs/ODEs, their ability to solve elasto-plastic problems remains uncertain. A novel parsimonious loss function is first proposed to balance the simplicity and accuracy of PINN. The proposed method is applied to diverse material behaviours in the cavity expansion problem including isotropic, anisotropic elastic media, and elastic-perfectly plastic media with Tresca and Mohr-Coulomb yield criteria. The results indicate that the use of a parsimonious prior information-based loss function is highly beneficial to deriving the approximate solutions of complex PDEs with high accuracy. The present method allows for accurate derivation of solutions for both elastic and plastic mechanical responses of an expanded cavity. It also provides insights into how PINNs can be further advanced to solve more complex problems in geotechnical practice.
80.1ROMay 17
RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic ManipulationSixu Lin, Junliang Chen, Huaiyuan Xu et al.
Planning and acting in 3D environments is a fundamental capability for robotic manipulation in the real world. Although prior work has explored predictive flow planners to guide 3D manipulation, existing approaches often rely on modular pipelines stacking multiple submodels, resulting in high computational overhead and limited real-time performance. To address these challenges, we introduce RoboFlow4D, a lightweight flow world model that unifies perception and planning by estimating temporal motion in physical 3D space. As an end-to-end framework, RoboFlow4D directly predicts multi-frame 3D flows from visual observations and textual instructions, providing explicit flow-based planning to guide action generation. This design allows seamless integration with general action policies, forming an efficient observation-planning-execution closed loop. Through slow-fast collaboration between flow prediction and action control, RoboFlow4D enables real-time and resource-efficient manipulation. Extensive experiments in both simulation and real-world settings demonstrate that RoboFlow4D consistently improves manipulation success rates and computational efficiency, advancing flow-guided planning for embodied intelligence.
CVFeb 9, 2025Code
A 3D Multimodal Feature for Infrastructure Anomaly DetectionYixiong Jing, Wei Lin, Brian Sheil et al.
Ageing structures require periodic inspections to identify structural defects. Previous work has used geometric distortions to locate cracks in synthetic masonry bridge point clouds but has struggled to detect small cracks. To address this limitation, this study proposes a novel 3D multimodal feature, 3DMulti-FPFHI, that combines a customized Fast Point Feature Histogram (FPFH) with an intensity feature. This feature is integrated into the PatchCore anomaly detection algorithm and evaluated through statistical and parametric analyses. The method is further evaluated using point clouds of a real masonry arch bridge and a full-scale experimental model of a concrete tunnel. Results show that the 3D intensity feature enhances inspection quality by improving crack detection; it also enables the identification of water ingress which introduces intensity anomalies. The 3DMulti-FPFHI outperforms FPFH and a state-of-the-art multimodal anomaly detection method. The potential of the method to address diverse infrastructure anomaly detection scenarios is highlighted by the minimal requirements for data compared to learning-based methods. The code and related point cloud dataset are available at https://github.com/Jingyixiong/3D-Multi-FPFHI.
CVSep 3, 2025Code
InfraDiffusion: zero-shot depth map restoration with diffusion models and prompted segmentation from sparse infrastructure point cloudsYixiong Jing, Cheng Zhang, Haibing Wu et al.
Point clouds are widely used for infrastructure monitoring by providing geometric information, where segmentation is required for downstream tasks such as defect detection. Existing research has automated semantic segmentation of structural components, while brick-level segmentation (identifying defects such as spalling and mortar loss) has been primarily conducted from RGB images. However, acquiring high-resolution images is impractical in low-light environments like masonry tunnels. Point clouds, though robust to dim lighting, are typically unstructured, sparse, and noisy, limiting fine-grained segmentation. We present InfraDiffusion, a zero-shot framework that projects masonry point clouds into depth maps using virtual cameras and restores them by adapting the Denoising Diffusion Null-space Model (DDNM). Without task-specific training, InfraDiffusion enhances visual clarity and geometric consistency of depth maps. Experiments on masonry bridge and tunnel point cloud datasets show significant improvements in brick-level segmentation using the Segment Anything Model (SAM), underscoring its potential for automated inspection of masonry assets. Our code and data is available at https://github.com/Jingyixiong/InfraDiffusion-official-implement.
54.6ROMay 8
PhySPRING: Structure-Preserving Reduction of Physics-Informed Twins via GNNYixiong Jing, Xingyuan Chen, Guangming Wang et al.
Physics-based digital twins aim to predict the dynamics of real-world objects under interaction, enabling real-to-sim-to-real applications in robotics. Current approaches reconstruct such twins as explicit physical models (such as spring-mass systems) to predict the dynamics, but the resulting models often inherit the resolution of the visual reconstruction rather than being reduced to the physical complexity required to reproduce task-relevant dynamics. This mismatch introduces redundant topology, making repeated forward-dynamics rollouts unnecessarily expensive. To address this challenge, we present PhySPRING, an fully differentiable GNN-based method to reduce complexity in spring--mass digital twins. PhySPRING jointly learns a hierarchy of coarsened graph topologies and their mechanical parameters from observations. At each reduction level, PhySPRING merges nodes with similar learned dynamic responses to optimize the topology, while maintaining every reduced layer as an explicit spring--mass system. On the PhysTwin benchmark, PhySPRING improves dense reconstruction and prediction accuracy over PhysTwin, while reduced models retain stable physical and visual fidelity with up to a 2.30 times speed-up. We further demonstrate the effectiveness of PhySPRING in a Real2Sim robot policy-evaluation pipeline, where the reduced models are substituted zero-shot into ACT and $π_0$ evaluations, maintaining comparable manipulation success rates across downsampling levels while improving action-sampling effectiveness. Together, PhySPRING enables efficient and structure-preserving spring--mass reduction without sacrificing fidelity or robotic utility.
RONov 22, 2025
Continually Evolving Skill Knowledge in Vision Language Action ModelYuxuan Wu, Guangming Wang, Zhiheng Yang et al.
Developing general robot intelligence in open environments requires continual skill learning. Recent Vision-Language-Action (VLA) models leverage massive pretraining data to support diverse manipulation tasks, but they still depend heavily on task-specific fine-tuning, revealing a lack of continual learning capability. Existing continual learning methods are also resource-intensive to scale to VLA models. We propose Stellar VLA, a knowledge-driven continual learning framework with two variants: T-Stellar, modeling task-centric knowledge space, and TS-Stellar, capturing hierarchical task-skill structure. Stellar VLA enables self-supervised knowledge evolution through joint learning of task latent representation and the knowledge space, reducing annotation needs. Knowledge-guided expert routing provide task specialization without extra network parameters, lowering training overhead. Experiments on the LIBERO benchmark and real-world tasks show over 50 percentage average improvement in final success rates relative to baselines. TS-Stellar further excels in complex action inference, and in-depth analyses verify effective knowledge retention and discovery. Our code will be released soon.