52.6GRMay 10
FrameTwin: Curve-Anchored Gaussian Alignment from Sparse Views for Adaptive Wireframe 3D PrintingWenting Wang, Zhuo Huang, Kun Qian et al.
We present FrameTwin, a curve-anchored Gaussian alignment framework that uses sparse-view images to close the control loop for adaptive wireframe 3D printing. Our key idea is to capture the deformation of thin wireframe structures from sparse-view images using Gaussian kernels anchored to parametric curves, yielding a compact and geometry-aware encoding that explicitly captures strut topology. Driven by a differentiable rendering pipeline, FrameTwin estimates a neural deformation field that aligns the partially printed target model with the deformed structure observed during fabrication, where the optimized curve-Gaussian representation serves as a digital twin of the evolving wireframe. Unlike general Gaussian-splatting approaches, our formulation constrains kernel placement along parametric curves, substantially reducing the ambiguity inherent in sparse-view observations of thin structures. The resultant deformation-field alignment enforces global consistency across all struts. By using the estimated deformation field to blend the distorted printed geometry with the remaining unprinted geometry, FrameTwin enables adaptive updates to future printing trajectories. We demonstrate that FrameTwin can robustly capture and compensate for deformation in wireframe models fabricated using a robotized 3D printing system.
CVJun 5, 2025
Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusionHongyu Wang, Yonghao Long, Yueyao Chen et al.
Endoscopic Submucosal Dissection (ESD) is a well-established technique for removing epithelial lesions. Predicting dissection trajectories in ESD videos offers significant potential for enhancing surgical skill training and simplifying the learning process, yet this area remains underexplored. While imitation learning has shown promise in acquiring skills from expert demonstrations, challenges persist in handling uncertain future movements, learning geometric symmetries, and generalizing to diverse surgical scenarios. To address these, we introduce a novel approach: Implicit Diffusion Policy with Equivariant Representations for Imitation Learning (iDPOE). Our method models expert behavior through a joint state action distribution, capturing the stochastic nature of dissection trajectories and enabling robust visual representation learning across various endoscopic views. By incorporating a diffusion model into policy learning, iDPOE ensures efficient training and sampling, leading to more accurate predictions and better generalization. Additionally, we enhance the model's ability to generalize to geometric symmetries by embedding equivariance into the learning process. To address state mismatches, we develop a forward-process guided action inference strategy for conditional sampling. Using an ESD video dataset of nearly 2000 clips, experimental results show that our approach surpasses state-of-the-art methods, both explicit and implicit, in trajectory prediction. To the best of our knowledge, this is the first application of imitation learning to surgical skill development for dissection trajectory prediction.
CVOct 16, 2025
SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed ImagesJiaxin Guo, Tongfan Guan, Wenzhen Dong et al.
Recent advances in 3D Gaussian Splatting (3DGS) have enabled generalizable, on-the-fly reconstruction of sequential input views. However, existing methods often predict per-pixel Gaussians and combine Gaussians from all views as the scene representation, leading to substantial redundancies and geometric inconsistencies in long-duration video sequences. To address this, we propose SaLon3R, a novel framework for Structure-aware, Long-term 3DGS Reconstruction. To our best knowledge, SaLon3R is the first online generalizable GS method capable of reconstructing over 50 views in over 10 FPS, with 50% to 90% redundancy removal. Our method introduces compact anchor primitives to eliminate redundancy through differentiable saliency-aware Gaussian quantization, coupled with a 3D Point Transformer that refines anchor attributes and saliency to resolve cross-frame geometric and photometric inconsistencies. Specifically, we first leverage a 3D reconstruction backbone to predict dense per-pixel Gaussians and a saliency map encoding regional geometric complexity. Redundant Gaussians are compressed into compact anchors by prioritizing high-complexity regions. The 3D Point Transformer then learns spatial structural priors in 3D space from training data to refine anchor attributes and saliency, enabling regionally adaptive Gaussian decoding for geometric fidelity. Without known camera parameters or test-time optimization, our approach effectively resolves artifacts and prunes the redundant 3DGS in a single feed-forward pass. Experiments on multiple datasets demonstrate our state-of-the-art performance on both novel view synthesis and depth estimation, demonstrating superior efficiency, robustness, and generalization ability for long-term generalizable 3D reconstruction. Project Page: https://wrld.github.io/SaLon3R/.
CVAug 23, 2025
SERES: Semantic-aware neural reconstruction from sparse viewsBo Xu, Yuhu Guo, Yuchao Wang et al.
We propose a semantic-aware neural reconstruction method to generate 3D high-fidelity models from sparse images. To tackle the challenge of severe radiance ambiguity caused by mismatched features in sparse input, we enrich neural implicit representations by adding patch-based semantic logits that are optimized together with the signed distance field and the radiance field. A novel regularization based on the geometric primitive masks is introduced to mitigate shape ambiguity. The performance of our approach has been verified in experimental evaluation. The average chamfer distances of our reconstruction on the DTU dataset can be reduced by 44% for SparseNeuS and 20% for VolRecon. When working as a plugin for those dense reconstruction baselines such as NeuS and Neuralangelo, the average error on the DTU dataset can be reduced by 69% and 68% respectively.
AIJan 19, 2021
Creation and Evaluation of a Pre-tertiary Artificial Intelligence (AI) CurriculumThomas K. F. Chiu, Helen Meng, Ching-Sing Chai et al.
Contributions: The Chinese University of Hong Kong (CUHK)-Jockey Club AI for the Future Project (AI4Future) co-created an AI curriculum for pre-tertiary education and evaluated its efficacy. While AI is conventionally taught in tertiary level education, our co-creation process successfully developed the curriculum that has been used in secondary school teaching in Hong Kong and received positive feedback. Background: AI4Future is a cross-sector project that engages five major partners - CUHK Faculty of Engineering and Faculty of Education, Hong Kong secondary schools, the government and the AI industry. A team of 14 professors with expertise in engineering and education collaborated with 17 principals and teachers from 6 secondary schools to co-create the curriculum. This team formation bridges the gap between researchers in engineering and education, together with practitioners in education context. Research Questions: What are the main features of the curriculum content developed through the co-creation process? Would the curriculum significantly improve the students perceived competence in, as well as attitude and motivation towards AI? What are the teachers perceptions of the co-creation process that aims to accommodate and foster teacher autonomy? Methodology: This study adopted a mix of quantitative and qualitative methods and involved 335 student participants. Findings: 1) two main features of learning resources, 2) the students perceived greater competence, and developed more positive attitude to learn AI, and 3) the co-creation process generated a variety of resources which enhanced the teachers knowledge in AI, as well as fostered teachers autonomy in bringing the subject matter into their classrooms.