Lin Cao

h-index: 8

9 papers

121 citations

Novelty: 57%

AI Score: 53

Ranked #31,899 of 201,326 authors (top 16%); #12,850 in CV (top 22%)

9 Papers

IV · Apr 14, 2022

Information fusion approach for biomass estimation in a plateau mountainous forest using a synergistic system comprising UAS-based digital camera and LiDAR

Rong Huang, Wei Yao, Zhong Xu et al.

Forest land plays a vital role in global climate, ecosystems, farming and human living environments. Therefore, forest biomass estimation methods are necessary to monitor changes in the forest structure and function, which are key data in natural resources research. Although accurate forest biomass measurements are important in forest inventory and assessments, high-density measurements that involve airborne light detection and ranging (LiDAR) at a low flight height in large mountainous areas are highly expensive. The objective of this study was to quantify the aboveground biomass (AGB) of a plateau mountainous forest reserve using a system that synergistically combines an unmanned aircraft system (UAS)-based digital aerial camera and LiDAR to leverage their complementary advantages. In this study, we utilized digital aerial photogrammetry (DAP), which has the unique advantages of speed, high spatial resolution, and low cost, to compensate for the deficiency of forestry inventory using UAS-based LiDAR that requires terrain-following flight for high-resolution data acquisition. Combined with the sparse LiDAR points acquired by using a high-altitude and high-speed UAS for terrain extraction, dense normalized DAP point clouds can be obtained to produce an accurate and high-resolution canopy height model (CHM). Based on the CHM and spectral attributes obtained from multispectral images, we estimated and mapped the AGB of the region of interest with considerable cost efficiency. Our study supports the development of predictive models for large-scale wall-to-wall AGB mapping by leveraging the complementarity between DAP and LiDAR measurements. This work also reveals the potential of utilizing a UAS-based digital camera and LiDAR synergistically in a plateau mountainous forest area.

97.0 · RO · Mar 14

ST-VLA: Enabling 4D-Aware Spatiotemporal Understanding for General Robot Manipulation

You Wu, Zixuan Chen, Cunxu Ou et al.

Robotic manipulation in open-world environments requires reasoning across semantics, geometry, and long-horizon action dynamics. Existing hierarchical Vision-Language-Action (VLA) frameworks typically use 2D representations to connect high-level reasoning with low-level control, but lack depth awareness and temporal consistency, limiting robustness in complex 3D scenes. We propose ST-VLA, a hierarchical VLA framework using a unified 3D-4D representation to bridge perception and action. ST-VLA converts 2D guidance into 3D trajectories and generates smooth spatial masks that capture 4D spatio-temporal context, providing a stable interface between semantic reasoning and continuous control. To enable effective learning of such representations, we introduce ST-Human, a large-scale human manipulation dataset with 14 tasks and 300k episodes, annotated with 2D, 3D, and 4D supervision via a semi-automated pipeline. Using ST-Human, we train ST-VLM, a spatio-temporal vision-language model that generates spatially grounded and temporally coherent 3D representations to guide policy execution. The smooth spatial masks focus on task-relevant geometry and stabilize latent representations, enabling online replanning and long-horizon reasoning. Experiments on RLBench and real-world manipulation tasks show that ST-VLA significantly outperforms state-of-the-art baselines, improving zero-shot success rates by 44.6% and 30.3%. These results demonstrate that offloading spatio-temporal reasoning to VLMs with unified 3D-4D representations substantially improves robustness and generalization for open-world robotic manipulation. Project website: https://oucx117.github.io/ST-VLA/.

SE · Jul 31, 2025 · Code

SWE-Exp: Experience-Driven Software Issue Resolution

Silin Chen, Shaoxin Lin, Xiaodong Gu et al.

Recent advances in large language model (LLM) agents have shown remarkable progress in software issue resolution, leveraging advanced techniques such as multi-agent collaboration and Monte Carlo Tree Search (MCTS). However, current agents act as memoryless explorers, treating each problem separately without retaining or reusing knowledge from previous repair experiences. This leads to redundant exploration of failed trajectories and missed chances to adapt successful issue resolution methods to similar problems. To address this problem, we introduce SWE-Exp, an experience-enhanced approach that distills concise and actionable experience from prior agent trajectories, enabling continuous learning across issues. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Specifically, it extracts reusable issue resolution knowledge at different levels, from high-level problem comprehension to specific code changes. Experiments show that SWE-Exp achieves a state-of-the-art resolution rate (41.6% Pass@1) on SWE-bench-Verified under open-source agent frameworks. Our approach establishes a new paradigm in which automated software engineering agents systematically accumulate and leverage repair expertise, fundamentally shifting from trial-and-error exploration to strategic, experience-driven issue resolution.

CV · Aug 11, 2025 · Code

Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction

Bo Jia, Yanan Guo, Ying Chang et al.

3D Gaussian Splatting (3DGS) achieves remarkable results in the field of surface reconstruction. However, when Gaussian normal vectors are aligned within the single-view projection plane, the geometry may appear reasonable in the current view while biases emerge upon switching to nearby views. To address the distance and global matching challenges in multi-view scenes, we design multi-view normal and distance-guided Gaussian splatting. This method achieves geometric depth unification and high-accuracy reconstruction by constraining nearby depth maps and aligning 3D normals. Specifically, for the reconstruction of small indoor and outdoor scenes, we propose a multi-view distance reprojection regularization module that achieves multi-view Gaussian alignment by computing the distance loss between two nearby views and the same Gaussian surface. Additionally, we develop a multi-view normal enhancement module, which ensures consistency across views by matching the normals of pixel points in nearby views and calculating the loss. Extensive experimental results demonstrate that our method outperforms the baseline in both quantitative and qualitative evaluations, significantly enhancing the surface reconstruction capability of 3DGS. Our code will be made publicly available at https://github.com/Bistu3DV/MND-GS/.

CV · Oct 17, 2024 · Code

Hybrid bundle-adjusting 3D Gaussians for view consistent rendering with pose optimization

Yanan Guo, Ying Xie, Ying Chang et al.

Novel view synthesis has made significant progress in the field of 3D computer vision. However, the rendering of view-consistent novel views from imperfect camera poses remains challenging. In this paper, we introduce a hybrid bundle-adjusting 3D Gaussians model that enables view-consistent rendering with pose optimization. This model jointly extracts image-based and neural 3D representations to simultaneously generate view-consistent images and camera poses within forward-facing scenes. The effectiveness of our model is demonstrated through extensive experiments conducted on both real and synthetic datasets. These experiments clearly illustrate that our model can effectively optimize neural scene representations while simultaneously resolving significant camera pose misalignments. The source code is available at https://github.com/Bistu3DV/hybridBA.

CV · Mar 4

LeafInst - Unified Instance Segmentation Network for Fine-Grained Forestry Leaf Phenotype Analysis: A New UAV based Benchmark

Taige Luo, Junru Xie, Chenyang Fan et al.

Intelligent forest tree breeding has advanced plant phenotyping, yet existing research largely focuses on large-leaf agricultural crops, with limited attention to fine-grained leaf analysis of sapling trees in open-field environments. Natural scenes introduce challenges including scale variation, illumination changes, and irregular leaf morphology. To address these issues, we collected UAV RGB imagery of field-grown saplings and constructed the Poplar-leaf dataset, containing 1,202 branches and 19,876 pixel-level annotated leaf instances. To our knowledge, this is the first instance segmentation dataset specifically designed for forestry leaves in open-field conditions. We propose LeafInst, a novel segmentation framework tailored for irregular and multi-scale leaf structures. The model integrates an Asymptotic Feature Pyramid Network (AFPN) for multi-scale perception, a Dynamic Asymmetric Spatial Perception (DASP) module for irregular shape modeling, and a dual-residual Dynamic Anomalous Regression Head (DARH) with Top-down Concatenation decoder Feature Fusion (TCFU) to improve detection and segmentation performance. On Poplar-leaf, LeafInst achieves 68.4 mAP, outperforming YOLOv11 by 7.1 percent and MaskDINO by 6.5 percent. On the public PhenoBench benchmark, it reaches 52.7 box mAP, exceeding MaskDINO by 3.4 percent. Additional experiments demonstrate strong generalization and practical utility for large-scale leaf phenotyping.

RO · Jul 12, 2020

A Three-limb Teleoperated Robotic System with Foot Control for Flexible Endoscopic Surgery

Yanpei Huang, Wenjie Lai, Lin Cao et al.

Flexible endoscopy requires a high level of skill to manipulate both the endoscope and associated instruments. In most robotic flexible endoscopic systems, the endoscope and instruments are controlled separately by two operators, which may result in communication errors and inefficient operation. We present a novel teleoperated robotic endoscopic system that can be commanded by a surgeon alone. This 13 degrees-of-freedom (DoF) system integrates a foot-controlled robotic flexible endoscope and two hand-controlled robotic endoscopic instruments (a robotic grasper and a robotic cauterizing hook). A foot-controlled human-machine interface maps the natural foot gestures to the 4-DoF movements of the endoscope, and two hand-controlled interfaces map the movements of the two hands to the two instruments individually. The proposed robotic system was validated in an ex-vivo experiment carried out by six subjects, where foot control was also compared with a sequential clutch-based hand control scheme. The participants could successfully teleoperate the endoscope and the two instruments to cut the tissues at scattered target areas in a porcine stomach. Foot control yielded 43.7% faster task completion and required less mental effort as compared to the clutch-based hand control scheme. The system introduced in this paper is intuitive for three-limb manipulation even for operators without experience of handling the endoscope and robotic instruments. This three-limb teleoperated robotic system enables one surgeon to intuitively control three endoscopic tools which normally require two operators, leading to reduced manpower, fewer communication errors, and improved efficiency.

RO · Mar 8, 2019

Performance evaluation of a foot-controlled human-robot interface

Yanpei Huang, Etienne Burdet, Lin Cao et al.

Robotic minimally invasive interventions typically require using more than two instruments. We thus developed a foot pedal interface which allows the user to control a robotic arm (while simultaneously working with the hands) with four degrees of freedom in continuous directions and speeds. This paper evaluates and compares the performance of ten naive operators in using this new pedal interface and a traditional button interface in completing tasks. These tasks are geometrically complex path-following tasks similar to those in laparoscopic training, and the traditional button interface allows axis-by-axis control with constant speeds. Precision, time, and smoothness of the subjects' control movements for these tasks are analysed. The results demonstrate that the pedal interface can be used to control a robot for complex motion tasks. The subjects kept the average error rate at a low level of around 2.6% with both interfaces, but the pedal interface resulted in about 30% faster operation speed and 60% smoother movement, which indicates improved efficiency and user experience as compared with the button interface. The results of a questionnaire show that the operators found controlling the robot with the pedal interface more intuitive, more comfortable, and less tiring than using the button interface.

HC · Feb 13, 2019

A Subject-Specific Four-Degree-of-Freedom Foot Interface to Control a Robot Arm

Yanpei Huang, Etienne Burdet, Lin Cao et al.

In robotic surgery, the surgeon controls robotic instruments using dedicated interfaces. One critical limitation of current interfaces is that they are designed to be operated by only the hands. This means that the surgeon can control at most two robotic instruments at one time, while many interventions require three instruments. This paper introduces a novel four-degree-of-freedom foot-machine interface which allows the surgeon to control a third robotic instrument using the foot, giving the surgeon a "third hand". This interface is essentially a parallel-serial hybrid mechanism with springs and force sensors. Unlike existing switch-based interfaces that can only unintuitively generate motion in discrete directions, this interface allows intuitive control of a slave robotic arm in continuous directions and speeds, naturally matching the foot movements with dynamic force and position feedback. An experiment with ten naive subjects was conducted to test the system. In view of the significant variance of motion patterns between subjects, a subject-specific mapping from foot movements to command outputs was developed using Independent Component Analysis (ICA). Results showed that the ICA method could accurately identify subjects' foot motion patterns and significantly improve the prediction accuracy of motion directions from 68% to 88% as compared with the forward kinematics-based approach. This foot-machine interface can be applied for the teleoperation of industrial/surgical robots independently or in coordination with hands in the future.