Sibo Tian

RO · 4 papers · 53 citations · Novelty 36% · AI Score 39

4 Papers

RO · Jul 30, 2023
TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction

Sibo Tian, Minghui Zheng, Xiao Liang

Predicting human motion plays a crucial role in ensuring safe and effective close human-robot collaboration in the intelligent remanufacturing systems of the future. Existing works fall into two groups: those that focus on accuracy by predicting a single future motion, and those that generate diverse predictions from observations. The former group fails to address the uncertainty and multi-modal nature of human motion, while the latter often produces motion sequences that deviate too far from the ground truth or become unrealistic given the historical context. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction that generates samples that are more likely to occur while maintaining a certain level of diversity. Our model uses a Transformer backbone with long skip connections between shallow and deep layers. Additionally, we employ the discrete cosine transform to model motion sequences in the frequency space, thereby improving performance. In contrast to prior diffusion-based models that rely on extra modules such as cross-attention and adaptive layer normalization to condition the prediction on past observed motion, we treat all inputs, including conditions, as tokens, yielding a more lightweight model than existing approaches. Extensive experiments on benchmark datasets are conducted to validate the effectiveness of our human motion prediction model.
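To make the frequency-space idea concrete, here is a minimal pure-Python sketch of projecting a motion sequence onto a truncated discrete cosine transform (DCT) basis along the time axis. It is not the TransFusion implementation: the orthonormal DCT-II variant, the per-coordinate layout `frames[t][d]`, the number of retained coefficients, and the helper names `dct_matrix`, `to_frequency`, `to_time`, and `encode_motion` are all assumptions made for illustration.

```python
import math


def dct_matrix(n: int) -> list[list[float]]:
    """Orthonormal DCT-II basis (assumed variant): row k is the k-th cosine basis vector of length n."""
    mat = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        mat.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return mat


def to_frequency(traj: list[float], num_coeffs: int) -> list[float]:
    """Project a 1-D trajectory (one joint coordinate over time) onto the
    first num_coeffs (lowest-frequency) DCT basis vectors."""
    basis = dct_matrix(len(traj))
    return [sum(b * x for b, x in zip(basis[k], traj)) for k in range(num_coeffs)]


def to_time(coeffs: list[float], n: int) -> list[float]:
    """Inverse transform: the basis is orthonormal, so the inverse is its transpose.
    Missing (dropped) high-frequency coefficients are treated as zero."""
    basis = dct_matrix(n)
    return [sum(coeffs[k] * basis[k][t] for k in range(len(coeffs))) for t in range(n)]


def encode_motion(frames: list[list[float]], num_coeffs: int) -> list[list[float]]:
    """Hypothetical layout: frames[t][d] -> coeffs[d][k], i.e. a DCT along time,
    applied independently to every coordinate d (e.g. joints x 3 axes)."""
    n = len(frames)
    dims = len(frames[0])
    return [to_frequency([frames[t][d] for t in range(n)], num_coeffs) for d in range(dims)]
```

Keeping only the first few coefficients acts as a low-pass filter on each joint trajectory, which suits smooth human motion. Because the basis is orthonormal, the squared reconstruction error equals the energy in the dropped coefficients.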

78.7 · RO · Mar 10
SELF-VLA: A Skill Enhanced Agentic Vision-Language-Action Framework for Contact-Rich Disassembly

Chang Liu, Sibo Tian, Xiao Liang et al.

Disassembly automation has long been pursued to address the growing demand for efficient and proper recovery of valuable components from end-of-life (EoL) electronic products. Existing approaches have demonstrated promising and regimented performance by decomposing the disassembly process into different subtasks. However, each subtask typically requires extensive data preparation, model training, and system management. Moreover, these approaches are often task- and component-specific, which makes them poorly suited to the variability and uncertainty of EoL products and limits their generalization. All these factors restrict the practical deployment of current robotic disassembly systems and leave them highly reliant on human labor. With the recent development of foundation models in robotics, vision-language-action (VLA) models have shown impressive performance on standard robotic manipulation tasks, but their applicability to complex, contact-rich, and long-horizon industrial tasks such as disassembly, which require sequential and precise manipulation, remains limited. To address this challenge, we propose SELF-VLA, an agentic VLA framework that integrates explicit disassembly skills. Experimental studies demonstrate that our framework significantly outperforms current state-of-the-art end-to-end VLA models on two contact-rich disassembly tasks. A video illustration is available at https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2026/03/IROS-VLA-Video.mp4.

CV · Sep 19, 2024
Bayesian-Optimized One-Step Diffusion Model with Knowledge Distillation for Real-Time 3D Human Motion Prediction

Sibo Tian, Minghui Zheng, Xiao Liang

Human motion prediction is a cornerstone of human-robot collaboration (HRC), as robots need to infer the future movements of human workers from past motion cues to proactively plan their own motion, ensuring safety in close collaboration scenarios. Diffusion models have demonstrated remarkable performance in predicting high-quality motion samples with reasonable diversity, but they suffer from a slow generative process that requires multiple model evaluations, hindering real-world applications. To enable real-time prediction, in this work we propose training a one-step multi-layer perceptron-based (MLP-based) diffusion model for motion prediction using knowledge distillation and Bayesian optimization. Our method consists of two steps. First, we distill a pretrained diffusion-based motion predictor, TransFusion, directly into a one-step diffusion model with the same denoiser architecture. Then, to further reduce inference time, we remove the computationally expensive components from the original denoiser and use knowledge distillation again to distill the resulting one-step diffusion model into an even smaller model based solely on MLPs. Bayesian optimization is used to tune the hyperparameters for training the smaller diffusion model. Extensive experiments on benchmark datasets show that our model significantly improves inference speed, achieving real-time prediction without noticeable degradation in performance.
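The first distillation step, compressing a multi-step teacher into a one-step student, can be illustrated with a deliberately tiny toy. This is a hypothetical sketch, not the paper's training procedure: the scalar "teacher" sampler, the linear student, and the names `teacher_sample`, `distill_linear_student`, and `student_sample` are invented here, and a closed-form least-squares fit stands in for gradient-based training of a neural denoiser. The student learns to map the same initial noise and condition directly to the teacher's final output, so sampling needs one evaluation instead of `steps`.

```python
import random


def teacher_sample(z: float, c: float, steps: int = 50, alpha: float = 0.1) -> float:
    """Toy multi-step 'teacher' sampler (hypothetical): each loop iteration stands
    in for one denoiser evaluation that moves the sample toward the condition c."""
    x = z
    for _ in range(steps):
        x = x + alpha * (c - x)
    return x


def distill_linear_student(n_pairs: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Fit a one-step student f(z, c) = w_z * z + w_c * c by least squares on
    (noise, condition) -> teacher-output pairs, i.e. output-level distillation."""
    rng = random.Random(seed)
    szz = szc = scc = szy = scy = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)     # initial noise shared by teacher and student
        c = rng.uniform(-1.0, 1.0)  # hypothetical conditioning signal
        y = teacher_sample(z, c)    # distillation target from the teacher
        szz += z * z
        szc += z * c
        scc += c * c
        szy += z * y
        scy += c * y
    # Solve the 2x2 normal equations [[szz, szc], [szc, scc]] @ w = [szy, scy].
    det = szz * scc - szc * szc
    w_z = (scc * szy - szc * scy) / det
    w_c = (szz * scy - szc * szy) / det
    return w_z, w_c


def student_sample(z: float, c: float, w: tuple[float, float]) -> float:
    """One-step student: a single evaluation replaces the teacher's whole loop."""
    return w[0] * z + w[1] * c
```

In this toy the teacher's 50-step map happens to be linear, x0 = 0.9**50 * z + (1 - 0.9**50) * c, so the student recovers it exactly; a real denoiser is nonlinear, and the student can only approximate it.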

81.2 · SY · Apr 3
Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems

Sibo Tian, Xiao Liang, Sara Behdad et al.

Remanufacturing is fundamentally more challenging than traditional manufacturing due to the significant uncertainty, variability, and incompleteness inherent in end-of-life (EoL) products. At the same time, it has become increasingly essential and urgent to a circular economy, driven by the growing volume of discarded electronic products and the escalating scarcity of critical materials. In this paper, we review the existing literature and examine the key challenges as well as emerging opportunities in intelligent automation for EoL electronics remanufacturing, providing a comprehensive overview of how robotics, control, and artificial intelligence (AI) can jointly enable scalable, safe, and intelligent remanufacturing systems. The paper begins with the definition, scope, and motivation of remanufacturing within the context of a circular economy, highlighting its societal and environmental significance. It then delves into intelligent automation approaches for disassembly, inspection, sorting, and component reprocessing in this domain, covering advanced methods for multimodal perception, decision-making under uncertainty, flexible planning algorithms, and force-aware manipulation. The paper further reviews several emerging techniques, including large foundation models, human-in-the-loop integration, and digital twins, that have the potential to support future research in this area. By integrating these topics, we aim to illustrate how next-generation remanufacturing systems can achieve robust, adaptable, and efficient operation in the face of complex real-world challenges.