ROMay 23
Whole-Body Inverse Kinematics with Graph DiffusionHelong Huang, Kai Tan, Feng Wen et al.
Inverse kinematics (IK) is a fundamental problem in robotics, requiring the generation of joint configurations that satisfy target end-effector poses. Existing approaches often struggle to generalize across diverse robot morphologies and to effectively model the multi-modal nature of IK, particularly in articulated systems with multiple kinematic branches. In this work, we propose GraphDiff-IK, a structure-aware graph diffusion framework for inverse kinematics. Specifically, we represent the robot as a kinematic graph constructed from the robot URDF, where nodes correspond to actuated joints and edges encode kinematic dependencies. Building upon this representation, we formulate IK as a conditional graph diffusion process that directly generates joint configurations on the robot graph. To better capture structural dependencies in articulated systems, we further introduce a structure-aware graph reasoning framework with hierarchical stage-wise message passing and torso-aware conditioning for multi-branch robots. In addition, we incorporate noisy forward kinematics feedback and task-space supervision to improve geometric consistency during denoising. The proposed framework provides a unified formulation that naturally supports single-arm robots, dual-arm systems, and articulated robots with torso or waist structures. Extensive experiments on diverse robotic platforms demonstrate that the proposed method achieves accurate and stable IK performance while preserving the ability to generate multiple feasible solutions for redundant robotic systems.
ASMar 14
Integrated Spoofing-Robust Automatic Speaker Verification via a Three-Class Formulation and LLRKai Tan, Lin Zhang, Ruiteng Zhang et al.
Spoofing-robust automatic speaker verification (SASV) aims to integrate automatic speaker verification (ASV) and countermeasure (CM). A popular solution is fusion of independent ASV and CM scores. To better modeling SASV, some frameworks integrate ASV and CM within a single network. However, these solutions are typically bi-encoder based, offer limited interpretability, and cannot be readily adapted to new evaluation parameters without retraining. Based on this, we propose a unified end-to-end framework via a three-class formulation that enables log-likelihood ratio (LLR) inference from class logits for a more interpretable decision pipeline. Experiments show comparable performance to existing methods on ASVSpoof5 and better results on SpoofCeleb. The visualization and analysis also prove that the three-class reformulation provides more interpretability.
BMFeb 29, 2024
A Protein Structure Prediction Approach Leveraging Transformer and CNN IntegrationYanlin Zhou, Kai Tan, Xinyu Shen et al.
Proteins are essential for life, and their structure determines their function. The protein secondary structure is formed by the folding of the protein primary structure, and the protein tertiary structure is formed by the bending and folding of the secondary structure. Therefore, the study of protein secondary structure is very helpful to the overall understanding of protein structure. Although the accuracy of protein secondary structure prediction has continuously improved with the development of machine learning and deep learning, progress in the field of protein structure prediction, unfortunately, remains insufficient to meet the large demand for protein information. Therefore, based on the advantages of deep learning-based methods in feature extraction and learning ability, this paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which uses Convolutional Neural Networks (CCN) and a supervised Transformer protein language model for single-sequence protein structure prediction. The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization.
MLApr 27, 2024
Uncertainty quantification for iterative algorithms in linear models with application to early stoppingPierre C. Bellec, Kai Tan
This paper investigates the iterates $\hbb^1,\dots,\hbb^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA). The paper proposes novel estimators for the generalization error of the iterate $\hbb^t$ for any fixed iteration $t$ along the trajectory. These estimators are proved to be $\sqrt n$-consistent under Gaussian designs. Applications to early-stopping are provided: when the generalization error of the iterates is a U-shape function of the iteration $t$, the estimates allow to select from the data an iteration $\hat t$ that achieves the smallest generalization error along the trajectory. Additionally, we provide a technique for developing debiasing corrections and valid confidence intervals for the components of the true coefficient vector from the iterate $\hbb^t$ at any finite iteration $t$. Extensive simulations on synthetic data illustrate the theoretical results.