CV · Sep 18, 2025
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Yuming Jiang, Siteng Huang, Shengke Xue et al. · pku
This paper presents RynnVLA-001, a vision-language-action (VLA) model built upon large-scale video generative pretraining from human demonstrations. We propose a novel two-stage pretraining methodology. The first stage, Ego-Centric Video Generative Pretraining, trains an Image-to-Video model on 12M ego-centric manipulation videos to predict future frames conditioned on an initial frame and a language instruction. The second stage, Human-Centric Trajectory-Aware Modeling, extends this by jointly predicting future keypoint trajectories, thereby bridging visual frame prediction and action prediction. Furthermore, to enhance action representation, we propose ActionVAE, a variational autoencoder that compresses sequences of actions into compact latent embeddings, reducing the complexity of the VLA output space. When finetuned on the same downstream robotics datasets, RynnVLA-001 outperforms state-of-the-art baselines, demonstrating that the proposed pretraining strategy provides a more effective initialization for VLA models.
IV · Mar 8, 2020
1D Probabilistic Undersampling Pattern Optimization for MR Image Reconstruction
Shengke Xue, Ruiliang Bai, Xinyu Jin
Magnetic resonance imaging (MRI) is limited mainly by long scan times and is vulnerable to motion artifacts from human tissue, especially in 3D clinical scenarios. k-space undersampling is therefore used to accelerate MRI acquisition, at the cost of visually poor MR images. Recent studies either 1) use effective undersampling patterns or 2) design deep neural networks to improve the quality of the resulting images, but these are treated as two separate optimization strategies. In this paper, we propose a cross-domain network for MR image reconstruction that works in a retrospective, data-driven manner under limited sampling rates. Using an end-to-end learning strategy, our method simultaneously obtains the optimal undersampling pattern (in k-space) and the reconstruction model, both customized to the type of training data. We propose a 1D probabilistic undersampling layer that obtains the optimal undersampling pattern and its probability distribution in a differentiable way, and a 1D inverse Fourier transform layer that connects the Fourier domain and the image domain during both the forward pass and backpropagation. In addition, by training on 3D fully sampled k-space data and MR images with the traditional Euclidean loss, we discover a universal relationship between the probability distribution of the optimal undersampling pattern and its corresponding sampling rate. Experiments show that MR images recovered with our 1D probabilistic undersampling pattern clearly outperform those from several existing sampling strategies, both quantitatively and qualitatively.
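The forward path described in this abstract (draw a sampling mask from per-frequency probabilities, zero out the unsampled k-space entries, then apply an inverse Fourier transform) can be illustrated with a toy 1D sketch. This is not the authors' layer: the function names, the Bernoulli sampling rule, and the 8-sample signal are all assumptions made for illustration, and the sketch omits the learning of the probabilities and the differentiable relaxation the paper relies on.

```python
import cmath
import random


def dft(x):
    """Naive 1D discrete Fourier transform (image domain -> k-space)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spectrum):
    """Naive 1D inverse DFT (k-space -> image domain), normalized by 1/N."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def sample_mask(probs, rng):
    """Assumed sampling rule: keep frequency k with probability probs[k]."""
    return [1 if rng.random() < p else 0 for p in probs]


def undersample_and_reconstruct(signal, probs, rng):
    """Mask the k-space of `signal` and return (mask, zero-filled reconstruction)."""
    kspace = dft(signal)
    mask = sample_mask(probs, rng)
    masked = [m * v for m, v in zip(mask, kspace)]
    return mask, idft(masked)


if __name__ == "__main__":
    rng = random.Random(0)
    signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 2.5, 1.5]
    # Hypothetical probabilities; their mean is the expected sampling rate.
    probs = [1.0, 0.8, 0.4, 0.2, 0.1, 0.2, 0.4, 0.8]
    mask, recon = undersample_and_reconstruct(signal, probs, rng)
    print("mask:", mask)
    print("expected sampling rate:", sum(probs) / len(probs))
    print("reconstruction magnitudes:", [round(abs(v), 3) for v in recon])
```

In this sketch the mean of `probs` plays the role of the sampling rate, which gives a concrete reading of the abstract's link between the probability distribution and the sampling rate. In an end-to-end setting the hard Bernoulli draw would need a differentiable surrogate, which is left out here.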
CV · Jan 7, 2019
Double Weighted Truncated Nuclear Norm Regularization for Low-Rank Matrix Completion
Shengke Xue, Wenyuan Qiu, Fan Liu et al.
Matrix completion aims to recover a matrix from a small subset of its observed elements and has attracted growing attention in computer vision. Many previous approaches formulate it as a low-rank matrix approximation problem. Recently, the truncated nuclear norm has been proposed as a surrogate for the traditional nuclear norm that better approximates the rank of a matrix. The truncated nuclear norm regularization (TNNR) method is applicable in real-world scenarios, but it is sensitive to the number of truncated singular values and requires numerous iterations to converge. This paper proposes a revised approach, double weighted truncated nuclear norm regularization (DW-TNNR), which assigns different weights to the rows and columns of a matrix separately to accelerate convergence with acceptable performance. DW-TNNR is more robust to the number of truncated singular values than TNNR. Instead of the iterative updating scheme in the second step of TNNR, this paper devises an efficient strategy based on gradient descent in a concise form, with a theoretical guarantee in optimization. Extensive experiments on real visual data show that DW-TNNR performs well and is superior in both speed and accuracy for matrix completion.
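For readers unfamiliar with the terminology, the truncated nuclear norm and the two-step TNNR scheme mentioned in this abstract can be written out as follows. This is the standard TNNR formulation as commonly stated, not the DW-TNNR objective itself. The symbols $X$, $M$, $\Omega$, $r$, $A$, and $B$ are notation assumed here and do not appear in the abstract.

```latex
% Truncated nuclear norm of X \in \mathbb{R}^{m \times n}: the sum of all
% singular values except the r largest (\sigma_1 \ge \sigma_2 \ge \dots).
\|X\|_r \;=\; \sum_{i=r+1}^{\min(m,n)} \sigma_i(X)
        \;=\; \|X\|_* \;-\; \sum_{i=1}^{r} \sigma_i(X)

% The sum of the r largest singular values has a variational form,
% with A \in \mathbb{R}^{r \times m} and B \in \mathbb{R}^{r \times n}:
\sum_{i=1}^{r} \sigma_i(X) \;=\; \max_{A A^{\top} = I,\; B B^{\top} = I}
        \operatorname{Tr}\!\left(A X B^{\top}\right)

% Two-step scheme at outer iteration l:
%   Step 1: take A_l, B_l from the top-r left/right singular vectors of X_l.
%   Step 2: with A_l, B_l fixed, solve the convex subproblem
X_{l+1} \;=\; \arg\min_{X} \; \|X\|_* - \operatorname{Tr}\!\left(A_l X B_l^{\top}\right)
        \quad \text{s.t.} \quad P_{\Omega}(X) = P_{\Omega}(M)
```

Here $P_{\Omega}$ keeps the observed entries of the incomplete matrix $M$ and zeroes the rest. The "number of truncated singular values" in the abstract corresponds to $r$, and the "second step" is the subproblem in the last line.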
CV · Jan 7, 2019
Truncated nuclear norm regularization for low-rank tensor completion
Shengke Xue, Wenyuan Qiu, Fan Liu et al.
Recently, low-rank tensor completion has become increasingly attractive for recovering incomplete visual data. Treating a color image or video as a three-dimensional (3D) tensor, existing studies have put forward several definitions of the tensor nuclear norm. However, these definitions are limited: they may not accurately approximate the true rank of a tensor, and they do not explicitly use the low-rank property in optimization. The recently proposed truncated nuclear norm (TNN) has been shown to replace the traditional nuclear norm as an improved approximation to the rank of a matrix. In this paper, we propose a new method called the tensor truncated nuclear norm (T-TNN), which introduces a new definition of the tensor nuclear norm by generalizing the truncated nuclear norm from the matrix case to the tensor case. By exploiting the low-rank property of TNN, our approach improves the efficacy of tensor completion. Our algorithm adopts the previously proposed tensor singular value decomposition, the alternating direction method of multipliers, and the accelerated proximal gradient line search method. Substantial experiments on real-world videos and images show that our approach performs better than previous methods.
CV · Dec 3, 2017
Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization
Shengke Xue, Wenyuan Qiu, Fan Liu et al.
Low-rank tensor completion has attracted growing attention for recovering incomplete visual data in which some elements are missing. Treating a color image or video as a three-dimensional (3D) tensor, previous studies have suggested several definitions of the tensor nuclear norm. However, these definitions have limitations and may not properly approximate the true rank of a tensor. Moreover, they do not explicitly use the low-rank property in optimization. The recently proposed truncated nuclear norm (TNN) has been shown to replace the traditional nuclear norm as a better estimate of the rank of a matrix. This paper therefore presents a new method called the tensor truncated nuclear norm (T-TNN), which proposes a new definition of the tensor nuclear norm and extends the truncated nuclear norm from the matrix case to the tensor case. By exploiting the low-rank property of TNN, our approach improves the efficacy of tensor completion. We use the previously proposed tensor singular value decomposition and the alternating direction method of multipliers in optimization. Extensive experiments on real-world videos and images demonstrate that our approach outperforms existing methods.