Dong Xu

h-index: 15

3 papers

629 citations

Novelty: 53%

AI Score: 32

Ranked #126,112 of 194,257 authors (top 65%); #41,797 in CV (top 71%)

3 Papers

12.8 · CV · Nov 29, 2024

MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks

Yiming Wu, Wei Ji, Kecheng Zheng et al.

Recently, human motion analysis has improved greatly thanks to powerful generative models such as denoising diffusion models and large language models. However, existing approaches mainly focus on generating motions from textual descriptions and overlook the reciprocal task. In this paper, we present MoTe, a unified multi-modal model that handles diverse tasks by learning the marginal, conditional, and joint distributions of motion and text simultaneously. MoTe handles paired text-motion generation, motion captioning, and text-driven motion generation simply by modifying the input context. Specifically, MoTe is composed of three components: a Motion Encoder-Decoder (MED), a Text Encoder-Decoder (TED), and a Motion-Text Diffusion Model (MTDM). MED and TED are trained to extract latent embeddings and then to reconstruct the motion sequences and textual descriptions, respectively, from those embeddings. MTDM, in turn, performs an iterative denoising process on the input context to handle the diverse tasks. Experimental results on benchmark datasets demonstrate superior performance on text-to-motion generation and competitive performance on motion captioning.
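The abstract does not specify MTDM's network, latent layout, noise schedule, or sampler. As a generic point of reference only, the standard DDPM forward-noising and iterative reverse-denoising loop over a latent vector can be sketched as below. `predict_noise` is a hypothetical stand-in for a trained denoiser, and the linear schedule (1e-4 to 0.02) is the common DDPM default, assumed here rather than taken from MoTe.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_0..beta_{T-1} (common DDPM default, assumed)."""
    if T == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, eps):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


def estimate_x0(xt, t, abar, eps_hat):
    """Invert q_sample given a noise estimate: x0 = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t)."""
    a = abar[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]


def p_sample_step(xt, t, betas, abar, predict_noise, rng):
    """One ancestral reverse step x_t -> x_{t-1}, using sigma_t^2 = beta_t."""
    beta, a = betas[t], abar[t]
    eps_hat = predict_noise(xt, t)  # hypothetical trained denoiser
    mean = [(x - beta / math.sqrt(1.0 - a) * e) / math.sqrt(1.0 - beta)
            for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added at the final step
    return [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]


def sample(dim, T, predict_noise, seed=0):
    """Start from pure Gaussian noise and denoise for T steps."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(T)
    abar = alpha_bars(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        x = p_sample_step(x, t, betas, abar, predict_noise, rng)
    return x
```

For example, `sample(4, 50, lambda x, t: [0.0] * len(x))` runs the loop end to end with a placeholder denoiser; a real model would replace the lambda with a network that predicts the added noise.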

3.8 · CV · Jul 19, 2017

Image Projective Invariants

Erbo Li, Hanlin Mo, Dong Xu et al.

In this paper, we propose relative projective differential invariants (RPDIs), which are invariant to general projective transformations. Using RPDIs and the structural framework of integral invariants, projective weighted moment invariants (PIs) can be constructed easily. We first prove that a class of projective invariants exists in the form of weighted integrals of images, with relative differential invariants as the weight functions. Some simple instances of PIs are then given. To ensure the stability and discriminability of PIs, we discuss how to compute partial derivatives of discrete images more accurately. Since the number of pixels in a discrete image may differ before and after a geometric transformation, we also design a method to normalize the number of pixels. These measures improve the performance of PIs. Finally, we carry out experiments on synthetic and real image datasets, comparing PIs against commonly used moment invariants. The results indicate that PIs outperform the other moment invariants in image retrieval and classification. With PIs, one can compare the similarity of images under a projective transformation without knowing the transformation's parameters, which provides a useful tool for shape analysis in image processing, computer vision, and pattern recognition.
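The paper's RPDIs and PIs are not reproduced here. As a hedged illustration of what invariance to a projective transformation means, the classical cross-ratio of four collinear points is the simplest projective invariant. The sketch maps four points through an arbitrarily chosen non-singular homography `H` (illustrative, not from the paper) and checks that the cross-ratio is unchanged; exact `Fraction` arithmetic avoids floating-point noise.

```python
from fractions import Fraction


def apply_homography(H, p):
    """Map a 2-D point through a 3x3 homography H using homogeneous coordinates."""
    x, y = Fraction(p[0]), Fraction(p[1])
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point is mapped to infinity")
    return (u / w, v / w)


def line_positions(points):
    """Signed positions of collinear points along the line through the first two."""
    pts = [(Fraction(x), Fraction(y)) for x, y in points]
    (ax, ay), (bx, by) = pts[0], pts[1]
    dx, dy = bx - ax, by - ay
    return [(px - ax) * dx + (py - ay) * dy for px, py in pts]


def cross_ratio(a, b, c, d):
    """Cross-ratio (A, B; C, D) = (AC * BD) / (BC * AD) using signed lengths."""
    sa, sb, sc, sd = line_positions([a, b, c, d])
    return ((sc - sa) * (sd - sb)) / ((sc - sb) * (sd - sa))


pts = [(0, 0), (1, 1), (3, 3), (4, 4)]
H = [[2, 1, 0], [0, 1, 3], [1, 0, 2]]  # hypothetical homography, det = 7
mapped = [apply_homography(H, p) for p in pts]
print(cross_ratio(*pts), cross_ratio(*mapped))  # prints: 9/8 9/8
```

By contrast, the simple length ratio AB/AC, which an affine map would preserve, changes under this homography (1/3 before, 5/9 after). Projective invariants such as PIs aim to retain the first kind of stability for whole images rather than for point sets.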

25.3 · CV · Jul 28, 2016

A Siamese Long Short-Term Memory Architecture for Human Re-Identification

Rahul Rama Varior, Bing Shuai, Jiwen Lu et al.

Matching pedestrians across multiple camera views, known as human re-identification, is a challenging problem in visual surveillance. In existing works that concentrate on feature extraction, representations are formed locally, independently of other regions. We present a novel siamese Long Short-Term Memory (LSTM) architecture that processes image regions sequentially and enhances the discriminative capability of local feature representations by leveraging contextual information. The feedback connections and internal gating mechanism of the LSTM cells enable our model to memorize spatial dependencies and selectively propagate relevant contextual information through the network. We demonstrate improved performance over a baseline with no LSTM units and promising results compared to state-of-the-art methods on the Market-1501, CUHK03, and VIPeR datasets. Visualizing the internal mechanism of the LSTM cells shows that our method learns meaningful patterns.
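A minimal sketch of the siamese idea described above, assuming each image has already been split into a sequence of region feature vectors (e.g. horizontal stripes; the feature extractor is not shown). Both branches run the same LSTM cell, i.e. share weights, and are compared by Euclidean distance. Concatenating all hidden states and training with a contrastive loss are assumptions made for illustration; the abstract does not state the paper's exact design.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Minimal LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hid = hid_dim
        # One (W, U, b) triple per gate: W acts on the input, U on the previous hidden state.
        self.params = {k: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim), [0.0] * hid_dim)
                       for k in "ifog"}

    def step(self, x, h, c):
        def pre(k):
            W, U, b = self.params[k]
            return [a + u + bias for a, u, bias in zip(matvec(W, x), matvec(U, h), b)]

        i = [sigmoid(z) for z in pre("i")]
        f = [sigmoid(z) for z in pre("f")]
        o = [sigmoid(z) for z in pre("o")]
        cand = [math.tanh(z) for z in pre("g")]
        # Gating: keep part of the old memory, write part of the new candidate.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def encode(self, regions):
        """Process region features in order and concatenate the hidden states."""
        h, c = [0.0] * self.hid, [0.0] * self.hid
        out = []
        for x in regions:
            h, c = self.step(x, h, c)
            out.extend(h)
        return out


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def siamese_distance(cell, regions_a, regions_b):
    # Both branches use the same `cell`, so the weights are shared.
    return euclidean(cell.encode(regions_a), cell.encode(regions_b))


def contrastive_loss(d, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs beyond `margin` (assumed loss)."""
    return 0.5 * d * d if same else 0.5 * max(0.0, margin - d) ** 2
```

Because the same cell encodes both images, a pair of identical inputs always has distance 0, and the learned gates decide how much context from earlier regions flows into later ones.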