Yan Tang

LG
h-index3
10papers
28citations
Novelty46%
AI Score48

10 Papers

ROMay 22
Sparse Compositional Flow Matching by geometric assembly from motion primitives

Yan Tang, Yuanbo Tang, Tingyu Cao et al.

Embodied trajectories, such as the executable motion sequences of robotic manipulators, underwater vehicles, and mobile robots, are a fundamental output of embodied AI. Modern generative models often treat them as a dense, monolithic signal generated point by point, fitting an intricate high-dimensional posterior while leaving the data's latent structure unmodeled, the same sample inefficiency long identified by the structured generative model literature. We argue that a compositional latent structure is a natural choice: many embodied tasks share recurring motion fragments that can be made explicit as a finite repertoire of reusable motion primitives, and compositional units naturally align with subtask boundaries to support task decomposition. Existing compositional generators, however, compose in a latent space and rely on post-hoc decoding to relate sampled units to actual trajectory segments. We instead compose directly in the physical trajectory space through a flow-matching framework with two coupled designs. Motion-Primitive Dictionary Learning equips each atom with a learnable length mask and binary starting indicators so the atom itself is the primitive, reused verbatim wherever it is placed. Structural Sparse Flow Matching with Geometric Constraints then generates a binary placement matrix using duration-aware tokenization and a differentiable geometric loss that enforces spatial continuity and temporal contiguity where adjacent primitives meet. On Open X-Embodiment and 3DMoTraj, the framework attains state-of-the-art accuracy and reduces the FDE/ADE ratio from 1.8 to 1.07, improving ADE by 19.2% and FDE by 21.0% over the strongest baseline.

CYSep 17, 2025
AI-driven formative assessment and adaptive learning in data-science education: Evaluating an LLM-powered virtual teaching assistant

Fadjimata I Anaroua, Qing Li, Yan Tang et al.

This paper presents VITA (Virtual Teaching Assistants), an adaptive distributed learning (ADL) platform that embeds a large language model (LLM)-powered chatbot (BotCaptain) to provide dialogic support, interoperable analytics, and integrity-aware assessment for workforce preparation in data science. The platform couples context-aware conversational tutoring with formative-assessment patterns designed to promote reflective reasoning. The paper describes an end-to-end data pipeline that transforms chat logs into Experience API (xAPI) statements, instructor dashboards that surface outliers for just-in-time intervention, and an adaptive pathway engine that routes learners among progression, reinforcement, and remediation content. The paper also benchmarks VITA conceptually against emerging tutoring architectures, including retrieval-augmented generation (RAG)--based assistants and Learning Tools Interoperability (LTI)--integrated hubs, highlighting trade-offs among content grounding, interoperability, and deployment complexity. Contributions include a reusable architecture for interoperable conversational analytics, a catalog of patterns for integrity-preserving formative assessment, and a practical blueprint for integrating adaptive pathways into data-science courses. The paper concludes with implementation lessons and a roadmap (RAG integration, hallucination mitigation, and LTI~1.3 / OpenID Connect) to guide multi-course evaluations and broader adoption. In light of growing demand and scalability constraints in traditional instruction, the approach illustrates how conversational AI can support engagement, timely feedback, and personalized learning at scale. Future work will refine the platform's adaptive intelligence and examine applicability across varied educational settings.

CVSep 1, 2025
Prior-Guided Residual Diffusion: Calibrated and Efficient Medical Image Segmentation

Fuyou Mao, Beining Wu, Yanfeng Jiang et al.

Ambiguity in medical image segmentation calls for models that capture full conditional distributions rather than a single point estimate. We present Prior-Guided Residual Diffusion (PGRD), a diffusion-based framework that learns voxel-wise distributions while maintaining strong calibration and practical sampling efficiency. PGRD embeds discrete labels as one-hot targets in a continuous space to align segmentation with diffusion modeling. A coarse prior predictor provides step-wise guidance; the diffusion network then learns the residual to the prior, accelerating convergence and improving calibration. A deep diffusion supervision scheme further stabilizes training by supervising intermediate time steps. Evaluated on representative MRI and CT datasets, PGRD achieves higher Dice scores and lower NLL/ECE values than Bayesian, ensemble, Probabilistic U-Net, and vanilla diffusion baselines, while requiring fewer sampling steps to reach strong performance.

LGApr 16, 2025
Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models

Yuanbo Tang, Yan Tang, Naifan Zhang et al.

Mixture-of-Experts based large language models (MoE LLMs) have shown significant promise in multitask adaptability by dynamically routing inputs to specialized experts. Despite their success, the collaborative mechanisms among experts are still not well understood, limiting both the interpretability and optimization of these models. In this paper, we focus on two critical issues: (1) identifying expert collaboration patterns, and (2) optimizing MoE LLMs through expert pruning. To address the first issue, we propose a hierarchical sparse dictionary learning (HSDL) method that uncovers the collaboration patterns among experts. For the second issue, we introduce the Contribution-Aware Expert Pruning (CAEP) algorithm, which effectively prunes low-contribution experts. Our extensive experiments demonstrate that expert collaboration patterns are closely linked to specific input types and exhibit semantic significance across various tasks. Moreover, pruning experiments show that our approach improves overall performance by 2.5\% on average, outperforming existing methods. These findings offer valuable insights into enhancing the efficiency and interpretability of MoE LLMs, offering a clearer understanding of expert interactions and improving model optimization.

LGNov 20, 2025
Pathlet Variational Auto-Encoder for Robust Trajectory Generation

Yuanbo Tang, Yan Tang, Zixuan Zhang et al.

Trajectory generation has recently drawn growing interest in privacy-preserving urban mobility studies and location-based service applications. Although many studies have used deep learning or generative AI methods to model trajectories and have achieved promising results, the robustness and interpretability of such models are largely unexplored. This limits the application of trajectory generation algorithms on noisy real-world data and their trustworthiness in downstream tasks. To address this issue, we exploit the regular structure in urban trajectories and propose a deep generative model based on the pathlet representation, which encode trajectories with binary vectors associated with a learned dictionary of trajectory segments. Specifically, we introduce a probabilistic graphical model to describe the trajectory generation process, which includes a Variational Autoencoder (VAE) component and a linear decoder component. During training, the model can simultaneously learn the latent embedding of pathlet representations and the pathlet dictionary that captures mobility patterns in the trajectory dataset. The conditional version of our model can also be used to generate customized trajectories based on temporal and spatial constraints. Our model can effectively learn data distribution even using noisy data, achieving relative improvements of $35.4\%$ and $26.3\%$ over strong baselines on two real-world trajectory datasets. Moreover, the generated trajectories can be conveniently utilized for multiple downstream tasks, including trajectory prediction and data denoising. Lastly, the framework design offers a significant efficiency advantage, saving $64.8\%$ of the time and $56.5\%$ of GPU memory compared to previous approaches.

LGMar 27, 2025
Geographical hotspot prediction based on point cloud-voxel-community partition clustering

Yan Tang

Existing solutions to the hotspot prediction problem in the field of geographic information remain at a relatively preliminary stage. This study presents a novel approach for detecting and predicting geographical hotspots, utilizing point cloud-voxel-community partition clustering. By analyzing high-dimensional data, we represent spatial information through point clouds, which are then subdivided into multiple voxels to enhance analytical efficiency. Our method identifies spatial voxels with similar characteristics through community partitioning, thereby revealing underlying patterns in hotspot distributions. Experimental results indicate that when applied to a dataset of archaeological sites in Turkey, our approach achieves a 19.31% increase in processing speed, with an accuracy loss of merely 6%, outperforming traditional clustering methods. This method not only provides a fresh perspective for hotspot prediction but also serves as an effective tool for high-dimensional data analysis.

CLMay 21, 2019
A Seq-to-Seq Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation

Wei Jiang, Yan Tang

The prevalent approaches of Chinese word segmentation task almost rely on the Bi-LSTM neural network. However, the methods based the Bi-LSTM have some inherent drawbacks: hard to parallel computing, little efficient in applying the Dropout method to inhibit the Overfitting and little efficient in capturing the character information at the more distant site of a long sentence for the word segmentation task. In this work, we propose a sequence-to-sequence transformer model for Chinese word segmentation, which is premised a type of convolutional neural network named temporal convolutional network. The model uses the temporal convolutional network to construct an encoder, and uses one layer of fully-connected neural network to build a decoder, and applies the Dropout method to inhibit the Overfitting, and captures the character information at the distant site of a sentence by adding the layers of the encoder, and binds Conditional Random Fields model to train parameters, and uses the Viterbi algorithm to infer the final result of the Chinese word segmentation. The experiments on traditional Chinese corpora and simplified Chinese corpora show that the performance of Chinese word segmentation of the model is equivalent to the performance of the methods based the Bi-LSTM, and the model has a tremendous growth in parallel computing than the models based the Bi-LSTM.

MMAug 29, 2018
Efficient Region of Visual Interests Search for Geo-multimedia Data

Chengyuan Zhang, Yunwu Lin, Lei Zhu et al.

With the proliferation of online social networking services and mobile smart devices equipped with mobile communications module and position sensor module, massive amount of multimedia data has been collected, stored and shared. This trend has put forward higher request on massive multimedia data retrieval. In this paper, we investigate a novel spatial query named region of visual interests query (RoVIQ), which aims to search users containing geographical information and visual words. Three baseline methods are presented to introduce how to exploit existing techniques to address this problem. Then we propose the definition of this query and related notions at the first time. To improve the performance of query, we propose a novel spatial indexing structure called quadtree based inverted visual index which is a combination of quadtree, inverted index and visual words. Based on it, we design a efficient search algorithm named region of visual interests search to support RoVIQ. Experimental evaluations on real geo-image datasets demonstrate that our solution outperforms state-of-the-art method.

IRDec 22, 2017
DancingLines: An Analytical Scheme to Depict Cross-Platform Event Popularity

Tianxiang Gao, Weiming Bao, Jinning Li et al.

Nowadays, events usually burst and are propagated online through multiple modern media like social networks and search engines. There exists various research discussing the event dissemination trends on individual medium, while few studies focus on event popularity analysis from a cross-platform perspective. Challenges come from the vast diversity of events and media, limited access to aligned datasets across different media and a great deal of noise in the datasets. In this paper, we design DancingLines, an innovative scheme that captures and quantitatively analyzes event popularity between pairwise text media. It contains two models: TF-SW, a semantic-aware popularity quantification model, based on an integrated weight coefficient leveraging Word2Vec and TextRank; and wDTW-CD, a pairwise event popularity time series alignment model matching different event phases adapted from Dynamic Time Warping. We also propose three metrics to interpret event popularity trends between pairwise social platforms. Experimental results on eighteen real-world event datasets from an influential social network and a popular search engine validate the effectiveness and applicability of our scheme. DancingLines is demonstrated to possess broad application potentials for discovering the knowledge of various aspects related to events and different media.

SDAug 23, 2017
Object-Based Audio Rendering

Philip Jackson, Filippo Fazi, Frank Melchior et al.

Apparatus and methods are disclosed for performing object-based audio rendering on a plurality of audio objects which define a sound scene, each audio object comprising at least one audio signal and associated metadata. The apparatus comprises: a plurality of renderers each capable of rendering one or more of the audio objects to output rendered audio data; and object adapting means for adapting one or more of the plurality of audio objects for a current reproduction scenario, the object adapting means being configured to send the adapted one or more audio objects to one or more of the plurality of renderers.