Yutao Tang

CV
h-index: 15
Papers: 13
Citations: 84
Novelty: 57%
AI Score: 46

13 Papers

OC · Aug 21, 2018
Optimal Output Consensus of High-Order Multi-Agent Systems with Embedded Technique

Yutao Tang, Zhenhua Deng, Yiguang Hong

In this paper, we study an optimal output consensus problem for a multi-agent network with agents in the form of multi-input multi-output minimum-phase dynamics. Optimal output consensus can be taken as an extended version of the existing output consensus problem for higher-order agents with an optimization requirement, where the output variables of agents are driven to achieve a consensus on the optimal solution of a global cost function. To solve this problem, we first construct an optimal signal generator, and then propose an embedded control scheme by embedding the generator in the feedback loop. We give two kinds of algorithms based on different available information along with both state feedback and output feedback, and prove that these algorithms with the embedded technique can guarantee the solvability of the problem for high-order multi-agent systems under standard assumptions.

CV · Nov 7, 2022
Facial Tic Detection in Untrimmed Videos of Tourette Syndrome Patients

Yutao Tang, Benjamín Béjar, Joey K.-Y. Essoe et al.

Tourette Syndrome (TS) is a behavior disorder with onset in childhood, characterized by the expression of involuntary movements and sounds commonly referred to as tics. Behavioral therapy is the first-line treatment for patients with TS, and it helps patients raise awareness about tic occurrence as well as develop tic inhibition strategies. However, the limited availability of therapists and the difficulties of in-home follow-up work limit its effectiveness. An automatic tic detection system that is easy to deploy could alleviate the difficulties of home therapy by providing feedback to the patients while exercising tic awareness. In this work, we propose a novel architecture (T-Net) for automatic tic detection and classification from untrimmed videos. T-Net combines temporal detection and segmentation and operates on features that are interpretable to a clinician. We compare T-Net to several state-of-the-art systems working on deep features extracted from the raw videos, and T-Net achieves comparable performance in terms of average precision while relying on interpretable features needed in clinical practice.

OC · Feb 3, 2019
Distributed Optimization for a Class of High-order Nonlinear Multi-agent Systems with Unknown Dynamics

Yutao Tang

In this paper, we study a distributed optimization problem for a class of high-order multi-agent systems with unknown dynamics. In comparison with existing results for integrators or linear agents, we need to overcome the difficulties brought by the unknown nonlinearities and also the optimization requirement. For this purpose, we employ an embedded control based design and first convert this problem into an output stabilization problem. Then, two kinds of adaptive controllers are given for these agents to drive their outputs to the global optimal solution under some mild conditions. Finally, we show that the estimated parameter vector converges to the true parameter vector under a well-known persistence of excitation condition. The efficacy of these algorithms is verified by a simulation example.

OC · Feb 3, 2019
Distributed Optimal Steady-State Regulation for High-Order Multi-Agent Systems with External Disturbances

Yutao Tang

In this paper, a distributed optimal steady-state regulation problem is formulated and investigated for heterogeneous linear multi-agent systems subject to external disturbances. We aim to steer this high-order multi-agent network to a prescribed steady-state determined as the optimal solution of a resource allocation problem in a distributed way. To solve this problem, we employ an embedded control design and convert the formulated problem to two simpler subproblems. Then, both state-feedback and output feedback controls are presented under mild assumptions to solve this problem with disturbance rejection. Moreover, we extend these results to the case with only real-time gradient information by high-gain control techniques. Finally, numerical simulations verify their effectiveness.

OC · Jan 13, 2016
Coordination of Multi-Agent Systems under Switching Topologies via Disturbance Observer Based Approach

Yutao Tang

In this paper, a leader-following coordination problem of heterogeneous multi-agent systems is considered under switching topologies where each agent is subject to some local (unbounded) disturbances. While these unknown disturbances may disrupt the performance of agents, a disturbance observer based approach is employed to estimate and reject them. Varying communication topologies are also taken into consideration, and the difficulties they introduce are overcome by using common Lyapunov function techniques. According to the available information in different cases, two disturbance observer based protocols are proposed to solve this problem. Their effectiveness is verified by simulations.

SY · Feb 2, 2017
Output Average Consensus Over Heterogeneous Multi-Agent Systems via Two-Level Approach

Yutao Tang

In this paper, a novel two-level framework was proposed and applied to solve the output average consensus problem over heterogeneous multi-agent systems. This approach is mainly based on the recent technique of system abstraction. For given multi-agent systems, we first constructed their abstractions as the upper level and solved their average consensus problem by leveraging well-known results for single integrators. Then the control protocols for physical agents in the lower level were synthesized in a hierarchical way by embedding the designed law for abstractions into an interface between two levels. In this way, the complexity coming from heterogeneous dynamics of agents is totally decoupled from that of the coordination task and the communication topologies. An example was given to show its effectiveness.
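As a rough illustration of the "well-known results for single integrators" that the upper level relies on, the sketch below runs a discrete-time average consensus iteration on an undirected graph. It is not the paper's two-level design: the function name `average_consensus`, the ring topology, and the step size are assumptions chosen for the example.

```python
def average_consensus(x0, edges, step=0.2, iters=200):
    """Discrete-time consensus for single integrators on an undirected graph.

    Each agent updates x_i <- x_i + step * sum_j (x_j - x_i) over its neighbors.
    Assumption: step is smaller than 1 / (maximum node degree), which is a
    sufficient condition for convergence on a connected graph.
    """
    x = list(x0)
    for _ in range(iters):
        delta = [0.0] * len(x)
        for i, j in edges:
            diff = x[j] - x[i]
            delta[i] += diff  # agent i moves toward agent j
            delta[j] -= diff  # agent j moves toward agent i (keeps the sum constant)
        x = [xi + step * di for xi, di in zip(x, delta)]
    return x


if __name__ == "__main__":
    # Hypothetical 4-agent ring; the states should approach the initial average.
    initial = [1.0, 5.0, 3.0, 7.0]
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(average_consensus(initial, ring))
```

Because every edge contributes equal and opposite increments, the sum of the states is preserved at each step, so the common limit is the average of the initial values.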

CV · Nov 10, 2023
Semantic-aware Video Representation for Few-shot Action Recognition

Yutao Tang, Benjamin Bejar, Rene Vidal

Recent work on action recognition leverages 3D features and textual information to achieve state-of-the-art performance. However, most of the current few-shot action recognition methods still rely on 2D frame-level representations, often require additional components to model temporal relations, and employ complex distance functions to achieve accurate alignment of these representations. In addition, existing methods struggle to effectively integrate textual semantics, some resorting to concatenation or addition of textual and visual features, and some using text merely as an additional supervision without truly achieving feature fusion and information transfer from different modalities. In this work, we propose a simple yet effective Semantic-Aware Few-Shot Action Recognition (SAFSAR) model to address these issues. We show that directly leveraging a 3D feature extractor combined with an effective feature-fusion scheme, and a simple cosine similarity for classification can yield better performance without the need of extra components for temporal modeling or complex distance functions. We introduce an innovative scheme to encode the textual semantics into the video representation which adaptively fuses features from text and video, and encourages the visual encoder to extract more semantically consistent features. In this scheme, SAFSAR achieves alignment and fusion in a compact way. Experiments on five challenging few-shot action recognition benchmarks under various settings demonstrate that the proposed SAFSAR model significantly improves the state-of-the-art performance.
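The "simple cosine similarity for classification" mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the SAFSAR implementation: it assumes video features are already extracted as plain vectors, and it assumes each class is represented by the mean of its support features (a common few-shot convention that the abstract does not confirm). The helper names `cosine`, `prototypes`, and `classify` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototypes(support):
    """Average the support features of each class (assumed prototype rule)."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in support.items()
    }


def classify(query, protos):
    """Return the label whose prototype is most cosine-similar to the query."""
    return max(protos, key=lambda label: cosine(query, protos[label]))


if __name__ == "__main__":
    # Toy 2-way, 2-shot episode with made-up 2-D features.
    support = {
        "jump": [[1.0, 0.1], [0.9, 0.0]],
        "run": [[0.0, 1.0], [0.2, 0.8]],
    }
    protos = prototypes(support)
    print(classify([0.8, 0.1], protos))
```

In a real pipeline the vectors would come from the fused text-video representation; the classification step itself stays this simple.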

CV · Nov 26, 2025
Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding

Yutao Tang, Cheng Zhao, Gaurav Mittal et al.

Recent advances in 3D vision-language models (VLMs) highlight a strong potential for 3D scene understanding and reasoning. However, effectively tokenizing 3D scenes into holistic scene tokens, and leveraging these tokens across diverse 3D understanding tasks, remain highly challenging. We present NDTokenizer3D, a generalist 3D VLM that performs a wide range of 3D scene understanding tasks while naturally supporting human interactions, thereby bridging language-level reasoning with 3D spatial understanding. The core of our approach is a novel three-stage scene tokenization pipeline built upon a Multi-Scale Normal Distributions Transform (NDT) representation, paired with a Multi-Scale NDT Decoder (MSDec). Specifically, NDTokenizer3D first constructs a multi-scale NDT representation from raw high-resolution point clouds, preserving both global context and fine-grained geometric details. Next, the MSDec progressively fuses cross-scale NDT features, producing holistic scene tokens consumable by LLM endpoints. Beyond tokenization, MSDec is repurposed as a general interface for human-interactive prompting (points, boxes, masks) and segmentation-mask decoding, unifying diverse 3D scene understanding tasks within a single architecture. With this compact and unified design, NDTokenizer3D offers a fine-grained, general-purpose 3D VLM, achieving remarkable improvements in 3D Referring Segmentation, 3D Visual Question Answering, and 3D Dense Captioning.

CV · Mar 7, 2024
BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling

Cheng Peng, Yutao Tang, Yifan Zhou et al.

Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various types of image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacities such that a 3D-consistent and high quality scene can be reconstructed despite image-wise blur. Specifically, we model blur by estimating per-pixel convolution kernels from a Blur Proposal Network (BPN). BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. BPN also proposes a quality-assessing mask, which indicates regions where blur occurs. Finally, we introduce a coarse-to-fine kernel optimization scheme; this optimization scheme is fast and avoids sub-optimal solutions due to a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion on blurry images. We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometry, while significantly improving upon existing approaches.

CV · Nov 15, 2024
SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction

Yutao Tang, Yuxiang Guo, Deming Li et al.

Recent efforts in Gaussian-Splat-based Novel View Synthesis can achieve photorealistic rendering; however, such capability is limited in sparse-view scenarios due to sparse initialization and over-fitting floaters. Recent progress in depth estimation and alignment can provide a dense point cloud from few views; however, the resulting pose accuracy is suboptimal. In this work, we present SPARS3R, which combines the advantages of accurate pose estimation from Structure-from-Motion and a dense point cloud from depth estimation. To this end, SPARS3R first performs a Global Fusion Alignment process that maps a prior dense point cloud to a sparse point cloud from Structure-from-Motion based on triangulated correspondences. RANSAC is applied during this process to distinguish inliers and outliers. SPARS3R then performs a second, Semantic Outlier Alignment step, which extracts semantically coherent regions around the outliers and performs local alignment in these regions. Along with several improvements in the evaluation process, we demonstrate that SPARS3R can achieve photorealistic rendering with sparse images and significantly outperforms existing approaches.

CV · Sep 19, 2025
MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild

Deming Li, Kaiwen Jiang, Yutao Tang et al.

In-the-wild photo collections often contain limited volumes of imagery and exhibit multiple appearances, e.g., taken at different times of day or seasons, posing significant challenges to scene reconstruction and novel view synthesis. Although recent adaptations of Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) have improved in these areas, they tend to oversmooth and are prone to overfitting. In this paper, we present MS-GS, a novel framework designed with Multi-appearance capabilities in Sparse-view scenarios using 3DGS. To address the lack of support due to sparse initializations, our approach is built on the geometric priors elicited from monocular depth estimations. The key lies in extracting and utilizing local semantic regions with a Structure-from-Motion (SfM) points anchored algorithm for reliable alignment and geometry cues. Then, to introduce multi-view constraints, we propose a series of geometry-guided supervision steps at virtual views in pixel and feature levels to encourage 3D consistency and reduce overfitting. We also introduce a dataset and an in-the-wild experiment setting to set up more realistic benchmarks. We demonstrate that MS-GS achieves photorealistic renderings under various challenging sparse-view and multi-appearance conditions, and outperforms existing approaches significantly across different datasets.

SY · Aug 14, 2017
Distributed Coordination for a Class of Nonlinear Multi-agent Systems with Regulation Constraints

Yutao Tang, Peng Yi

In this paper, a multi-agent coordination problem with steady-state regulation constraints is investigated for a class of nonlinear systems. Unlike existing leader-following coordination formulations, the reference signal is not given by a dynamic autonomous leader but determined as the optimal solution of a distributed optimization problem. Furthermore, we consider a global constraint having noisy data observations for the optimization problem, which implies that the reference signal is not trivially available with existing optimization algorithms. To handle those challenges, we present a passivity-based analysis and design approach that uses only each agent's local objective function, local data observations, and information exchanged with its neighbors. The proposed distributed algorithms are shown to achieve the optimal steady-state regulation by rejecting the unknown observation disturbances for passive nonlinear agents, which are pervasive in various practical problems. Applications and simulation examples are then given to verify the effectiveness of our design.

OC · Oct 26, 2015
Distributed Output Regulation for a Class of Nonlinear Multi-Agent Systems with Unknown-Input Leaders

Yutao Tang, Yiguang Hong, Xinghu Wang

In this paper, a distributed output regulation problem is formulated for a class of uncertain nonlinear multi-agent systems subject to local disturbances. The formulation is given to study a leader-following problem when the leader contains unknown inputs and its dynamics is different from those of the followers. Based on the conventional output regulation assumptions and graph theory, distributed feedback controllers are constructed to make the agents globally or semi-globally follow the uncertain leader even when the bound of the leader's inputs is unknown to the followers.