Tiancheng Yuan

LG
h-index3
7papers
38citations
Novelty42%
AI Score39

7 Papers

LGFeb 2, 2023
Coordinating Distributed Example Orders for Provably Accelerated Training

A. Feder Cooper, Wentao Guo, Khiem Pham et al.

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale-up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.

LGJul 18, 2022
MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point

Tao Yu, Wentao Guo, Jianan Canal Li et al.

In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training. MCTensor is used in the same way as PyTorch Tensor: we implement multiple basic, matrix-level computation operators and NN modules for MCTensor with identical PyTorch interface. Our algorithms achieve high precision computation and also benefits from heavily-optimized PyTorch floating-point arithmetic. We evaluate MCTensor arithmetic against PyTorch native arithmetic for a series of tasks, where models using MCTensor in float16 would match or outperform the PyTorch model with float32 or float64 precision.

CLFeb 24
SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing

Yifei Xu, Guilherme Potje, Shivam Shandilya et al.

Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-training. Rubrics provide structured, interpretable supervision, but scaling rubric construction is difficult: expert rubrics are costly, prompted rubrics are often superficial or inconsistent, and fixed-pool discriminative rubrics can saturate and drift, enabling reward hacking. We present SibylSense, an inference-time learning approach that adapts a frozen rubric generator through a tunable memory bank of validated rubric items. Memory is updated via verifier-based item rewards measured by reference-candidate answer discriminative gaps from a handful of examples. SibylSense alternates memory tuning with a rubric-adversarial policy update that produces rubric-satisfying candidate answers, shrinking discriminative gaps and driving the rubric generator to capture new quality dimensions. Experiments on two open-ended tasks show that SibylSense yields more discriminative rubrics and improves downstream RL performance over static and non-adaptive baselines.

DBNov 3, 2025
Vortex: Hosting ML Inference and Knowledge Retrieval Services With Tight Latency and Throughput Requirements

Yuting Yang, Tiancheng Yuan, Jamal Hashim et al.

There is growing interest in deploying ML inference and knowledge retrieval as services that could support both interactive queries by end users and more demanding request flows that arise from AIs integrated into a end-user applications and deployed as agents. Our central premise is that these latter cases will bring service level latency objectives (SLOs). Existing ML serving platforms use batching to optimize for high throughput, exposing them to unpredictable tail latencies. Vortex enables an SLO-first approach. For identical tasks, Vortex's pipelines achieve significantly lower and more stable latencies than TorchServe and Ray Serve over a wide range of workloads, often enabling a given SLO target at more than twice the request rate. When RDMA is available, the Vortex advantage is even more significant.

AIDec 22, 2024
AI-Based Teat Shape and Skin Condition Prediction for Dairy Management

Yuexing Hao, Tiancheng Yuan, Yuting Yang et al.

Dairy owners spend significant effort to keep their animals healthy. There is good reason to hope that technologies such as computer vision and artificial intelligence (AI) could reduce these costs, yet obstacles arise when adapting advanced tools to farming environments. In this work, we adapt AI tools to dairy cow teat localization, teat shape, and teat skin condition classifications. We also curate a data collection and analysis methodology for a Machine Learning (ML) pipeline. The resulting teat shape prediction model achieves a mean Average Precision (mAP) of 0.783, and the teat skin condition model achieves a mean average precision of 0.828. Our work leverages existing ML vision models to facilitate the individualized identification of teat health and skin conditions, applying AI to the dairy management industry.

DCFeb 27, 2024
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows

Yuting Yang, Andrea Merlina, Weijia Song et al.

We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently, placing tasks where data dependencies will be satisfied, collocating tasks from the same job (when this will not overload the host or its GPU), and efficiently managing GPU memory. Comparison with other state of the art schedulers shows a significant reduction in completion times while requiring the same amount or even fewer resources. In one case, just half the servers were needed for processing the same workload.

LGOct 17, 2018
XJTLUIndoorLoc: A New Fingerprinting Database for Indoor Localization and Trajectory Estimation Based on Wi-Fi RSS and Geomagnetic Field

Zhenghang Zhong, Zhe Tang, Xiangxing Li et al.

In this paper, we present a new location fingerprinting database comprised of Wi-Fi received signal strength (RSS) and geomagnetic field intensity measured with multiple devices at a multi-floor building in Xi'an Jiatong-Liverpool University, Suzhou, China. We also provide preliminary results of localization and trajectory estimation based on convolutional neural network (CNN) and long short-term memory (LSTM) network with this database. For localization, we map RSS data for a reference point to an image-like, two-dimensional array and then apply CNN which is popular in image and video analysis and recognition. For trajectory estimation, we use a modified random way point model to efficiently generate continuous step traces imitating human walking and train a stacked two-layer LSTM network with the generated data to remember the changing pattern of geomagnetic field intensity against (x,y) coordinates. Experimental results demonstrate the usefulness of our new database and the feasibility of the CNN and LSTM-based localization and trajectory estimation with the database.