Rui-Xiao Zhang

MM
9papers
423citations
Novelty52%
AI Score26

9 Papers

MMMay 26, 2020
Self-play Reinforcement Learning for Video Transmission

Tianchi Huang, Rui-Xiao Zhang, Lifeng Sun

Video transmission services adopt adaptive algorithms to ensure users' demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that the given function fails to describe the requirement accurately. Thus, such proposed methods might eventually violate the original needs. To eliminate this concern, we propose \emph{Zwei}, a self-play reinforcement learning algorithm for video transmission tasks. Zwei aims to update the policy by straightforwardly utilizing the actual requirement. Technically, Zwei samples a number of trajectories from the same starting point and instantly estimates the win rate w.r.t the competition outcome. Here the competition result represents which trajectory is closer to the assigned requirement. Subsequently, Zwei optimizes the strategy by maximizing the win rate. To build Zwei, we develop simulation environments, design adequate neural network models, and invent training methods for dealing with different requirements on various video transmission scenarios. Trace-driven analysis over two representative tasks demonstrates that Zwei optimizes itself according to the assigned requirement faithfully, outperforming the state-of-the-art methods under all considered scenarios.

LGOct 24, 2019
Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning

Xin Yao, Tianchi Huang, Chenglei Wu et al.

Human beings are able to master a variety of knowledge and skills with ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed as \emph{Catastrophic Forgetting}, is one of the major roadblocks that prevent deep neural networks from achieving human-level artificial intelligence. Several research efforts, e.g. \emph{Lifelong} or \emph{Continual} learning algorithms, have been proposed to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, or require to store an excessive amount of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this paper, we focus on the incremental multi-task image classification scenario. Inspired by the learning process of human students, where they usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and high-level semantic features serve as soft targets and guide the training process in multiple stages, which provide sufficient supervised information of the old tasks and help to reduce forgetting. Due to the knowledge distillation and regularization phenomenons, the proposed method gains even better performance than finetuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracies on new tasks and performance preservation on old tasks.

LGOct 18, 2019
Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Xin Yao, Tianchi Huang, Rui-Xiao Zhang et al.

Federated learning (FL) aims to train machine learning models in the decentralized system consisting of an enormous amount of smart edge devices. Federated averaging (FedAvg), the fundamental algorithm in FL settings, proposes on-device training and model aggregation to avoid the potential heavy communication costs and privacy concerns brought by transmitting raw data. However, through theoretical analysis we argue that 1) the multiple steps of local updating will result in gradient biases and 2) there is an inconsistency between the expected target distribution and the optimization objectives following the training paradigm in FedAvg. To tackle these problems, we first propose an unbiased gradient aggregation algorithm with the keep-trace gradient descent and the gradient evaluation strategy. Then we introduce an additional controllable meta updating procedure with a small set of data samples, indicating the expected target distribution, to provide a clear and consistent optimization objective. Both the two improvements are model- and task-agnostic and can be applied individually or together. Experimental results demonstrate that the proposed methods are faster in convergence and achieve higher accuracy with different network architectures in various FL settings.

LGAug 16, 2019
Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs

Xin Yao, Tianchi Huang, Chenglei Wu et al.

Federated learning (FL) enables on-device training over distributed networks consisting of a massive amount of modern smart devices, such as smartphones and IoT (Internet of Things) devices. However, the leading optimization algorithm in such settings, i.e., federated averaging (FedAvg), suffers from heavy communication costs and the inevitable performance drop, especially when the local data is distributed in a non-IID way. To alleviate this problem, we propose two potential solutions by introducing additional mechanisms to the on-device training. The first (FedMMD) is adopting a two-stream model with the MMD (Maximum Mean Discrepancy) constraint instead of a single model in vanilla FedAvg to be trained on devices. Experiments show that the proposed method outperforms baselines, especially in non-IID FL settings, with a reduction of more than 20% in required communication rounds. The second is FL with feature fusion (FedFusion). By aggregating the features from both the local and global models, we achieve higher accuracy at fewer communication costs. Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up the process of convergence. Experiments in popular FL scenarios show that our FedFusion outperforms baselines in both accuracy and generalization ability while reducing the number of required communication rounds by more than 60%.

MMAug 6, 2019
Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning

Tianchi Huang, Chao Zhou, Rui-Xiao Zhang et al.

Learning-based Adaptive Bit Rate~(ABR) method, aiming to learn outstanding strategies without any presumptions, has become one of the research hotspots for adaptive streaming. However, it typically suffers from several issues, i.e., low sample efficiency and lack of awareness of the video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that enormously improves the learning-based methods by tackling the above issues. Comyco trains the policy via imitating expert trajectories given by the instant solver, which can not only avoid redundant exploration but also make better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video qualities rather than video bitrates. To achieve this, we construct Comyco's neural network architecture, video datasets and QoE metrics with video quality features. Using trace-driven and real-world experiments, we demonstrate significant improvements of Comyco's sample efficiency in comparison to prior work, with 1700x improvements in terms of the number of samples required and 16x improvements on training time required. Moreover, results illustrate that Comyco outperforms previously proposed methods, with the improvements on average QoE of 7.5% - 16.79%. Especially, Comyco also surpasses state-of-the-art approach Pensieve by 7.37% on average video quality under the same rebuffering time.

MMMay 16, 2019
Reactive Video Caching via long-short-term fusion approach

Rui-Xiao Zhang, Tianchi Huang, Chenglei Wu et al.

Video caching has been a basic network functionality in today's network architectures. Although the abundance of caching replacement algorithms has been proposed recently, these methods all suffer from a key limitation: due to their immature rules, inaccurate feature engineering or unresponsive model update, they cannot strike a balance between the long-term history and short-term sudden events. To address this concern, we propose LA-E2, a long-short-term fusion caching replacement approach, which is based on a learning-aided exploration-exploitation process. Specifically, by effectively combining the deep neural network (DNN) based prediction with the online exploitation-exploration process through a \emph{top-k} method, LA-E2 can both make use of the historical information and adapt to the constantly changing popularity responsively. Through the extensive experiments in two real-world datasets, we show that LA-E2 can achieve state-of-the-art performance and generalize well. Especially when the cache size is small, our approach can outperform the baselines by 17.5\%-68.7\% higher in total hit rate.

MMNov 15, 2018
Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming

Tianchi Huang, Xin Yao, Chenglei Wu et al.

Existing reinforcement learning~(RL)-based adaptive bitrate~(ABR) approaches outperform the previous fixed control rules based methods by improving the Quality of Experience~(QoE) score, as the QoE metric can hardly provide clear guidance for optimization, finally resulting in the unexpected strategies. In this paper, we propose \emph{Tiyuntsong}, a self-play reinforcement learning approach with generative adversarial network~(GAN)-based method for ABR video streaming. Tiyuntsong learns strategies automatically by training two agents who are competing against each other. Note that the competition results are determined by a set of rules rather than a numerical QoE score that allows clearer optimization objectives. Meanwhile, we propose GAN Enhancement Module to extract hidden features from the past status for preserving the information without the limitations of sequence lengths. Using testbed experiments, we show that the utilization of GAN significantly improves the Tiyuntsong's performance. By comparing the performance of ABRs, we observe that Tiyuntsong also betters existing ABR algorithms in the underlying metrics.

MMMay 7, 2018
QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement Learning

Tianchi Huang, Rui-Xiao Zhang, Chao Zhou et al.

Due to the fluctuation of throughput under various network conditions, how to choose a proper bitrate adaptively for real-time video streaming has become an upcoming and interesting issue. Recent work focuses on providing high video bitrates instead of video qualities. Nevertheless, we notice that there exists a trade-off between sending bitrate and video quality, which motivates us to focus on how to get a balance between them. In this paper, we propose QARC (video Quality Awareness Rate Control), a rate control algorithm that aims to have a higher perceptual video quality with possibly lower sending rate and transmission latency. Starting from scratch, QARC uses deep reinforcement learning(DRL) algorithm to train a neural network to select future bitrates based on previously observed network status and past video frames, and we design a neural network to predict future perceptual video quality as a vector for taking the place of the raw picture in the DRL's inputs. We evaluate QARC over a trace-driven emulation. As excepted, QARC betters existing approaches.

MMMay 2, 2018
Delay-Constrained Rate Control for Real-Time Video Streaming with Bounded Neural Network

Tianchi Huang, Rui-Xiao Zhang, Chao Zhou et al.

Rate control is widely adopted during video streaming to provide both high video qualities and low latency under various network conditions. However, despite that many work have been proposed, they fail to tackle one major problem: previous methods determine a future transmission rate as a single for value which will be used in an entire time-slot, while real-world network conditions, unlike lab setup, often suffer from rapid and stochastic changes, resulting in the failures of predictions. In this paper, we propose a delay-constrained rate control approach based on end-to-end deep learning. The proposed model predicts future bit rate not as a single value, but as possible bit rate ranges using target delay gradient, with which the transmission delay is guaranteed. We collect a large scale of real-world live streaming data to train our model, and as a result, it automatically learns the correlation between throughput and target delay gradient. We build a testbed to evaluate our approach. Compared with the state-of-the-art methods, our approach demonstrates a better performance in bandwidth utilization. In all considered scenarios, a range based rate control approach outperforms the one without range by 19% to 35% in average QoE improvement.