Lifeng Sun

MM
h-index: 17
31 papers
823 citations
Novelty: 51%
AI Score: 56

31 Papers

CV Feb 23
Decoupling Defense Strategies for Robust Image Watermarking

Jiahui Chen, Zehang Deng, Zeyu Zhang et al.

Deep learning-based image watermarking, while robust against conventional distortions, remains vulnerable to advanced adversarial and regeneration attacks. Conventional countermeasures, which jointly optimize the encoder and decoder via a noise layer, face two inevitable challenges: (1) a decrease in clean accuracy due to decoder adversarial training and (2) limited robustness due to simultaneous training on all three advanced attacks. To overcome these issues, we propose AdvMark, a novel two-stage fine-tuning framework that decouples the defense strategies. In stage 1, we address adversarial vulnerability via a tailored adversarial training paradigm that primarily fine-tunes the encoder while only conditionally updating the decoder. This approach learns to move the image into a non-attackable region, rather than modifying the decision boundary, thus preserving clean accuracy. In stage 2, we tackle distortion and regeneration attacks via direct image optimization. To preserve the adversarial robustness gained in stage 1, we formulate a principled, constrained image loss with theoretical guarantees, which balances the deviation from cover and previous encoded images. We also propose a quality-aware early stop to further guarantee the lower bound of visual quality. Extensive experiments demonstrate that AdvMark achieves the highest image quality and comprehensive robustness, i.e., up to 29%, 33% and 46% accuracy improvement for distortion, regeneration and adversarial attacks, respectively.

MM Aug 22, 2025 Code
Beyond Interpretability: Exploring the Comprehensibility of Adaptive Video Streaming through Large Language Models

Lianchen Jia, Chaoyang Li, Ziqi Yuan et al.

Over the past decade, adaptive video streaming technology has witnessed significant advancements, particularly driven by the rapid evolution of deep learning techniques. However, the black-box nature of deep learning algorithms presents challenges for developers in understanding decision-making processes and optimizing for specific application scenarios. Although existing research has enhanced algorithm interpretability through decision tree conversion, interpretability does not directly equate to developers' subjective comprehensibility. To address this challenge, we introduce ComTree, the first bitrate adaptation algorithm generation framework that considers comprehensibility. The framework initially generates the complete set of decision trees that meet performance requirements, then leverages large language models to evaluate these trees for developer comprehensibility, ultimately selecting solutions that best facilitate human understanding and enhancement. Experimental results demonstrate that ComTree significantly improves comprehensibility while maintaining competitive performance, showing potential for further advancement. The source code is available at https://github.com/thu-media/ComTree.

AI Oct 21, 2025 Code
Crucible: Quantifying the Potential of Control Algorithms through LLM Agents

Lianchen Jia, Chaoyang Li, Qian Houde et al.

Control algorithms in production environments typically require domain experts to tune their parameters and logic for specific scenarios. However, existing research predominantly focuses on algorithmic performance under ideal or default configurations, overlooking the critical aspect of Tuning Potential. To bridge this gap, we introduce Crucible, an agent that employs an LLM-driven, multi-level expert simulation to tune algorithms and defines a formalized metric to quantitatively evaluate their Tuning Potential. We demonstrate Crucible's effectiveness across a wide spectrum of case studies, from classic control tasks to complex computer systems, and validate its findings in a real-world deployment. Our experimental results reveal that Crucible systematically quantifies the tunable space across different algorithms. Furthermore, Crucible provides a new dimension for algorithm analysis and design, which ultimately leads to performance improvements. Our code is available at https://github.com/thu-media/Crucible.

LG Apr 8
SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport

Zheng Jiang, Nan He, Yiming Chen et al.

Federated Learning (FL) enables collaborative model training while preserving data privacy, but its practical deployment is hampered by system and statistical heterogeneity. While federated network pruning offers a path to mitigate these issues, existing methods face a critical dilemma: server-side pruning lacks personalization, whereas client-side pruning is computationally prohibitive for resource-constrained devices. Furthermore, the pruning process itself induces significant parametric divergence among heterogeneous submodels, destabilizing training and hindering global convergence. To address these challenges, we propose SubFLOT, a novel framework for server-side personalized federated pruning. SubFLOT introduces an Optimal Transport-enhanced Pruning (OTP) module that treats historical client models as proxies for local data distributions, formulating the pruning task as a Wasserstein distance minimization problem to generate customized submodels without accessing raw data. Concurrently, to counteract parametric divergence, our Scaling-based Adaptive Regularization (SAR) module adaptively penalizes a submodel's deviation from the global model, with the penalty's strength scaled by the client's pruning rate. Comprehensive experiments demonstrate that SubFLOT consistently and substantially outperforms state-of-the-art methods, underscoring its potential for deploying efficient and personalized models on resource-constrained edge devices.

CV Apr 13
Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images

Zheng Jiang, Yiming Chen, Nan He et al.

Recent multimodal large language models (MLLMs) have begun to support Thinking with Images by invoking visual tools such as zooming and cropping during inference. Yet these systems remain brittle in fine-grained visual reasoning because they must decide where to look before they have access to the evidence needed to make that decision correctly. We identify this circular dependency as the Grounding Paradox. To address it, we propose Test-Time Scaling over Perception (TTSP), a framework that treats perception itself as a scalable inference process. TTSP generates multiple exploratory perception traces, filters unreliable traces using entropy-based confidence estimation, distills validated observations into structured knowledge, and iteratively refines subsequent exploration toward unresolved uncertainty. Extensive experiments on high-resolution and general multimodal reasoning benchmarks show that TTSP consistently outperforms strong baselines across backbone sizes, while also exhibiting favorable scalability and token efficiency. Our results suggest that scaling perception at test time is a promising direction for robust multimodal reasoning under perceptual uncertainty.
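The entropy-based filtering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the TTSP implementation: the trace structure, the `probs` field, and the `max_entropy` threshold are all assumptions made for illustration, and the only idea taken from the abstract is that traces with high-entropy (low-confidence) predictions are discarded.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def filter_traces(traces, max_entropy):
    """Keep traces whose predicted distribution has entropy <= max_entropy.

    `traces` is a hypothetical list of dicts with a "probs" field; the
    abstract does not specify how confidence is actually represented.
    """
    return [t for t in traces if shannon_entropy(t["probs"]) <= max_entropy]

# Two made-up perception traces: one confident, one nearly uniform.
traces = [
    {"id": "a", "probs": [0.9, 0.05, 0.05]},
    {"id": "b", "probs": [0.4, 0.3, 0.3]},
]
kept = filter_traces(traces, max_entropy=0.6)
print([t["id"] for t in kept])
```

With this threshold only the confident trace survives; in practice the threshold would have to be tuned, which the abstract does not discuss.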

CV Apr 9
MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning

Zheng Jiang, Heng Guo, Chengyu Fang et al.

Medical Vision-Language Models (VLMs) hold immense promise for complex clinical tasks, but their reasoning capabilities are often constrained by text-only paradigms that fail to ground inferences in visual evidence. This limitation not only curtails performance on tasks requiring fine-grained visual analysis but also introduces risks of visual hallucination in safety-critical applications. Thus, we introduce MedVR, a novel reinforcement learning framework that enables annotation-free visual reasoning for medical VLMs. Its core innovation lies in two synergistic mechanisms: Entropy-guided Visual Regrounding (EVR) uses model uncertainty to direct exploration, while Consensus-based Credit Assignment (CCA) distills pseudo-supervision from rollout agreement. Without any human annotations for intermediate steps, MedVR achieves state-of-the-art performance on diverse public medical VQA benchmarks, significantly outperforming existing models. By learning to reason directly with visual evidence, MedVR promotes the robustness and transparency essential for accelerating the clinical deployment of medical AI.

CV Jan 5
RRNet: Configurable Real-Time Video Enhancement with Arbitrary Local Lighting Variations

Wenlong Yang, Canran Jin, Weihang Yuan et al.

With the growing demand for real-time video enhancement in live applications, existing methods often struggle to balance speed and effective exposure control, particularly under uneven lighting. We introduce RRNet (Rendering Relighting Network), a lightweight and configurable framework that achieves a state-of-the-art tradeoff between visual quality and efficiency. By estimating parameters for a minimal set of virtual light sources, RRNet enables localized relighting through a depth-aware rendering module without requiring pixel-aligned training data. This object-aware formulation preserves facial identity and supports real-time, high-resolution performance using a streamlined encoder and lightweight prediction head. To facilitate training, we propose a generative AI-based dataset creation pipeline that synthesizes diverse lighting conditions at low cost. With its interpretable lighting control and efficient architecture, RRNet is well suited for practical applications such as video conferencing, AR-based portrait enhancement, and mobile photography. Experiments show that RRNet consistently outperforms prior methods in low-light enhancement, localized illumination adjustment, and glare removal.

CV Feb 15
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming

Jiahui Chen, Bo Peng, Lianchen Jia et al.

Content-aware streaming requires dynamic, chunk-level importance weights to optimize subjective quality of experience (QoE). However, direct human annotation is prohibitively expensive, while vision-saliency models generalize poorly. We introduce HiVid, the first framework to leverage Large Language Models (LLMs) as a scalable human proxy to generate high-fidelity weights for both Video-on-Demand (VOD) and live streaming. We address three non-trivial challenges: (1) To extend LLMs' limited modality and circumvent token limits, we propose a perception module to assess frames in a local context window, autoregressively building a coherent understanding of the video. (2) For VOD with rating inconsistency across local windows, we propose a ranking module to perform global re-ranking with a novel LLM-guided merge-sort algorithm. (3) For live streaming, which requires low-latency, online inference without future knowledge, we propose a prediction module to predict future weights with a multi-modal time series model, which comprises a content-aware attention and adaptive horizon to accommodate asynchronous LLM inference. Extensive experiments show that HiVid improves weight prediction accuracy by up to 11.5% for VOD and 26% for live streaming over SOTA baselines. A real-world user study validates that HiVid boosts streaming QoE correlation by 14.7%.
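The idea of a merge sort driven by an external judge, as in the ranking module described above, can be sketched as follows. This is an assumption-laden illustration rather than HiVid's algorithm: `prefer` stands in for an LLM comparison call, and the `scores` table and chunk names are hypothetical. The abstract does not say how comparisons are batched or prompted.

```python
def merge_sort(items, prefer):
    """Rank items so that preferred (more important) ones come first.

    prefer(a, b) returns True when a should rank ahead of b. Here it is a
    placeholder for an LLM judgment; any callable with this shape works.
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], prefer)
    right = merge_sort(items[mid:], prefer)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Take from the right half only when it is strictly preferred,
        # which keeps the sort stable for ties.
        if prefer(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Hypothetical importance scores standing in for LLM pairwise judgments.
scores = {"c0": 0.2, "c1": 0.9, "c2": 0.5, "c3": 0.7}

def stub_prefer(a, b):
    return scores[a] > scores[b]

ranked = merge_sort(list(scores), stub_prefer)
print(ranked)
```

Merge sort needs O(n log n) comparisons, which matters when each comparison is an expensive model call.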

IR May 21, 2021
A General Method For Automatic Discovery of Powerful Interactions In Click-Through Rate Prediction

Ze Meng, Jinnian Zhang, Yumeng Li et al.

Modeling powerful interactions is a critical challenge in Click-through rate (CTR) prediction, which is one of the most typical machine learning tasks in personalized advertising and recommender systems. Although developing hand-crafted interactions is effective for a small number of datasets, it generally requires laborious and tedious architecture engineering for extensive scenarios. In recent years, several neural architecture search (NAS) methods have been proposed for designing interactions automatically. However, existing methods only explore limited types and connections of operators for interaction generation, leading to low generalization ability. To address these problems, we propose a more general automated method for building powerful interactions named AutoPI. The main contributions of this paper are as follows: AutoPI adopts a more general search space in which the computational graph is generalized from existing network connections, and the interactive operators in the edges of the graph are extracted from representative hand-crafted works. It allows searching for various powerful feature interactions to produce higher AUC and lower Logloss in a wide variety of applications. Besides, AutoPI utilizes a gradient-based search strategy for exploration with a significantly low computational cost. Experimentally, we evaluate AutoPI on a diverse suite of benchmark datasets, demonstrating the generalizability and efficiency of AutoPI over hand-crafted architectures and state-of-the-art NAS algorithms.

MM May 6, 2021
Multimedia Edge Computing

Zhi Wang, Wenwu Zhu, Lifeng Sun et al.

In this paper, we investigate the recent studies on multimedia edge computing, from sensing not only traditional visual/audio data but also individuals' geographical preference and mobility behaviors, to performing distributed machine learning over such data using the joint edge and cloud infrastructure and using evolutional strategies like reinforcement learning and online learning at edge devices to optimize the quality of experience for multimedia services at the last mile proactively. We provide both a retrospective view of recent rapid migration (resp. merge) of cloud multimedia to (resp. and) edge-aware multimedia and insights on the fundamental guidelines for designing multimedia edge computing strategies that target satisfying the changing demand of quality of experience. By showing the recent research studies and industrial solutions, we also provide future directions towards high-quality multimedia services over edge computing.

MM May 26, 2020
Self-play Reinforcement Learning for Video Transmission

Tianchi Huang, Rui-Xiao Zhang, Lifeng Sun

Video transmission services adopt adaptive algorithms to ensure users' demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that the given function fails to describe the requirement accurately. Thus, such proposed methods might eventually violate the original needs. To eliminate this concern, we propose Zwei, a self-play reinforcement learning algorithm for video transmission tasks. Zwei aims to update the policy by straightforwardly utilizing the actual requirement. Technically, Zwei samples a number of trajectories from the same starting point and instantly estimates the win rate with respect to the competition outcome. Here the competition result represents which trajectory is closer to the assigned requirement. Subsequently, Zwei optimizes the strategy by maximizing the win rate. To build Zwei, we develop simulation environments, design adequate neural network models, and invent training methods for dealing with different requirements on various video transmission scenarios. Trace-driven analysis over two representative tasks demonstrates that Zwei optimizes itself according to the assigned requirement faithfully, outperforming the state-of-the-art methods under all considered scenarios.
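The win-rate estimate described in this abstract can be sketched roughly as below. It is only an illustration under stated assumptions: each trajectory is reduced to a hypothetical `(rebuffer_seconds, bitrate_mbps)` pair, and the `beats` rule (less rebuffering first, then higher bitrate) is an invented example of a requirement, not the one used by Zwei.

```python
from itertools import combinations

def win_rates(outcomes, beats):
    """Fraction of pairwise competitions each trajectory wins.

    outcomes: per-trajectory results sampled from the same starting point.
    beats(a, b): True when outcome a is closer to the requirement than b.
    Requires at least two trajectories.
    """
    n = len(outcomes)
    wins = [0] * n
    for i, j in combinations(range(n), 2):
        if beats(outcomes[i], outcomes[j]):
            wins[i] += 1
        elif beats(outcomes[j], outcomes[i]):
            wins[j] += 1
    return [w / (n - 1) for w in wins]

# Hypothetical requirement: minimize rebuffering, then maximize bitrate.
def beats(a, b):
    return (a[0], -a[1]) < (b[0], -b[1])

# Made-up outcomes as (rebuffer_seconds, bitrate_mbps).
outcomes = [(0.0, 2.0), (0.5, 3.0), (0.0, 1.0)]
rates = win_rates(outcomes, beats)
print(rates)
```

A policy update would then favor the actions of trajectories with higher win rates; that training step is outside the scope of this sketch.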

LG May 26, 2020
Continual Local Training for Better Initialization of Federated Models

Xin Yao, Lifeng Sun

Federated learning (FL) refers to the learning paradigm that trains machine learning models directly in decentralized systems consisting of smart edge devices without transmitting the raw data, which avoids heavy communication costs and privacy concerns. Given the typical heterogeneous data distributions in such situations, the popular FL algorithm Federated Averaging (FedAvg) suffers from weight divergence and thus cannot achieve a competitive performance for the global model (denoted as the initial performance in FL) compared to centralized methods. In this paper, we propose the local continual training strategy to address this problem. Importance weights are evaluated on a small proxy dataset on the central server and then used to constrain the local training. With this additional term, we alleviate the weight divergence and continually integrate the knowledge on different local clients into the global model, which ensures a better generalization ability. Experiments on various FL settings demonstrate that our method significantly improves the initial performance of federated models with little extra communication cost.

LG Oct 24, 2019
Adversarial Feature Alignment: Avoid Catastrophic Forgetting in Incremental Task Lifelong Learning

Xin Yao, Tianchi Huang, Chenglei Wu et al.

Human beings are able to master a variety of knowledge and skills with ongoing learning. By contrast, dramatic performance degradation is observed when new tasks are added to an existing neural network model. This phenomenon, termed Catastrophic Forgetting, is one of the major roadblocks that prevent deep neural networks from achieving human-level artificial intelligence. Several research efforts, e.g., Lifelong or Continual learning algorithms, have been proposed to tackle this problem. However, they either suffer from an accumulating drop in performance as the task sequence grows longer, or require storing an excessive amount of model parameters for historical memory, or cannot obtain competitive performance on the new tasks. In this paper, we focus on the incremental multi-task image classification scenario. Inspired by the learning process of human students, where they usually decompose complex tasks into easier goals, we propose an adversarial feature alignment method to avoid catastrophic forgetting. In our design, both the low-level visual features and high-level semantic features serve as soft targets and guide the training process in multiple stages, which provide sufficient supervised information of the old tasks and help to reduce forgetting. Due to the knowledge distillation and regularization phenomena, the proposed method gains even better performance than finetuning on the new tasks, which makes it stand out from other methods. Extensive experiments in several typical lifelong learning scenarios demonstrate that our method outperforms the state-of-the-art methods in both accuracies on new tasks and performance preservation on old tasks.

LG Oct 18, 2019
Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Xin Yao, Tianchi Huang, Rui-Xiao Zhang et al.

Federated learning (FL) aims to train machine learning models in a decentralized system consisting of an enormous number of smart edge devices. Federated averaging (FedAvg), the fundamental algorithm in FL settings, proposes on-device training and model aggregation to avoid the potentially heavy communication costs and privacy concerns brought by transmitting raw data. However, through theoretical analysis we argue that 1) the multiple steps of local updating will result in gradient biases and 2) there is an inconsistency between the expected target distribution and the optimization objectives following the training paradigm in FedAvg. To tackle these problems, we first propose an unbiased gradient aggregation algorithm with the keep-trace gradient descent and the gradient evaluation strategy. Then we introduce an additional controllable meta updating procedure with a small set of data samples, indicating the expected target distribution, to provide a clear and consistent optimization objective. Both improvements are model- and task-agnostic and can be applied individually or together. Experimental results demonstrate that the proposed methods are faster in convergence and achieve higher accuracy with different network architectures in various FL settings.
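For readers unfamiliar with the FedAvg baseline that this abstract builds on, the model aggregation step can be sketched as a sample-count-weighted average of client parameters. This shows only the standard baseline aggregation, not the unbiased gradient aggregation or meta updating proposed in the paper; the flat-list parameter format and the numbers are assumptions for illustration.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes: number of local training samples on each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]

# Two hypothetical clients holding 1 and 3 samples respectively.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_w)
```

The client with more data pulls the global model toward its own parameters, which is one reason non-IID data can bias the result.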

LG Aug 16, 2019
Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs

Xin Yao, Tianchi Huang, Chenglei Wu et al.

Federated learning (FL) enables on-device training over distributed networks consisting of a massive amount of modern smart devices, such as smartphones and IoT (Internet of Things) devices. However, the leading optimization algorithm in such settings, i.e., federated averaging (FedAvg), suffers from heavy communication costs and the inevitable performance drop, especially when the local data is distributed in a non-IID way. To alleviate this problem, we propose two potential solutions by introducing additional mechanisms to the on-device training. The first (FedMMD) is adopting a two-stream model with the MMD (Maximum Mean Discrepancy) constraint instead of a single model in vanilla FedAvg to be trained on devices. Experiments show that the proposed method outperforms baselines, especially in non-IID FL settings, with a reduction of more than 20% in required communication rounds. The second is FL with feature fusion (FedFusion). By aggregating the features from both the local and global models, we achieve higher accuracy at fewer communication costs. Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up the process of convergence. Experiments in popular FL scenarios show that our FedFusion outperforms baselines in both accuracy and generalization ability while reducing the number of required communication rounds by more than 60%.
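The MMD (Maximum Mean Discrepancy) quantity used as a constraint in FedMMD can be illustrated with a small sketch. It computes the standard biased estimate of squared MMD between two samples using an RBF kernel; the kernel choice, the `gamma` value, and the toy feature vectors are assumptions, since the abstract does not specify them.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_k(a, b):
        return sum(rbf(u, v, gamma) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

# Made-up one-dimensional features from two models.
local_feats = [[0.0], [1.0]]
global_feats = [[5.0], [6.0]]
print(mmd2(local_feats, local_feats))   # identical samples
print(mmd2(local_feats, global_feats))  # well-separated samples
```

Used as a training penalty, a term like this would push the two streams' feature distributions toward each other.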

MM Aug 6, 2019
Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning

Tianchi Huang, Chao Zhou, Rui-Xiao Zhang et al.

Learning-based Adaptive Bit Rate (ABR) methods, which aim to learn outstanding strategies without any presumptions, have become one of the research hotspots for adaptive streaming. However, they typically suffer from several issues, i.e., low sample efficiency and lack of awareness of the video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that enormously improves the learning-based methods by tackling the above issues. Comyco trains the policy via imitating expert trajectories given by the instant solver, which can not only avoid redundant exploration but also make better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video qualities rather than video bitrates. To achieve this, we construct Comyco's neural network architecture, video datasets and QoE metrics with video quality features. Using trace-driven and real-world experiments, we demonstrate significant improvements of Comyco's sample efficiency in comparison to prior work, with 1700x improvements in terms of the number of samples required and 16x improvements on training time required. Moreover, results illustrate that Comyco outperforms previously proposed methods, with improvements on average QoE of 7.5% - 16.79%. In particular, Comyco also surpasses the state-of-the-art approach Pensieve by 7.37% on average video quality under the same rebuffering time.

MM May 16, 2019
Reactive Video Caching via long-short-term fusion approach

Rui-Xiao Zhang, Tianchi Huang, Chenglei Wu et al.

Video caching has been a basic network functionality in today's network architectures. Although an abundance of caching replacement algorithms have been proposed recently, these methods all suffer from a key limitation: due to their immature rules, inaccurate feature engineering or unresponsive model update, they cannot strike a balance between the long-term history and short-term sudden events. To address this concern, we propose LA-E2, a long-short-term fusion caching replacement approach, which is based on a learning-aided exploration-exploitation process. Specifically, by effectively combining the deep neural network (DNN) based prediction with the online exploitation-exploration process through a top-k method, LA-E2 can both make use of the historical information and adapt to the constantly changing popularity responsively. Through extensive experiments on two real-world datasets, we show that LA-E2 can achieve state-of-the-art performance and generalize well. Especially when the cache size is small, our approach can outperform the baselines by 17.5%-68.7% in total hit rate.

MM Nov 15, 2018
Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video Streaming

Tianchi Huang, Xin Yao, Chenglei Wu et al.

Existing reinforcement learning (RL)-based adaptive bitrate (ABR) approaches outperform the previous methods based on fixed control rules by improving the Quality of Experience (QoE) score, yet the QoE metric can hardly provide clear guidance for optimization, which ultimately results in unexpected strategies. In this paper, we propose Tiyuntsong, a self-play reinforcement learning approach with a generative adversarial network (GAN)-based method for ABR video streaming. Tiyuntsong learns strategies automatically by training two agents who are competing against each other. Note that the competition results are determined by a set of rules rather than a numerical QoE score, which allows clearer optimization objectives. Meanwhile, we propose a GAN Enhancement Module to extract hidden features from the past status for preserving the information without the limitations of sequence lengths. Using testbed experiments, we show that the utilization of GAN significantly improves Tiyuntsong's performance. By comparing the performance of ABRs, we observe that Tiyuntsong also betters existing ABR algorithms in the underlying metrics.

MM May 21, 2018
Performance Bound Analysis for Crowdsourced Mobile Video Streaming

Lin Gao, Ming Tang, Haitian Pang et al.

Adaptive bitrate (ABR) streaming enables video users to adapt the playing bitrate to the real-time network conditions to achieve the desirable quality of experience (QoE). In this work, we propose a novel crowdsourced streaming framework for multi-user ABR video streaming over wireless networks. This framework enables the nearby mobile video users to crowdsource their radio links and resources for cooperative video streaming. We focus on analyzing the social welfare performance bound of the proposed crowdsourced streaming system. Directly solving this bound is challenging due to the asynchronous operations of users. To this end, we introduce a virtual time-slotted system with synchronized operations, and formulate the associated social welfare optimization problem as a linear program. We show that the optimal social welfare performance of the virtual system provides effective upper and lower bounds for the optimal performance (bound) of the original asynchronous system, and hence characterizes the feasible performance region of the proposed crowdsourced streaming system. The performance bounds derived in this work can serve as a benchmark for future online algorithm design and incentive mechanism design.

MM May 7, 2018
QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement Learning

Tianchi Huang, Rui-Xiao Zhang, Chao Zhou et al.

Due to the fluctuation of throughput under various network conditions, how to choose a proper bitrate adaptively for real-time video streaming has become an upcoming and interesting issue. Recent work focuses on providing high video bitrates instead of video qualities. Nevertheless, we notice that there exists a trade-off between sending bitrate and video quality, which motivates us to focus on how to strike a balance between them. In this paper, we propose QARC (video Quality Awareness Rate Control), a rate control algorithm that aims to have a higher perceptual video quality with possibly lower sending rate and transmission latency. Starting from scratch, QARC uses a deep reinforcement learning (DRL) algorithm to train a neural network to select future bitrates based on previously observed network status and past video frames, and we design a neural network to predict future perceptual video quality as a vector that takes the place of the raw picture in the DRL's inputs. We evaluate QARC over a trace-driven emulation. As expected, QARC outperforms existing approaches.

MM May 2, 2018
Delay-Constrained Rate Control for Real-Time Video Streaming with Bounded Neural Network

Tianchi Huang, Rui-Xiao Zhang, Chao Zhou et al.

Rate control is widely adopted during video streaming to provide both high video quality and low latency under various network conditions. However, although many methods have been proposed, they fail to tackle one major problem: previous methods determine a future transmission rate as a single value that is used for an entire time slot, while real-world network conditions, unlike lab setups, often suffer from rapid and stochastic changes, resulting in prediction failures. In this paper, we propose a delay-constrained rate control approach based on end-to-end deep learning. The proposed model predicts future bit rate not as a single value, but as possible bit rate ranges using target delay gradient, with which the transmission delay is guaranteed. We collect a large amount of real-world live streaming data to train our model, and as a result, it automatically learns the correlation between throughput and target delay gradient. We build a testbed to evaluate our approach. Compared with the state-of-the-art methods, our approach demonstrates better performance in bandwidth utilization. In all considered scenarios, a range-based rate control approach outperforms the one without range by 19% to 35% in average QoE improvement.

MM Mar 10, 2017
Towards Wi-Fi AP-Assisted Content Prefetching for On-Demand TV Series: A Reinforcement Learning Approach

Wen Hu, Yichao Jin, Yonggang Wen et al.

The emergence of smart Wi-Fi APs (access points), which are equipped with huge storage space, opens a new research area on how to utilize these resources at the edge network to improve users' quality of experience (QoE) (e.g., a short startup delay and smooth playback). One important research interest in this area is content prefetching, which predicts and accurately fetches contents ahead of users' requests to shift the traffic away during peak periods. However, in practice, the different video watching patterns among users and the varying network connection status lead to a time-varying server load, which eventually makes the content prefetching problem challenging. To understand this challenge, this paper first performs a large-scale measurement study on users' AP connection and TV series watching patterns using real traces. Then, based on the obtained insights, we formulate the content prefetching problem as a Markov Decision Process (MDP). The objective is to strike a balance between the increased prefetching and storage cost incurred by incorrect prediction and the reduced content download delay resulting from successful prediction. A learning-based approach is proposed to solve this problem and three other algorithms are adopted as baselines. In particular, first, we investigate the performance lower bound by using a random algorithm, and the upper bound by using an ideal offline approach. Then, we present a heuristic algorithm as another baseline. Finally, we design a reinforcement learning algorithm that is more practical to work in an online manner. Through extensive trace-based experiments, we demonstrate the performance gain of our design. Remarkably, our learning-based algorithm achieves a better precision and hit ratio (e.g., 80%) with about 70% (resp. 50%) cost saving compared to the random (resp. heuristic) algorithm.

MM Jul 5, 2016
Dynamic Flow Scheduling Strategy in Multihoming Video CDNs

Ming Ma, Zhi Wang, Yankai Zhang et al.

Multihoming for a video Content Delivery Network (CDN) allows edge peering servers to deliver video chunks through different Internet Service Providers (ISPs), to achieve an improved quality of service (QoS) for video streaming users. However, since traditional strategies for a multihoming video CDN are simply designed according to static rules, e.g., simply sending traffic via an ISP that is the same as the client's ISP, they fail to dynamically allocate resources among different ISPs over time. In this paper, we perform measurement studies to demonstrate that such a static allocation mechanism fails to make full use of multiple ISPs' resources. To address this problem, we propose a dynamic flow scheduling strategy for multihoming video CDNs. The challenge is to find the control parameters that can guide the ISP selection when performing flow scheduling. Using a data-driven approach, we find factors that have a major impact on the performance improvement in the dynamic flow scheduling. We further utilize an information gain approach to generate parameter combinations that can be used to guide the flow scheduling, i.e., to determine the ISP through which each request should be served. Our evaluation results demonstrate that our design effectively performs the flow scheduling. In particular, our design yields near-optimal performance in a simulation of a real-world multihoming setup.
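The information gain criterion referred to in this abstract can be sketched in its textbook form: the entropy of the target labels minus the weighted entropy after splitting on a candidate factor. The factor names (`time_of_day`, `region`) and the labels below are invented for illustration; the abstract does not list the actual factors or how parameter combinations are generated from them.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical records: which ISP served each request best.
best_isp = ["isp_a", "isp_a", "isp_b", "isp_b"]
time_of_day = ["peak", "peak", "off", "off"]    # perfectly predictive
region = ["north", "south", "north", "south"]   # uninformative

print(information_gain(time_of_day, best_isp))
print(information_gain(region, best_isp))
```

Factors with higher gain would be better candidates for guiding ISP selection under this criterion.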

MMJul 5, 2016
A Measurement Study of TCP Performance for Chunk Delivery in DASH

Wen Hu, Zhi Wang, Lifeng Sun

Dynamic Adaptive Streaming over HTTP (DASH) has emerged as an increasingly popular paradigm for video streaming [13], in which a video is segmented into many chunks delivered to users by HTTP request/response over Transmission Control Protocol (TCP) connections. Therefore, it is intriguing to study the performance of strategies implemented in conventional TCPs, which are not dedicated to video streaming, e.g., whether chunks are efficiently delivered when users perform interactions with the video players. In this paper, we conduct measurement studies on users' chunk-requesting traces in DASH from a representative video streaming provider, to investigate users' behaviors in DASH, and on TCP-connection-level traces from CDN servers, to investigate the performance of TCP for DASH. By studying how video chunks are delivered in both the slow start and congestion avoidance phases, our observations have revealed the performance characteristics of TCP for DASH as follows: (1) Request patterns in DASH have a great impact on the performance of TCP variants, including CUBIC; (2) Strategies in conventional TCPs may cause user-perceived quality degradation in DASH streaming; (3) Potential improvements to TCP strategies for better delivery in DASH can be further explored.

MMJul 5, 2016
Towards Network-Failure-Tolerant Content Delivery for Web Content

Wen Hu, Zhi Wang, Lifeng Sun

Popularly used to distribute a variety of multimedia content items in today's Internet, HTTP-based web content delivery still suffers from various content delivery failures. Hindered by the expensive deployment cost, the conventional CDN cannot deploy enough edge servers to successfully deliver content items to all users under these delivery failures. In this paper, we propose a joint CDN and peer-assisted web content delivery framework to address the delivery failure problem. Different from conventional peer-assisted approaches for web content delivery, which mainly focus on alleviating the CDN servers' bandwidth load, we study how to use a browser-based peer-assisted scheme, namely WebRTC, to resolve content delivery failures. To this end, we carry out large-scale measurement studies on how users access and view webpages. Our measurement results demonstrate challenges (e.g., peers stay on a webpage for an extremely short time) that cannot be directly solved by conventional P2P strategies, as well as some important webpage viewing patterns. Due to these unique characteristics, WebRTC peers open up new possibilities for helping web content delivery, but raise the problem of how to utilize the dynamic resources efficiently. We formulate peer selection, the critical strategy in our framework, as an optimization problem, and design a heuristic algorithm based on the measurement insights to solve it. Our simulation experiments driven by traces from Tencent QZone demonstrate the effectiveness of our design: compared with a non-peer-assisted strategy and a random peer selection strategy, our design significantly improves the successful relay ratio of web content items under network failures, e.g., our design improves the content download ratio by up to 60% even when none of the users in a particular region (e.g., a city) can connect to the regional CDN server.

MMJun 14, 2016
Social- and Mobility-Aware Device-to-Device Content Delivery

Zhi Wang, Lifeng Sun, Miao Zhang et al.

Mobile online social network services have seen a rapid increase, in which the huge amount of user-generated social media contents propagating between users via social connections has significantly challenged the traditional content delivery paradigm: First, replicating all of the contents generated by users to edge servers that well "fit" the receivers becomes difficult due to the limited bandwidth and storage capacities. Motivated by device-to-device (D2D) communication that allows users with smart devices to transfer content directly, we propose replicating bandwidth-intensive social contents in a device-to-device manner. Based on large-scale measurement studies on social content propagation and user mobility patterns in edge-network regions, we observe that (1) Device-to-device replication can significantly help users download social contents from nearby neighboring peers; (2) Both social propagation and mobility patterns affect how contents should be replicated; (3) The replication strategies depend on regional characteristics (e.g., how users move across regions). Using these measurement insights, we propose a joint propagation- and mobility-aware content replication strategy for edge-network regions, in which social contents are assigned to users in edge-network regions according to a joint consideration of social graph, content propagation and user mobility. We formulate the replication scheduling as an optimization problem and design a distributed algorithm that uses only historical, local and partial information to solve it. Trace-driven experiments further verify the superiority of our proposal: compared with conventional pure movement-based and popularity-based approaches, our design can significantly (2-4 times) improve the amount of social contents successfully delivered by device-to-device replication.

MMMay 25, 2016
Understanding Content Placement Strategies in Smartrouter-based Peer CDN for Video Streaming

Ming Ma, Zhi Wang, Ke Su et al.

Recent years have witnessed a new video delivery paradigm: the smartrouter-based peer video content delivery network, which is enabled by smartrouters deployed at users' homes. ChinaCache (one of the largest CDN providers in China) and Youku (a video provider using smartrouters to assist video delivery) announced their cooperation in 2015, to create a new paradigm of content delivery based on householders' network resources. This new paradigm is different from the conventional peer-to-peer (P2P) approach, because millions of dedicated smartrouters are operated by the centralized video service providers in a coordinated manner. Thus it is intriguing to study the content placement strategies used in a smartrouter-based content delivery system, as well as its potential impact on the content delivery ecosystem. In this paper, we carry out measurement studies of Youku's peer video CDN, which has deployed over 300K smartrouter devices for its video delivery. In our measurement studies, 104K videos were investigated and 4TB of traffic was analyzed, over controlled smartrouter nodes and players. Our measurement insights are as follows. First, a global content replication strategy is essential for peer CDN systems. Second, such a peer CDN deployment can itself form an effective sub-system for end-to-end QoS monitoring, which can be used for fine-grained request redirection (e.g., user-level) and content replication. We also present our analysis of the performance limitations and propose potential improvements to peer CDN systems.

MMMay 25, 2016
Understanding the Smartrouter-based Peer CDN for Video Streaming

Ming Ma, Zhi Wang, Ke Su et al.

Recent years have witnessed a new video delivery paradigm: the smartrouter-based video delivery network, which is enabled by smartrouters deployed at users' homes, together with the conventional video servers deployed in datacenters. Recently, ChinaCache, a large content delivery network (CDN) provider, and Youku, a video service provider using smartrouters to assist video delivery, announced their cooperation to create a new paradigm of content delivery based on householders' network resources. This new paradigm is different from the conventional peer-to-peer (P2P) approach, because such dedicated smartrouters are inherently operated by the centralized video service providers in a coordinated manner. It is intriguing to study the strategies, performance and potential impact of such peer CDN systems on the content delivery ecosystem. In this paper, we study the Youku peer CDN, which has deployed over 300K smartrouter devices for its video streaming. In our measurement, 78K videos were investigated and 3TB of traffic was analyzed, over controlled routers and players. Our contributions are the following measurement insights. First, a global replication and caching strategy is essential for peer CDN systems, and proactively scheduling replication and caching on a daily basis can guarantee their performance. Second, such a peer CDN deployment can itself form an effective Quality of Service (QoS) monitoring sub-system, which can be used for fine-grained user request redirection. We also provide our analysis of the performance issues and potential improvements to peer CDN systems.

IRFeb 11, 2015
MAP: Microblogging Assisted Profiling of TV Shows

Xiahong Lin, Zhi Wang, Lifeng Sun

Online microblogging services, which have been increasingly used by people to share and exchange information, have emerged as a promising way to profile multimedia contents, in the sense of providing users a socialized abstraction and understanding of these contents. In this paper, we propose a microblogging profiling framework to provide a social demonstration of TV shows. The challenges for this study are twofold: First, TV shows are generally offline, i.e., most of them are not originally from the Internet, and we need to create a connection between these TV shows and online microblogging services; Second, contents in a microblogging service are extremely noisy for video profiling, and we need to strategically retrieve the most related information for the TV show profiling. To address these challenges, we propose MAP, a microblogging-assisted profiling framework, with contributions as follows: i) We propose a joint user and content retrieval scheme, which uses information about both actors and topics of a TV show to retrieve related microblogs; ii) We propose a social-aware profiling strategy, which profiles a video according to not only its content, but also the social relationship of its microblogging users and its propagation in the social network; iii) We present some interesting analysis, based on our framework, to profile real-world TV shows.

CVMar 3, 2014
Cross-Scale Cost Aggregation for Stereo Matching

Kang Zhang, Yuqiang Fang, Dongbo Min et al.

Human beings process stereoscopic correspondence across multiple scales. However, this bio-inspiration is ignored by state-of-the-art cost aggregation methods for dense stereo correspondence. In this paper, a generic cross-scale cost aggregation framework is proposed to allow multi-scale interaction in cost aggregation. We first reformulate cost aggregation from a unified optimization perspective and show that different cost aggregation methods essentially differ in their choices of similarity kernels. Then, an inter-scale regularizer is introduced into the optimization, and solving this new optimization problem leads to the proposed framework. Since the regularization term is independent of the similarity kernel, various cost aggregation methods can be integrated into the proposed general framework. We show that the cross-scale framework is important as it effectively and efficiently expands state-of-the-art cost aggregation methods and leads to significant improvements when evaluated on the Middlebury, KITTI and New Tsukuba datasets.

CVFeb 10, 2014
Binary Stereo Matching

Kang Zhang, Jiyang Li, Yijing Li et al.

In this paper, we propose a novel binary-based cost computation and aggregation approach for the stereo matching problem. The cost volume is constructed through bitwise operations on a series of binary strings. Then this approach is combined with the traditional winner-take-all strategy, resulting in a new local stereo matching algorithm called binary stereo matching (BSM). Since the core algorithm of BSM is based on binary and integer computations, it has a higher computational efficiency than previous methods. Experimental results on the Middlebury benchmark show that BSM has comparable performance with state-of-the-art local stereo methods in terms of both quality and speed. Furthermore, experiments on images with radiometric differences demonstrate that BSM is more robust than previous methods under these changes, which are common under real illumination.
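To make the combination of bitwise matching costs and winner-take-all selection concrete, here is a minimal sketch on a single scanline. It assumes each pixel already has an integer-encoded binary descriptor and uses the Hamming distance (XOR followed by a bit count) as the cost. The descriptor values, the function names, and the omission of any cost aggregation step are illustrative assumptions; the abstract does not describe BSM's actual binary strings or aggregation scheme.

```python
def hamming(a: int, b: int) -> int:
    """Matching cost between two binary descriptors: XOR, then count set bits."""
    return bin(a ^ b).count("1")


def wta_disparity(left, right, max_disp):
    """Winner-take-all disparity for one scanline.

    `left` and `right` are lists of integer-encoded binary descriptors,
    one per pixel. Left pixel x is compared with right pixel x - d for
    each candidate disparity d, and the lowest-cost d is kept.
    """
    disparities = []
    for x, code in enumerate(left):
        best_d, best_cost = 0, None
        # Candidates are limited so that x - d stays inside the scanline.
        for d in range(min(max_disp, x) + 1):
            cost = hamming(code, right[x - d])
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Hypothetical 4-bit descriptors; the left scanline is the right one
# shifted by one pixel, with an unmatched descriptor at the border.
right = [0b1010, 0b0110, 0b1111, 0b0001, 0b1000]
left = [0b0000, 0b1010, 0b0110, 0b1111, 0b0001]

print(wta_disparity(left, right, max_disp=2))  # → [0, 1, 1, 1, 1]
```

Because the cost reduces to an XOR and a bit count on integers, the inner loop involves no floating-point work, which is the kind of efficiency argument the abstract makes for binary and integer computation.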