SYOct 15, 2023Code
BONES: Near-Optimal Neural-Enhanced Video StreamingLingdong Wang, Simran Singh, Jacob Chakareski et al.
Accessing high-quality video content can be challenging due to insufficient and unstable network bandwidth. Recent advances in neural enhancement have shown promising results in improving the quality of degraded videos through deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach into video streaming, allowing users to download low-quality video segments and then enhance them to obtain high-quality content without violating the playback of the video stream. We introduce BONES, an NES control algorithm that jointly manages the network and computational resources to maximize the quality of experience (QoE) of the user. BONES formulates NES as a Lyapunov optimization problem and solves it in an online manner with near-optimal performance, making it the first NES algorithm to provide a theoretical performance guarantee. Comprehensive experimental results indicate that BONES increases QoE by 5\% to 20\% over state-of-the-art algorithms with minimal overhead. Our code is available at https://github.com/UMass-LIDS/bones.
CVSep 12, 2022Code
CU-Net: Real-Time High-Fidelity Color Upsampling for Point CloudsLingdong Wang, Mohammad Hajiesmaili, Jacob Chakareski et al.
Point cloud upsampling is essential for high-quality augmented reality, virtual reality, and telepresence applications, due to the capture, processing, and communication limitations of existing technologies. Although geometry upsampling to densify a point cloud's coordinates has been well studied, the upsampling of the color attributes has been largely overlooked. In this paper, we propose CU-Net, the first deep-learning point cloud color upsampling model that enables low latency and high visual fidelity operation. CU-Net achieves linear time and space complexity by leveraging a feature extractor based on sparse convolution and a color prediction module based on neural implicit function. Therefore, CU-Net is theoretically guaranteed to be more efficient than most existing methods with quadratic complexity. Experimental results demonstrate that CU-Net can colorize a photo-realistic point cloud with nearly a million points in real time, while having notably better visual performance than baselines. Besides, CU-Net can adapt to arbitrary upsampling ratios and unseen objects without retraining. Our source code is available at https://github.com/UMass-LIDS/cunet.
NIJul 3, 2024
Multi-Task Decision-Making for Multi-User 360 Video Processing over Wireless NetworksBabak Badnava, Jacob Chakareski, Morteza Hashemi
We study a multi-task decision-making problem for 360 video processing in a wireless multi-user virtual reality (VR) system that includes an edge computing unit (ECU) to deliver 360 videos to VR users and offer computing assistance for decoding/rendering of video frames. However, this comes at the expense of increased data volume and required bandwidth. To balance this trade-off, we formulate a constrained quality of experience (QoE) maximization problem in which the rebuffering time and quality variation between video frames are bounded by user and video requirements. To solve the formulated multi-user QoE maximization, we leverage deep reinforcement learning (DRL) for multi-task rate adaptation and computation distribution (MTRC). The proposed MTRC approach does not rely on any predefined assumption about the environment and relies on video playback statistics (i.e., past throughput, decoding time, transmission time, etc.), video information, and the resulting performance to adjust the video bitrate and computation distribution. We train MTRC with real-world wireless network traces and 360 video datasets to obtain evaluation results in terms of the average QoE, peak signal-to-noise ratio (PSNR), rebuffering time, and quality variation. Our results indicate that the MTRC improves the users' QoE compared to state-of-the-art rate adaptation algorithm. Specifically, we show a 5.97 dB to 6.44 dB improvement in PSNR, a 1.66X to 4.23X improvement in rebuffering time, and a 4.21 dB to 4.35 dB improvement in quality variation.
ITDec 13, 2025
ElasticVR: Elastic Task Computing in Multi-User Multi-Connectivity Wireless Virtual Reality (VR) SystemsBabak Badnava, Jacob Chakareski, Morteza Hashemi
Diverse emerging VR applications integrate streaming of high fidelity 360 video content that requires ample amounts of computation and data rate. Scalable 360 video tiling enables having elastic VR computational tasks that can be scaled adaptively in computation and data rate based on the available user and system resources. We integrate scalable 360 video tiling in an edge-client wireless multi-connectivity architecture for joint elastic task computation offloading across multiple VR users called ElasticVR. To balance the trade-offs in communication, computation, energy consumption, and QoE that arise herein, we formulate a constrained QoE and energy optimization problem that integrates the multi-user/multi-connectivity action space with the elasticity of VR computational tasks. The ElasticVR framework introduces two multi-agent deep reinforcement learning solutions, namely CPPG and IPPG. CPPG adopts a centralized training and centralized execution approach to capture the coupling between users' communication and computational demands. This leads to globally coordinated decisions at the cost of increased computational overheads and limited scalability. To address the latter challenges, we also explore an alternative strategy denoted IPPG that adopts a centralized training with decentralized execution paradigm. IPPG leverages shared information and parameter sharing to learn robust policies; however, during execution, each user takes action independently based on its local state information only. The decentralized execution alleviates the communication and computation overhead of centralized decision-making and improves scalability. We show that the ElasticVR framework improves the PSNR by 43.21%, while reducing the response time and energy consumption by 42.35% and 56.83%, respectively, compared with a case where no elasticity is incorporated into VR computations.
DCOct 27, 2025
Bayes-Split-Edge: Bayesian Optimization for Constrained Collaborative Inference in Wireless Edge SystemsFatemeh Zahra Safaeipour, Jacob Chakareski, Morteza Hashemi
Mobile edge devices (e.g., AR/VR headsets) typically need to complete timely inference tasks while operating with limited on-board computing and energy resources. In this paper, we investigate the problem of collaborative inference in wireless edge networks, where energy-constrained edge devices aim to complete inference tasks within given deadlines. These tasks are carried out using neural networks, and the edge device seeks to optimize inference performance under energy and delay constraints. The inference process can be split between the edge device and an edge server, thereby achieving collaborative inference over wireless networks. We formulate an inference utility optimization problem subject to energy and delay constraints, and propose a novel solution called Bayes-Split-Edge, which leverages Bayesian optimization for collaborative split inference over wireless edge networks. Our solution jointly optimizes the transmission power and the neural network split point. The Bayes-Split-Edge framework incorporates a novel hybrid acquisition function that balances inference task utility, sample efficiency, and constraint violation penalties. We evaluate our approach using the VGG19 model on the ImageNet-Mini dataset, and Resnet101 on Tiny-ImageNet, and real-world mMobile wireless channel datasets. Numerical results demonstrate that Bayes-Split-Edge achieves up to 2.4x reduction in evaluation cost compared to standard Bayesian optimization and achieves near-linear convergence. It also outperforms several baselines, including CMA-ES, DIRECT, exhaustive search, and Proximal Policy Optimization (PPO), while matching exhaustive search performance under tight constraints. These results confirm that the proposed framework provides a sample-efficient solution requiring maximum 20 function evaluations and constraint-aware optimization for wireless split inference in edge computing systems.
DCSep 29, 2025
Graph Theory Meets Federated Learning over Satellite Constellations: Spanning Aggregations, Network Formation, and Performance OptimizationFardis Nadimi, Payam Abdisarabshali, Jacob Chakareski et al.
We introduce Fed-Span, a novel federated/distributed learning framework designed for low Earth orbit satellite constellations. Fed-Span aims to address critical challenges inherent to distributed learning in dynamic satellite networks, including intermittent satellite connectivity, heterogeneous computational capabilities of satellites, and time-varying satellites' datasets. At its core, Fed-Span leverages minimum spanning tree (MST) and minimum spanning forest (MSF) topologies to introduce spanning model aggregation and dispatching processes for distributed learning. To formalize Fed-Span, we offer a fresh perspective on MST/MSF topologies by formulating them through a set of continuous constraint representations (CCRs), thereby devising graph-theoretical abstractions into an optimizable framework for satellite networks. Using these CCRs, we obtain the energy consumption and latency of operations in Fed-Span. Moreover, we derive novel convergence bounds for Fed-Span, accommodating its key system characteristics and degrees of freedom (i.e., tunable parameters). Finally, we propose a comprehensive optimization problem that jointly minimizes model prediction loss, energy consumption, and latency of Fed-Span. We unveil that this problem is NP-hard and develop a systematic approach to transform it into a geometric programming formulation, solved via successive convex optimization with performance guarantees. Through evaluations on real-world datasets, we demonstrate that Fed-Span outperforms existing methods, with faster model convergence, greater energy efficiency, and reduced latency. These results highlight Fed-Span as a novel solution for efficient distributed learning in satellite networks.
NISep 7, 2025
ASL360: AI-Enabled Adaptive Streaming of Layered 360° Video over UAV-assisted Wireless NetworksAlireza Mohammadhosseini, Jacob Chakareski, Nicholas Mastronarde
We propose ASL360, an adaptive deep reinforcement learning-based scheduler for on-demand 360° video streaming to mobile VR users in next generation wireless networks. We aim to maximize the overall Quality of Experience (QoE) of the users served over a UAV-assisted 5G wireless network. Our system model comprises a macro base station (MBS) and a UAV-mounted base station which both deploy mm-Wave transmission to the users. The 360° video is encoded into dependent layers and segmented tiles, allowing a user to schedule downloads of each layer's segments. Furthermore, each user utilizes multiple buffers to store the corresponding video layer's segments. We model the scheduling decision as a Constrained Markov Decision Process (CMDP), where the agent selects Base or Enhancement layers to maximize the QoE and use a policy gradient-based method (PPO) to find the optimal policy. Additionally, we implement a dynamic adjustment mechanism for cost components, allowing the system to adaptively balance and prioritize the video quality, buffer occupancy, and quality change based on real-time network and streaming session conditions. We demonstrate that ASL360 significantly improves the QoE, achieving approximately 2 dB higher average video quality, 80% lower average rebuffering time, and 57% lower video quality variation, relative to competitive baseline methods. Our results show the effectiveness of our layered and adaptive approach in enhancing the QoE in immersive videostreaming applications, particularly in dynamic and challenging network environments.
NIJul 22, 2018
Accelerated Structure-Aware Reinforcement Learning for Delay-Sensitive Energy Harvesting Wireless SensorsNikhilesh Sharma, Nicholas Mastronarde, Jacob Chakareski
We investigate an energy-harvesting wireless sensor transmitting latency-sensitive data over a fading channel. The sensor injects captured data packets into its transmission queue and relies on ambient energy harvested from the environment to transmit them. We aim to find the optimal scheduling policy that decides whether or not to transmit the queue's head-of-line packet at each transmission opportunity such that the expected packet queuing delay is minimized given the available harvested energy. No prior knowledge of the stochastic processes that govern the channel, captured data, or harvested energy dynamics are assumed, thereby necessitating the use of online learning to optimize the scheduling policy. We formulate this scheduling problem as a Markov decision process (MDP) and analyze the structural properties of its optimal value function. In particular, we show that it is non-decreasing and has increasing differences in the queue backlog and that it is non-increasing and has increasing differences in the battery state. We exploit this structure to formulate a novel accelerated reinforcement learning (RL) algorithm to solve the scheduling problem online at a much faster learning rate, while limiting the induced computational complexity. Our experiments demonstrate that the proposed algorithm closely approximates the performance of an optimal offline solution that requires a priori knowledge of the channel, captured data, and harvested energy dynamics. Simultaneously, by leveraging the value function's structure, our approach achieves competitive performance relative to a state-of-the-art RL algorithm, at potentially orders of magnitude lower complexity. Finally, considerable performance gains are demonstrated over the well-known and widely used Q-learning algorithm.
MMMar 21, 2018
Viewport-Driven Rate-Distortion Optimized 360° Video StreamingJacob Chakareski, Ridvan Aksu, Xavier Corbillon et al.
The growing popularity of virtual and augmented reality communications and 360° video streaming is moving video communication systems into much more dynamic and resource-limited operating settings. The enormous data volume of 360° videos requires an efficient use of network bandwidth to maintain the desired quality of experience for the end user. To this end, we propose a framework for viewport-driven rate-distortion optimized 360° video streaming that integrates the user view navigation pattern and the spatiotemporal rate-distortion characteristics of the 360° video content to maximize the delivered user quality of experience for the given network/system resources. The framework comprises a methodology for constructing dynamic heat maps that capture the likelihood of navigating different spatial segments of a 360° video over time by the user, an analysis and characterization of its spatiotemporal rate-distortion characteristics that leverage preprocessed spatial tilling of the 360° view sphere, and an optimization problem formulation that characterizes the delivered user quality of experience given the user navigation patterns, 360° video encoding decisions, and the available system/network resources. Our experimental results demonstrate the advantages of our framework over the conventional approach of streaming a monolithic uniformly encoded 360° video and a state-of-the-art reference method. Considerable video quality gains of 4 - 5 dB are demonstrated in the case of two popular 4K 360° videos.
MMSep 26, 2016
Viewport-Adaptive Navigable 360-Degree Video DeliveryXavier Corbillon, Gwendal Simon, Alisa Devlic et al.
The delivery and display of 360-degree videos on Head-Mounted Displays (HMDs) presents many technical challenges. 360-degree videos are ultra high resolution spherical videos, which contain an omnidirectional view of the scene. However only a portion of this scene is displayed on the HMD. Moreover, HMD need to respond in 10 ms to head movements, which prevents the server to send only the displayed video part based on client feedback. To reduce the bandwidth waste, while still providing an immersive experience, a viewport-adaptive 360-degree video streaming system is proposed. The server prepares multiple video representations, which differ not only by their bit-rate, but also by the qualities of different scene regions. The client chooses a representation for the next segment such that its bit-rate fits the available throughput and a full quality region matches its viewing. We investigate the impact of various spherical-to-plane projections and quality arrangements on the video quality displayed to the user, showing that the cube map layout offers the best quality for the given bit-rate budget. An evaluation with a dataset of users navigating 360-degree videos demonstrates that segments need to be short enough to enable frequent view switches.
CVMay 7, 2016
Matrix Factorization-Based Clustering Of Image Features For Bandwidth-Constrained Information RetrievalJacob Chakareski, Immanuel Manohar, Shantanu Rane
We consider the problem of accurately and efficiently querying a remote server to retrieve information about images captured by a mobile device. In addition to reduced transmission overhead and computational complexity, the retrieval protocol should be robust to variations in the image acquisition process, such as translation, rotation, scaling, and sensor-related differences. We propose to extract scale-invariant image features and then perform clustering to reduce the number of features needed for image matching. Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF) are investigated as candidate clustering approaches. The image matching complexity at the database server is quadratic in the (small) number of clusters, not in the (very large) number of image features. We employ an image-dependent information content metric to approximate the model order, i.e., the number of clusters, needed for accurate matching, which is preferable to setting the model order using trial and error. We show how to combine the hypotheses provided by PCA and NMF factor loadings, thereby obtaining more accurate retrieval than using either approach alone. In experiments on a database of urban images, we obtain a top-1 retrieval accuracy of 89% and a top-3 accuracy of 92.5%.
SENov 17, 2013
ComReg: A Complex Network Approach to Prioritize Test Cases for Regression TestingImrul Kayes, Jacob Chakareski
Regression testing is performed to provide confidence that changes in a part of software do not affect other parts of the software. An execution of all existing test cases is the best way to re-establish this confidence. However, regression testing is an expensive process---there might be insufficient resources (e.g., time, workforce) to allow for the re-execution of all test cases. Regression test prioritization techniques attempt to re-order a regression test suite based on some criteria so that highest priority test cases are executed earlier. In this study, we want to prioritize test cases for regression testing based on the dependency network of faults. In software testing, it is common that some faults are consequences of other faults (leading faults). Moreover, dependent faults can be removed if and only if the leading faults have been removed. Our goal is to prioritize test cases so that test cases that exposed leading faults (the most central faults in the fault dependency network) in the system testing phase, are executed first in regression testing. We present ComReg, a test case prioritization technique based on the dependency network of faults. We model a fault dependency network as a directed graph and identify leading faults to prioritize test cases for regression testing. We use a centrality aggregation technique which considers six network representative centrality metrics to identify leading faults in the fault dependency network. We also discuss the use of fault communities to select an arbitrary percentage of the test cases from a prioritized regression test suite. We conduct a case study that evaluates the effectiveness and applicability of the proposed method.
SEOct 9, 2013
Product Backlog Rating: A Case Study On Measuring Test Quality In ScrumImrul Kayes, Mithun Sarker, Jacob Chakareski
Agile software development methodologies focus on software projects which are behind schedule or highly likely to have a problematic development phase. In the last decade, Agile methods have transformed from cult techniques to mainstream methodologies. Scrum, an Agile software development method, has been widely adopted due to its adaptive nature. This paper presents a metric that measures the quality of the testing process in a Scrum process. As product quality and process quality correlate, improved test quality can ensure high quality products. Also, gaining experience from eight years of successful Scrum implementation at SoftwarePeople, we describe the Scrum process emphasizing the testing process. We propose a metric Product Backlog Rating (PBR) to assess the testing process in Scrum. PBR considers the complexity of the features to be developed in an iteration of Scrum, assesses test ratings and offers a numerical score of the testing process. This metric is able to provide a comprehensive overview of the testing process over the development cycle of a product. We present a case study which shows how the metric is used at SoftwarePeople. The case study explains some features that have been developed in a Sprint in terms of feature complexity and potential test assessment difficulties and shows how PBR is calculated during the Sprint. We propose a test process assessment metric that provides insights into the Scrum testing process. However, the metric needs further evaluation considering associated resources (e.g., quality assurance engineers, the length of the Scrum cycle).
MMJan 2, 2013
A Poisson Hidden Markov Model for Multiview Video TrafficLorenzo Rossi, Jacob Chakareski, Pascal Frossard et al.
Multiview video has recently emerged as a means to improve user experience in novel multimedia services. We propose a new stochastic model to characterize the traffic generated by a Multiview Video Coding (MVC) variable bit rate source. To this aim, we resort to a Poisson Hidden Markov Model (P-HMM), in which the first (hidden) layer represents the evolution of the video activity and the second layer represents the frame sizes of the multiple encoded views. We propose a method for estimating the model parameters in long MVC sequences. We then present extensive numerical simulations assessing the model's ability to produce traffic with realistic characteristics for a general class of MVC sequences. We then extend our framework to network applications where we show that our model is able to accurately describe the sender and receiver buffers behavior in MVC transmission. Finally, we derive a model of user behavior for interactive view selection, which, in conjunction with our traffic model, is able to accurately predict actual network load in interactive multiview services.