Chenyang Yang

h-index 47

25 papers

791 citations

Novelty 47%

AI Score 32

Ranked #124,472 of 194,257 authors (top 64%); #27,370 in LG (top 68%)

25 Papers

5.3 LG Jul 24, 2023

Learning Resource Allocation Policy: Vertex-GNN or Edge-GNN?

Yao Peng, Jia Guo, Chenyang Yang

Graph neural networks (GNNs) update the hidden representations of vertices (called Vertex-GNNs) or the hidden representations of edges (called Edge-GNNs) by processing and pooling the information of neighboring vertices and edges, then combining the results to exploit topology information. When learning resource allocation policies, GNNs cannot perform well if their expressive power is weak, i.e., if they cannot differentiate all input features such as channel matrices. In this paper, we analyze the expressive power of the Vertex-GNNs and Edge-GNNs for learning three representative wireless policies: link scheduling, power control, and precoding policies. We find that the expressive power of the GNNs depends on the linearity and output dimensions of the processing and combination functions. When linear processors are used, the Vertex-GNNs cannot differentiate all channel matrices due to the loss of channel information, while the Edge-GNNs can. When learning the precoding policy, even the Vertex-GNNs with non-linear processors may lack strong expressive power due to dimension compression. We proceed to provide necessary conditions for the GNNs to learn the precoding policy well. Simulation results validate the analyses and show that the Edge-GNNs can achieve the same performance as the Vertex-GNNs with much lower training and inference time.
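
The loss of channel information under linear processors can be illustrated with a toy sketch. It assumes a single aggregation step on a bipartite user/antenna graph in which every vertex simply sums the processed features of its incident edges; the function names, the weight 0.7, and the two example matrices are hypothetical and are not taken from the paper.

```python
import numpy as np

def vertex_update(H, process):
    """One aggregation step on a bipartite graph whose edge (k, n) carries H[k, n].

    Each user vertex k sums the processed features of its incident edges,
    and each antenna vertex n does the same over its own edges.
    """
    P = process(H)
    return P.sum(axis=1), P.sum(axis=0)

linear = lambda h: 0.7 * h        # linear processor (weight chosen arbitrarily)
nonlinear = lambda h: np.tanh(h)  # non-linear processor

H1 = np.array([[1.0, 2.0], [3.0, 4.0]])
H2 = np.array([[2.0, 1.0], [2.0, 5.0]])  # different matrix, same row and column sums

def same_output(process):
    """True if the vertex representations cannot tell H1 and H2 apart."""
    u1, a1 = vertex_update(H1, process)
    u2, a2 = vertex_update(H2, process)
    return bool(np.allclose(u1, u2) and np.allclose(a1, a2))

linear_collides = same_output(linear)        # only row/column sums survive
nonlinear_collides = same_output(nonlinear)  # tanh keeps the two matrices apart
```

With the linear processor only the row and column sums of the matrix reach the vertices, so the two distinct matrices produce identical vertex representations; the non-linear processor separates them in this example.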

4.3 SP Dec 1, 2022

A Model-based GNN for Learning Precoding

Jia Guo, Chenyang Yang

Learning precoding policies with neural networks enables low complexity online implementation, robustness to channel impairments, and joint optimization with channel acquisition. However, existing neural networks suffer from high training complexity and poor generalization ability when they are used to learn to optimize precoding for mitigating multi-user interference. This impedes their use in practical systems where the number of users is time-varying. In this paper, we propose a graph neural network (GNN) to learn precoding policies by harnessing both the mathematical model and the property of the policies. We first show that a vanilla GNN cannot learn the pseudo-inverse of the channel matrix well when the numbers of antennas and users are large, and is not generalizable to unseen numbers of users. Then, we design a GNN by resorting to the Taylor expansion of the matrix pseudo-inverse, which captures the importance of the neighboring edges to be aggregated, a capability that is crucial for learning precoding policies efficiently. Simulation results show that the proposed GNN can learn spectrally efficient and energy-efficient precoding policies well in single- and multi-cell multi-user multi-antenna systems with low training complexity, and generalizes well to different numbers of users.

1.8 LG Mar 8, 2022

Designing Heterogeneous GNNs with Desired Permutation Properties for Wireless Resource Allocation

Jianyu Zhao, Chenyang Yang, Tingting Liu

Graph neural networks (GNNs) have been designed for learning a variety of wireless policies, i.e., the mappings from environment parameters to decision variables, thanks to their superior performance and their potential for enabling scalability and size generalizability. These merits are rooted in leveraging the permutation prior, i.e., satisfying the permutation property of the policy to be learned (referred to as the desired permutation property). Many wireless policies have complicated permutation properties. To satisfy these properties, heterogeneous GNNs (HetGNNs) should be used to learn such policies. There are two critical factors that enable a HetGNN to satisfy a desired permutation property: constructing an appropriate heterogeneous graph and judiciously designing the architecture of the HetGNN. However, both the graph and the HetGNN have so far been designed heuristically. In this paper, we strive to provide a systematic design approach that satisfies the desired permutation property. We first propose a method for constructing a graph for a policy, where the edges and their types are defined for the sake of satisfying complicated permutation properties. Then, we provide and prove three sufficient conditions to design a HetGNN such that it can satisfy the desired permutation property when learning over an appropriate graph. These conditions suggest a method of designing the HetGNN with the desired permutation property by sharing the processing, combining, and pooling functions according to the types of vertices and edges of the graph. We take power allocation and hybrid precoding policies as examples for demonstrating how to apply the proposed methods and validating the impact of the permutation prior by simulations.

1.2 SP Mar 12, 2025

Precoder Learning by Leveraging Unitary Equivariance Property

Yilun Ge, Shuyao Liao, Shengqian Han et al.

Incorporating mathematical properties of a wireless policy to be learned into the design of deep neural networks (DNNs) is effective for enhancing learning efficiency. Multi-user precoding policy in multi-antenna system, which is the mapping from channel matrix to precoding matrix, possesses a permutation equivariance property, which has been harnessed to design the parameter sharing structure of the weight matrix of DNNs. In this paper, we study a stronger property than permutation equivariance, namely unitary equivariance, for precoder learning. We first show that a DNN with unitary equivariance designed by further introducing parameter sharing into a permutation equivariant DNN is unable to learn the optimal precoder. We proceed to develop a novel non-linear weighting process satisfying unitary equivariance and then construct a joint unitary and permutation equivariant DNN. Simulation results demonstrate that the proposed DNN not only outperforms existing learning methods in learning performance and generalizability but also reduces training complexity.

1.2 MM Oct 20, 2021

FoV Privacy-aware VR Streaming

Xing Wei, Chenyang Yang

Proactive tile-based virtual reality (VR) video streaming can use the trace of FoV and eye movement to predict future requested tiles, then renders and delivers the predicted tiles before playback. The quality of experience (QoE) depends on the combined effect of tile prediction and consumed resources. Recently, it has been found that with the FoV and eye movement data collected for a user, one can infer the identity and preference of the user. Existing works investigate the privacy protection for eye movement, but never address how to protect the privacy in terms of FoV and how the privacy protection affects the QoE. In this paper, we strive to characterize and satisfy the FoV privacy requirement. We consider "trading resources for privacy". We first add camouflaged tile requests around the real FoV and define spatial degree of privacy (SDoP) as a normalized number of camouflaged tile requests. By consuming more resources to ensure SDoP, the real FoVs can be hidden. Then, we proceed to analyze the impacts of SDoP on the QoE by jointly optimizing the durations for prediction, computing, and transmission that maximizes the QoE given arbitrary predictor, configured resources, and SDoP. We find that a larger SDoP requires more resources but degrades the performance of tile prediction. Simulation with state-of-the-art predictors on a real dataset verifies the analysis and shows that a user requiring a larger SDoP can be served with better QoE.

2.3 MM Apr 29, 2021

Spatial Privacy-aware VR streaming

Xing Wei, Chenyang Yang

Proactive tile-based virtual reality (VR) video streaming employs the current tracking data of a user to predict future requested tiles, then renders and delivers the predicted tiles before playback. Very recently, privacy protection in proactive VR video streaming has started to raise concerns. However, existing privacy protection may fail even with privacy-preserving federated learning. This is because when the future requested tiles can be predicted accurately, the user-behavior-related data can still be recovered from the predicted tiles. In this paper, we consider how to protect privacy even with accurate predictors and investigate the impact of the privacy requirement on the quality of experience (QoE). To this end, we first add extra camouflaged tile requests to the real tile requests and model the privacy requirement as the spatial degree of privacy (sDoP). By ensuring sDoP, the real tile requests can be hidden and privacy can be protected. Then, we jointly optimize the durations for prediction, computing, and transmitting, aimed at maximizing the privacy-aware QoE given arbitrary predictor and configured resources. From the obtained optimal closed-form solution, we find that sDoP has two opposing impacts on the QoE. On the one hand, increasing sDoP improves the capability of communication and computing and hence improves the QoE. On the other hand, it degrades the prediction performance and hence degrades the QoE. The overall impact depends on which factor dominates the QoE. Simulation with two predictors on a real dataset verifies the analysis and shows that the overall impact of sDoP is to improve the QoE.

3.3 MM Apr 20, 2021

Privacy-aware VR streaming

Xing Wei, Chenyang Yang

Proactive tile-based virtual reality (VR) video streaming employs the current tracking data of a user to predict future requested tiles, then renders and delivers the predicted tiles to be requested before playback. The quality of experience (QoE) depends on the overall performance of prediction, computing (i.e., rendering) and communication. All prior works neglect that users may have privacy requirement, i.e., not all the current tracking data are allowed to be uploaded. In this paper, we investigate the privacy-aware VR streaming. We first establish a dataset that collects the privacy requirement of 66 users among 18 panoramic videos. The dataset shows that the privacy requirements of 360° videos are heterogeneous. Only 41% of the total watched videos have no privacy requirement. Based on these findings, we formulate the privacy requirement as the degree of privacy (DoP), and investigate the impact of DoP on the proactive VR streaming. First, we find that with DoP, the length of the observation window and prediction window of a tile predictor should be variable. Then, we jointly optimize the durations for computing and transmitting the selected tiles as well as the computing and communication capability, aimed at maximizing the QoE given arbitrary predictor and configured resources. From the obtained optimal closed-form solution, we find a resource-saturated region where DoP has no impact on the QoE and a resource-unsaturated region where the two-fold impacts of DoP are contradictory. On the one hand, the increase of DoP will degrade the prediction performance and thus degrade the QoE. On the other hand, the increase of DoP will improve the capability of computing and communication and thus improve the QoE. Simulation results using two predictors and a real dataset validate the analysis and demonstrate the overall impact of DoP on the QoE.

1.2 NI Feb 10, 2021

Deep Reinforcement Learning with Symmetric Prior for Predictive Power Allocation to Mobile Users

Jianyu Zhao, Chenyang Yang

Deep reinforcement learning has been applied to a variety of wireless tasks, but is known to have high training and inference complexity. In this paper, we resort to the deep deterministic policy gradient (DDPG) algorithm to optimize predictive power allocation among K mobile users requesting video streaming, which minimizes the energy consumption of the network under the no-stalling constraint of each user. To reduce the sampling complexity and model size of the DDPG, we exploit a kind of symmetric prior inherent in the actor and critic networks, namely permutation invariant and equivariant properties, to design the neural networks. Our analysis shows that the free model parameters of the DDPG can be compressed by 2/K^2. Simulation results demonstrate that the number of episodes required by the learning model with the symmetric prior to achieve the same performance as the vanilla policy is reduced by about one third when K = 10.

4.3 MM Jan 3, 2021

Duration-Squeezing-Aware Communication and Computing for Proactive VR

Xing Wei, Chenyang Yang, Shengqian Han

Proactive tile-based virtual reality video streaming computes and delivers the predicted tiles to be requested before playback. All existing works overlook the important fact that computing and communication (CC) tasks for a segment may squeeze the time for the tasks for the next segment, which leaves less and less available time for the later segments. In this paper, we jointly optimize the durations for CC tasks to maximize the completion rate of CC tasks under the task duration-squeezing-aware constraint. To ensure that the later segments retain enough time for their tasks, the CC tasks for a segment are not allowed to squeeze the time for computing and delivering the subsequent segment. We find the closed-form optimal solution, from which we identify three regions: a minimum-resource-limited region, an unconditional resource-tradeoff region, and a conditional resource-tradeoff region, which are determined by the total time for proactive CC tasks and the playback duration of a segment. Owing to the duration-squeezing-prohibited constraints, increasing the configured resources may not always improve the completion rate of CC tasks. Numerical results validate the impact of the duration-squeezing-prohibited constraints and illustrate the three regions.

4.2 LG Nov 6, 2020

Learning Power Control for Cellular Systems with Heterogeneous Graph Neural Network

Jia Guo, Chenyang Yang

Optimizing power control in multi-cell cellular networks with deep learning enables such a non-convex problem to be solved in real time. When channels are time-varying, the deep neural networks (DNNs) need to be re-trained frequently, which calls for low training complexity. To reduce the number of training samples and the size of DNN required to achieve good performance, a promising approach is to embed a priori knowledge into the DNNs. Since cellular networks can be modelled as a graph, it is natural to employ graph neural networks (GNNs) for learning, which exhibit permutation invariance (PI) and equivalence (PE) properties. Unlike the homogeneous GNNs that have been used for wireless problems, whose outputs are invariant or equivalent to arbitrary permutations of vertices, heterogeneous GNNs (HetGNNs), which are more appropriate to model cellular networks, are only invariant or equivalent to some permutations. If the PI or PE properties of the HetGNN do not match the property of the task to be learned, the performance degrades dramatically. In this paper, we show that the power control policy has a combination of different PI and PE properties, and existing HetGNNs do not satisfy these properties. We then design a parameter sharing scheme for HetGNN such that the learned relationship satisfies the desired properties. Simulation results show that the sample complexity and the size of the designed GNN for learning the optimal power control policy in multi-user multi-cell networks are much lower than those of the existing DNNs, when achieving the same sum rate loss from the numerically obtained solutions.

14.5 SP Sep 13, 2020

A Tutorial on Ultra-Reliable and Low-Latency Communications in 6G: Integrating Domain Knowledge into Deep Learning

Changyang She, Chengjian Sun, Zhouyou Gu et al.

As one of the key communication scenarios in the 5th and also the 6th generation (6G) of mobile communication networks, ultra-reliable and low-latency communications (URLLC) will be central for the development of various emerging mission-critical applications. State-of-the-art mobile communication systems do not fulfill the end-to-end delay and overall reliability requirements of URLLC. In particular, a holistic framework that takes into account latency, reliability, availability, scalability, and decision making under uncertainty is lacking. Driven by recent breakthroughs in deep neural networks, deep learning algorithms have been considered as promising ways of developing enabling technologies for URLLC in future 6G networks. This tutorial illustrates how domain knowledge (models, analytical tools, and optimization frameworks) of communications and networking can be integrated into different kinds of deep learning algorithms for URLLC. We first provide some background of URLLC and review promising network architectures and deep learning frameworks for 6G. To better illustrate how to improve learning algorithms with domain knowledge, we revisit model-based analytical tools and cross-layer optimization frameworks for URLLC. Following that, we examine the potential of applying supervised/unsupervised deep learning and deep reinforcement learning in URLLC and summarize related open problems. Finally, we provide simulation and experimental results to validate the effectiveness of different learning algorithms and discuss future directions.

5.1 IT May 30, 2020

Unsupervised Deep Learning for Optimizing Wireless Systems with Instantaneous and Statistic Constraints

Chengjian Sun, Changyang She, Chenyang Yang

Deep neural networks (DNNs) have been introduced for designing wireless policies by approximating the mappings from environmental parameters to solutions of optimization problems. Considering that labeled training samples are hard to obtain, unsupervised deep learning has been proposed to solve functional optimization problems with statistical constraints recently. However, most existing problems in wireless communications are variable optimizations, and many problems are with instantaneous constraints. In this paper, we establish a unified framework of using unsupervised deep learning to solve both kinds of problems with both instantaneous and statistic constraints. For a constrained variable optimization, we first convert it into an equivalent functional optimization problem with instantaneous constraints. Then, to ensure the instantaneous constraints in the functional optimization problems, we use DNN to approximate the Lagrange multiplier functions, which is trained together with a DNN to approximate the policy. We take two resource allocation problems in ultra-reliable and low-latency communications as examples to illustrate how to guarantee the complex and stringent quality-of-service (QoS) constraints with the framework. Simulation results show that unsupervised learning outperforms supervised learning in terms of QoS violation probability and approximation accuracy of the optimal policy, and can converge rapidly with pre-training.
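
The idea of training a policy together with a Lagrange multiplier can be sketched on a toy problem. This is only a minimal illustration under strong assumptions: a scalar `x` stands in for the policy DNN and a scalar `lam` for the multiplier DNN, and the problem (minimize x² subject to x ≥ 1) and all names are hypothetical rather than taken from the paper.

```python
def primal_dual(steps=5000, lr_x=0.05, lr_lam=0.05):
    """Minimize x**2 subject to 1 - x <= 0 by gradient descent-ascent
    on the Lagrangian L(x, lam) = x**2 + lam * (1 - x)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam   # dL/dx
        grad_lam = 1.0 - x       # dL/dlam, i.e. the constraint violation
        x -= lr_x * grad_x                        # descent step on the "policy"
        lam = max(0.0, lam + lr_lam * grad_lam)   # projected ascent on the multiplier
    return x, lam

x_opt, lam_opt = primal_dual()
```

The iterates settle at the KKT point of the toy problem, x = 1 and lam = 2, without ever being given the optimal solution as a label; the stationarity and feasibility conditions act as the training signal.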

7.3 SP May 18, 2020

Improving Learning Efficiency for Wireless Resource Allocation with Symmetric Prior

Chengjian Sun, Jiajun Wu, Chenyang Yang

Improving learning efficiency is paramount for learning resource allocation with deep neural networks (DNNs) in wireless communications over highly dynamic environments. Incorporating domain knowledge into learning is a promising way of dealing with this issue, which is an emerging topic in the wireless community. In this article, we first briefly summarize two classes of approaches to using domain knowledge: introducing mathematical models or prior knowledge to deep learning. Then, we consider a kind of symmetric prior, permutation equivariance, which widely exists in wireless tasks. To explain how such a generic prior is harnessed to improve learning efficiency, we resort to ranking, which jointly sorts the input and output of a DNN. We use power allocation among subcarriers, probabilistic content caching, and interference coordination to illustrate the improvement of learning efficiency by exploiting the property. From the case study, we find that the number of training samples required to achieve a given system performance decreases with the number of subcarriers or contents, owing to an interesting phenomenon: "sample hardening". Simulation results show that the training samples, the free parameters in DNNs and the training time can be reduced dramatically by harnessing the prior knowledge. The samples required to train a DNN after ranking can be reduced by a factor of 15 to 2,400 to achieve the same system performance as the counterpart without using the prior.

2.3 LG Mar 21, 2020

Accelerating Deep Reinforcement Learning With the Aid of Partial Model: Energy-Efficient Predictive Video Streaming

Dong Liu, Jianyu Zhao, Chenyang Yang et al.

Predictive power allocation is conceived for energy-efficient video streaming over mobile networks using deep reinforcement learning. The goal is to minimize the accumulated energy consumption of each base station over a complete video streaming session under the constraint that avoids video playback interruptions. To handle the continuous state and action spaces, we resort to deep deterministic policy gradient (DDPG) algorithm for solving the formulated problem. In contrast to previous predictive power allocation policies that first predict future information with historical data and then optimize the power allocation based on the predicted information, the proposed policy operates in an on-line and end-to-end manner. By judiciously designing the action and state that only depend on slowly-varying average channel gains, we reduce the signaling overhead between the edge server and the base stations, and make it easier to learn a good policy. To further avoid playback interruption throughout the learning process and improve the convergence speed, we exploit the partially known model of the system dynamics by integrating the concepts of safety layer, post-decision state, and virtual experiences into the basic DDPG algorithm. Our simulation results show that the proposed policies converge to the optimal policy that is derived based on perfect large-scale channel prediction and outperform the first-predict-then-optimize policy in the presence of prediction errors. By harnessing the partially known model, the convergence speed can be dramatically improved.

11.7 SP Feb 22, 2020

Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks

Changyang She, Rui Dong, Zhouyou Gu et al.

In the future 6th generation networks, ultra-reliable and low-latency communications (URLLC) will lay the foundation for emerging mission-critical applications that have stringent requirements on end-to-end delay and reliability. Existing works on URLLC are mainly based on theoretical models and assumptions. The model-based solutions provide useful insights, but cannot be directly implemented in practice. In this article, we first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC, and discuss some open problems of these methods. To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC. The basic idea is to merge theoretical models and real-world data in analyzing the latency and reliability and training deep neural networks (DNNs). Deep transfer learning is adopted in the architecture to fine-tune the pre-trained DNNs in non-stationary networks. Further considering that the computing capacity at each user and each mobile edge computing server is limited, federated learning is applied to improve the learning efficiency. Finally, we provide some experimental and simulation results and discuss some future directions.

4.2 LG Jan 29, 2020

Constructing Deep Neural Networks with a Priori Knowledge of Wireless Tasks

Jia Guo, Chenyang Yang

Deep neural networks (DNNs) have been employed for designing wireless systems in many aspects, such as transceiver design, resource optimization, and information prediction. Existing works either use the fully-connected DNN or the DNNs with particular architectures developed in other domains. While generating labels for supervised learning and gathering training samples are time-consuming or cost-prohibitive, how to develop DNNs with wireless priors for reducing training complexity remains open. In this paper, we show that two kinds of permutation invariant properties that widely exist in wireless tasks can be harnessed to reduce the number of model parameters and hence the sample and computational complexity for training. We find a special architecture of DNNs whose input-output relationships satisfy the properties, called permutation invariant DNN (PINN), and augment the data with the properties. By learning the impact of the scale of a wireless system, the size of the constructed PINNs can flexibly adapt to the input data dimension. We take predictive resource allocation and interference coordination as examples to show how the PINNs can be employed for learning the optimal policy with unsupervised and supervised learning. Simulation results demonstrate a dramatic gain of the proposed PINNs in terms of reducing training complexity.

10.1 LG Jan 3, 2020

Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning

Dong Liu, Chengjian Sun, Chenyang Yang et al.

Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems subject to specific constraints, which can be formulated as variable or functional optimization. If the objective and constraint functions of a variable optimization problem can be derived, standard numerical algorithms can be applied for finding the optimal solution, which however incur high computational cost when the dimension of the variable is high. To reduce the on-line computational complexity, learning the optimal solution as a function of the environment's status by deep neural networks (DNNs) is an effective approach. DNNs can be trained under the supervision of optimal solutions, which however, is not applicable to the scenarios without models or for functional optimization where the optimal solutions are hard to obtain. If the objective and constraint functions are unavailable, reinforcement learning can be applied to find the solution of a functional optimization problem, which is however not tailored to optimization problems in wireless networks. In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems without the supervision of the optimal solutions. When the mathematical model of the environment is completely known and the distribution of environment's status is known or unknown, we can invoke unsupervised learning algorithm. When the mathematical model of the environment is incomplete, we introduce reinforced-unsupervised learning algorithms that learn the model by interacting with the environment. Our simulation results confirm the applicability of these learning frameworks by taking a user association problem as an example.

6.6 IT Oct 30, 2019 (Code)

Prediction, Communication, and Computing Duration Optimization for VR Video Streaming

Xing Wei, Chenyang Yang, Shengqian Han

Proactive tile-based video streaming can avoid motion-to-photon latency of wireless virtual reality (VR) by computing and delivering the predicted tiles to be requested before playback. All existing works either focus on designing predictors or allocating computing and communications resources. Yet to avoid the latency, the successively executed prediction, communication, and computing tasks should be accomplished within a predetermined time. Moreover, the quality of experience (QoE) of proactive VR streaming depends on the worst performance of the three tasks. In this paper, we jointly optimize the duration of the observation window for predicting tiles and the durations for computing and transmitting the predicted tiles, aimed at balancing the performance for three tasks to maximize the QoE given arbitrary predictor and configured resources. We obtain the closed-form optimal solution by decomposing the formulated problem equivalently into two subproblems. With the optimized durations, we find a resource-limited region where the QoE increases rapidly with configured resources, and a prediction-limited region where the QoE can be improved more efficiently with a better predictor. Simulation results using three existing predictors and a real dataset validate the analysis and demonstrate the gain from the joint optimization over non-optimized counterparts.

6.0 LG Oct 30, 2019

Structure of Deep Neural Networks with a Priori Information in Wireless Tasks

Jia Guo, Chenyang Yang

Deep neural networks (DNNs) have been employed for designing wireless networks in many aspects, such as transceiver optimization, resource allocation, and information prediction. Existing works either use fully-connected DNN or the DNNs with specific structures that are designed in other domains. In this paper, we show that the a priori information widely existing in wireless tasks is permutation invariance. For these tasks, we propose a DNN with a special structure, where the weight matrices between layers of the DNN only consist of two smaller sub-matrices. With this form of parameter sharing, the number of model parameters is reduced, giving rise to low sample and computational complexity for training a DNN. We take predictive resource allocation as an example to show how the designed DNN can be applied for learning the optimal policy with unsupervised learning. Simulation results validate our analysis and show a dramatic gain of the proposed structure in terms of reducing training complexity.
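
A weight matrix built from two sub-matrices can be sketched as follows. The sketch assumes the commonly used block form with one sub-matrix `A` on every diagonal block and another sub-matrix `B` on every off-diagonal block, which may differ in detail from the paper's construction; the helper names and sizes are hypothetical.

```python
import numpy as np

def shared_weight(A, B, K):
    """Block matrix with A on the K diagonal blocks and B on all off-diagonal blocks."""
    eye = np.eye(K)
    ones = np.ones((K, K))
    return np.kron(eye, A) + np.kron(ones - eye, B)

def permute_blocks(x, perm, d):
    """Reorder the K length-d blocks of the stacked vector x according to perm."""
    return x.reshape(len(perm), d)[perm].reshape(-1)

rng = np.random.default_rng(0)
K, d_in, d_out = 4, 3, 2
A = rng.standard_normal((d_out, d_in))
B = rng.standard_normal((d_out, d_in))
W = shared_weight(A, B, K)  # shape (K * d_out, K * d_in)

x = rng.standard_normal(K * d_in)
perm = rng.permutation(K)

# Permuting the input blocks permutes the output blocks in the same way.
y_then_perm = permute_blocks(W @ x, perm, d_out)
perm_then_y = W @ permute_blocks(x, perm, d_in)
equivariant = bool(np.allclose(y_then_perm, perm_then_y))

free_params_shared = A.size + B.size  # only two sub-matrices are trainable
free_params_full = W.size             # an unconstrained layer of the same shape
```

In this sketch the layer has 2/K² as many free parameters as an unconstrained layer of the same shape, since K² independent blocks are replaced by two shared ones.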

3.3 SY Oct 29, 2019

Proactive Optimization with Machine Learning: Femto-caching with Future Content Popularity

Jiajun Wu, Chengjian Sun, Chenyang Yang

Optimizing resource allocation with predicted information has shown promising gain in boosting network performance and improving user experience. Earlier research efforts focus on optimizing proactive policies under the assumption of knowing the future information. Recently, various techniques have been proposed to predict the required information, and the prediction results were then treated as the true value in the optimization, i.e., "first-predict-then-optimize". In this paper, we introduce a proactive optimization framework for anticipatory resource allocation, where the future information is implicitly predicted under the same objective with the policy optimization in a single step. An optimization problem is formulated to integrate the implicit prediction and the policy optimization, based on the conditional distribution of the future information given the historical observations. To solve such a problem, we transform it equivalently to a problem depending on the joint distribution of future and historical information. Then, we resort to unsupervised learning with neural networks to learn the proactive policy as a function of the past observations via stochastic optimization. We take proactive caching and bandwidth allocation at base stations as a concrete example, where the objective function is the conditional expectation of successful offloading probability taken over the future popularity given the historically observed popularity. We use simulation to validate the proposed framework and compare it with the "first-predict-then-optimize" strategy and a heuristic "end-to-end" optimization strategy with supervised learning.

1.8 LG Jul 30, 2019

Model-Free Unsupervised Learning for Optimization Problems with Constraints

Chengjian Sun, Dong Liu, Chenyang Yang

In many optimization problems in wireless communications, the expressions of objective function or constraints are hard or even impossible to derive, which makes the solutions difficult to find. In this paper, we propose a model-free learning framework to solve constrained optimization problems without the supervision of the optimal solution. Neural networks are used respectively for parameterizing the function to be optimized, parameterizing the Lagrange multiplier associated with instantaneous constraints, and approximating the unknown objective function or constraints. We provide learning algorithms to train all the neural networks simultaneously, and reveal the connections of the proposed framework with reinforcement learning. Numerical and simulation results validate the proposed framework and demonstrate the efficiency of model-free learning by taking power control problem as an example.

7.1LGMay 27, 2019

Learning to Optimize with Unsupervised Learning: Training Deep Neural Networks for URLLC

Chengjian Sun, Chenyang Yang

Learning the optimized solution as a function of environmental parameters is effective in solving numerical optimization in real time for time-sensitive applications. Existing works on learning to optimize train deep neural networks (DNNs) with labels, and the learnt solutions are inaccurate and thus cannot be employed to ensure stringent quality of service. In this paper, we propose a framework to learn the latent function with unsupervised deep learning, where the property that the optimal solution should satisfy is used as the "supervision signal" implicitly. The framework is applicable to both functional and variable optimization problems with constraints. We take a variable optimization problem in ultra-reliable and low-latency communications as an example, which demonstrates that ultra-high reliability can be supported by the DNN without supervision labels.
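A toy sketch of learning a solution as a function of an environmental parameter without labels is given below. It is only an illustration under stated assumptions: a linear model x = w·a + b stands in for the DNN, the objective f(x; a) = (x − a)² is a hypothetical example with no constraints, and the name `train_policy` is invented here. The training signal is the gradient of the objective itself, so no precomputed optimal solutions are needed.

```python
import random

def train_policy(steps=2000, lr=0.1, seed=0):
    """Learn x(a) = w * a + b by stochastic gradient descent on f(x; a)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        a = rng.uniform(0.0, 2.0)   # sampled environmental parameter
        x = w * a + b               # model output for this sample
        grad_x = 2.0 * (x - a)      # df/dx for the toy objective (x - a)^2
        # Chain rule: dx/dw = a, dx/db = 1. No label x*(a) is used anywhere.
        w -= lr * grad_x * a
        b -= lr * grad_x
    return w, b
```

Because the minimizer of this toy objective is x*(a) = a, the learned parameters should approach w = 1 and b = 0.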

5.1NIApr 26, 2019

Unsupervised Deep Learning for Ultra-reliable and Low-latency Communications

Chengjian Sun, Chenyang Yang

In this paper, we study how to use unsupervised deep learning to solve resource allocation problems in ultra-reliable and low-latency communications, which often yield functional optimization problems with quality-of-service (QoS) constraints. We take a joint power and bandwidth allocation problem as an example, which minimizes the total bandwidth required to guarantee the QoS of each user in terms of the delay bound and overall packet loss probability. The global optimal solution is found in a symmetric scenario. A neural network is introduced to find an approximate optimal solution in general scenarios, where the QoS is ensured by using the property that the optimal solution should satisfy as the "supervision signal". Simulation results show that the learning-based solution performs the same as the optimal solution in the symmetric scenario, and can save around 40% bandwidth with respect to the state-of-the-art policy.

19.5NIMay 17, 2018

Deep Reinforcement Learning for Resource Management in Network Slicing

Rongpeng Li, Zhifeng Zhao, Qi Sun et al.

Network slicing has emerged as a new business for operators, allowing them to sell customized slices to various tenants at different prices. In order to provide better-performing and cost-efficient services, network slicing involves challenging technical issues and urgently calls for intelligent innovations that keep resource management consistent with users' activities per slice. In that regard, deep reinforcement learning (DRL), which focuses on how to interact with the environment by trying alternative actions and reinforcing the tendency toward actions that produce more rewarding consequences, is regarded as a promising solution. In this paper, after briefly reviewing the fundamental concepts of DRL, we investigate the application of DRL to some typical resource management problems in network slicing scenarios, which include radio resource slicing and priority-based core network slicing, and demonstrate the advantage of DRL over several competing schemes through extensive simulations. Finally, we also discuss the possible challenges of applying DRL in network slicing from a general perspective.

3.3NIJan 22, 2018

A Learning-based Approach to Joint Content Caching and Recommendation at Base Stations

Dong Liu, Chenyang Yang

Recommendation systems are able to shape user demands, which can be used to boost caching gain. In this paper, we jointly optimize content caching and recommendation at base stations to maximize the caching gain while not compromising user preference. We first propose a model to capture the impact of recommendation on user demands, which is controlled by a user-specific psychological threshold. We then formulate a joint caching and recommendation problem maximizing the successful offloading probability, which is a mixed integer programming problem. We develop a hierarchical iterative algorithm to solve the problem when the threshold is known. Since the user threshold is unknown in practice, we proceed to propose an $\varepsilon$-greedy algorithm to find the solution by learning the threshold via interactions with users. Simulation results show that the proposed algorithms improve the successful offloading probability compared with prior works with/without recommendation. The $\varepsilon$-greedy algorithm learns the user threshold quickly, and achieves more than $1-\varepsilon$ of the performance obtained by the algorithm with known threshold.
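The exploration rule named in this abstract can be sketched generically as follows. This is a textbook $\varepsilon$-greedy bandit, not the paper's threshold-learning algorithm: the arms, their Bernoulli reward probabilities, and the function name `epsilon_greedy` are hypothetical, chosen only to show how exploring with probability $\varepsilon$ and exploiting otherwise learns from interactions.

```python
import random

def epsilon_greedy(reward_probs, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on Bernoulli arms (hypothetical reward model).

    Returns the arm with the highest estimated mean and the average reward.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=means.__getitem__)  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    best = max(range(n_arms), key=means.__getitem__)
    return best, total / steps
```

With arms such as `[0.2, 0.5, 0.8]`, the greedy choice settles on the last arm after a short exploration phase, and the average reward stays close to what that arm alone would give, reduced by the cost of the occasional exploratory pulls.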