Mark Shifrin

2 Papers

IT, Feb 8, 2017
Optimal Dynamic Routing for the Wireless Relay Channel

Asaf Cohen, Dennis Goeckel, Omer Gurewitz et al.

Consider a communication network with a source, a relay and a destination. In each time interval, the source may dynamically choose between a few possible coding schemes, based on the channel state, the traffic pattern and its own queue status. For example, the source may choose between a direct route to the destination and a relay-assisted scheme. Because the schemes differ both in the performance they achieve and in the resources they use, a sender might wish to choose the most appropriate one based on its status. In this work, we formulate the problem as a Semi-Markov Decision Process. This formulation allows us to find an optimal policy, expressed as a function of the number of packets in the source queue and other parameters. In particular, we show a general solution which covers various configurations, including different packet size distributions and varying channels. Furthermore, for the case of exponential transmission times, we analytically prove that the optimal policy has a threshold structure; that is, there is a unique value of a single parameter which determines which scheme (or route) is optimal. Results are also validated with simulations for several interesting models.
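To make the idea of a threshold policy concrete, the following is a minimal sketch and not the paper's model: it replaces the Semi-Markov formulation with a discrete-time, discounted toy queue, and every name and number in it (`solve_toy_routing`, the arrival probability `lam`, the per-route service probabilities `mu`, the per-slot route costs `cost`, the holding cost `hold`) is a hypothetical choice made for illustration. Route 0 stands in for a cheaper, slower direct scheme and route 1 for a costlier, faster relay-assisted scheme.

```python
def solve_toy_routing(n_max=30, lam=0.3, mu=(0.35, 0.6), cost=(0.0, 2.0),
                      hold=1.0, gamma=0.95, iters=2000):
    """Value iteration on a toy two-route queue (all parameters are assumptions).

    State: queue length q in 0..n_max. Per slot, a packet arrives with
    probability lam and one departs with probability mu[a] under route a.
    Per-slot cost is hold * q + cost[a]. Returns (values, policy).
    """
    v = [0.0] * (n_max + 1)
    policy = [0] * (n_max + 1)
    for _ in range(iters):
        new_v = []
        for q in range(n_max + 1):
            up, down = min(q + 1, n_max), max(q - 1, 0)
            # Expected discounted cost of each route from queue length q.
            q_vals = [
                hold * q + cost[a]
                + gamma * (lam * v[up] + mu[a] * v[down]
                           + (1 - lam - mu[a]) * v[q])
                for a in range(2)
            ]
            best = min(range(2), key=q_vals.__getitem__)  # ties go to route 0
            policy[q] = best
            new_v.append(q_vals[best])
        v = new_v
    return v, policy


def is_threshold(policy):
    """True if the policy switches from route 0 to route 1 at most once."""
    return all(b >= a for a, b in zip(policy, policy[1:]))


values, policy = solve_toy_routing()
print(policy, is_threshold(policy))
```

Printing the policy shows which route the toy model picks at each queue length, and `is_threshold` checks whether it switches routes only once as the queue grows. The sketch makes no claim about the paper's actual parameters or proof.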

QM, Oct 21, 2017
Insulin Regimen ML-based control for T2DM patients

Mark Shifrin, Hava Siegelmann

We model the blood glucose level (BGL) of an individual T2DM patient as a stochastic process with a discrete number of states, governed mainly but not solely by the medication regimen (e.g. insulin injections). BGL states also change according to various physiological triggers, which give the process a stochastic, statistically unknown, yet assumed quasi-stationary nature. To express the incentive for staying at a desired healthy BGL, we heuristically define a reward function which returns positive values for desirable BG levels and negative values for undesirable BG levels. The state space consists of a sufficient number of states to allow for the memoryless assumption. This, in turn, allows us to formulate a Markov Decision Process (MDP), with the objective of maximizing the total reward, summed over a long run. The probability law is found by model-based reinforcement learning (RL) and the optimal insulin treatment policy is retrieved from the MDP solution.
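The pipeline outlined in this abstract (learn the transition law from data, then solve the resulting MDP) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three BGL states (0 = low, 1 = target, 2 = high), the two actions (0 = no dose, 1 = dose), the reward values, the discount factor and the hand-made transition samples are all hypothetical, and the transition law is estimated by simple empirical counts.

```python
def estimate_transitions(samples, n_states, n_actions):
    """Empirical P[s][a][s'] from (s, a, s') triples.

    State-action pairs never observed fall back to a uniform distribution
    (an assumption made here only to keep the sketch well defined).
    """
    counts = [[[0] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    for s, a, s_next in samples:
        counts[s][a][s_next] += 1
    probs = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            total = sum(counts[s][a])
            if total == 0:
                row.append([1.0 / n_states] * n_states)
            else:
                row.append([c / total for c in counts[s][a]])
        probs.append(row)
    return probs


def value_iteration(probs, reward, gamma=0.9, iters=500):
    """Discounted value iteration; reward[s'] is earned on entering state s'."""
    n_states, n_actions = len(probs), len(probs[0])
    v = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(iters):
        new_v = []
        for s in range(n_states):
            q_vals = [
                sum(p * (reward[s2] + gamma * v[s2])
                    for s2, p in enumerate(probs[s][a]))
                for a in range(n_actions)
            ]
            policy[s] = max(range(n_actions), key=q_vals.__getitem__)
            new_v.append(q_vals[policy[s]])
        v = new_v
    return v, policy


# Hypothetical observations: dosing lowers BGL by one level, and without a
# dose a low BGL recovers to target while target and high stay put.
samples = [(2, 1, 1), (2, 0, 2), (1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
probs = estimate_transitions(samples, n_states=3, n_actions=2)
values, policy = value_iteration(probs, reward=[-1.0, 1.0, -1.0])
print(policy)  # -> [0, 0, 1]: dose only in the high state
```

In this toy data the learned policy doses only when BGL is high. A realistic state space would need many more states and far more data, as the abstract's memoryless assumption suggests.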