Application of Deep Q Learning with Simulation Results for Elevator Optimization
This is an incremental application of existing methods to a domain-specific problem in elevator control.
The paper tackles elevator wait time optimization by developing a Deep Q Learning model and comparing it with a naive approach on simulated user data. However, it only attempts to match the naive model's performance and does not report concrete improvements.
This paper presents a methodology for combining programming and mathematics to optimize elevator wait times. Using simulated user data generated according to the canonical three-peak model of elevator traffic, we first develop a naive model from an intuitive understanding of the logic behind elevators. We account for a range of features, including capacity, acceleration, and maximum wait time thresholds, to model realistic circumstances adequately. Using the same evaluation framework, we then develop a Deep Q Learning model in an attempt to match the hard-coded naive approach to elevator control. Throughout the majority of the paper, we work under a Markov Decision Process (MDP) schema, but we later explore how this assumption fails to characterize the highly stochastic overall Elevator Group Control System (EGCS).
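As a rough illustration of the MDP framing mentioned above, the following minimal sketch computes the standard one-step Q-learning target that a Deep Q Learning model is trained to regress toward. It is not the paper's implementation: the function names `td_target` and `squared_td_error`, the discount factor, and the choice of negative wait time as the reward are all assumptions made for illustration.

```python
from typing import Sequence


def td_target(reward: float,
              next_q_values: Sequence[float],
              gamma: float = 0.99,
              done: bool = False) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    reward        -- immediate reward (assumed here: negative wait time)
    next_q_values -- estimated Q-values for every action in the next state
    gamma         -- discount factor (assumed value, not from the paper)
    done          -- True if the episode ended, so there is no future value
    """
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_sa: float, target: float) -> float:
    """Squared temporal-difference error for a single transition."""
    return (target - q_sa) ** 2


if __name__ == "__main__":
    # Hypothetical transition: passengers waited 3 time units in total,
    # and the network scores three candidate actions in the next state
    # (for example: move up, move down, stay).
    target = td_target(reward=-3.0,
                       next_q_values=[-10.0, -4.0, -6.0],
                       gamma=0.9)
    print("target:", target)
    print("loss:", squared_td_error(q_sa=-5.0, target=target))
```

In a full Deep Q Learning setup, the Q-values would come from a neural network and the squared error would be averaged over a batch of stored transitions. The sketch keeps only the arithmetic that follows from the MDP assumption.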