1.8 CO May 5
Conflict-Aware Seat Assignment in Classroom Environments
Bruna Cristina Braga Charytitsch, Mariá Cristina Vasconcelos Nascimento
Classroom dynamics depend on many factors that influence teaching performance and learning activities. A key challenge is determining the most effective seating plan, that is, where each student should sit in a given classroom to achieve the best learning environment. This paper introduces the Student Seat Allocation Problem (SSAP), which strategically organizes student seating in traditional classrooms to minimize interpersonal conflicts. We propose a mathematical model and an Iterated Local Search (ILS) heuristic to solve the SSAP. In computational experiments, ILS outperformed a commercial solver applied to the proposed mathematical model in the more complex scenarios. ILS was particularly efficient on real and artificial instances with a higher number of conflicts.
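The abstract does not give the SSAP formulation or the details of the ILS, so the following is only a minimal sketch of how an iterated local search over seat assignments could look. Everything in it is an assumption made for illustration: seats form a rectangular grid with exactly one seat per student, a conflict is an unordered pair of students, the objective counts conflicting pairs seated in horizontally or vertically adjacent seats, the local search is a swap descent, and the perturbation is a few random swaps. The names `cost`, `local_search`, `perturb`, and `iterated_local_search` are hypothetical.

```python
import random


def cost(assign, conflicts, cols):
    """Assumed objective: number of conflicting pairs in adjacent seats.

    assign[s] is the seat index of student s; seats are numbered row by row
    on a grid with `cols` columns.
    """
    total = 0
    for a, b in conflicts:
        ra, ca = divmod(assign[a], cols)
        rb, cb = divmod(assign[b], cols)
        if abs(ra - rb) + abs(ca - cb) == 1:
            total += 1
    return total


def local_search(assign, conflicts, cols):
    """Swap descent: keep any swap of two students that lowers the cost."""
    assign = assign[:]
    best = cost(assign, conflicts, cols)
    improved = True
    while improved:
        improved = False
        for i in range(len(assign)):
            for j in range(i + 1, len(assign)):
                assign[i], assign[j] = assign[j], assign[i]
                c = cost(assign, conflicts, cols)
                if c < best:
                    best, improved = c, True
                else:
                    # Undo the swap when it does not improve the cost.
                    assign[i], assign[j] = assign[j], assign[i]
    return assign, best


def perturb(assign, rng, k=3):
    """Perturbation: apply k random swaps to escape the local optimum."""
    assign = assign[:]
    for _ in range(k):
        i, j = rng.sample(range(len(assign)), 2)
        assign[i], assign[j] = assign[j], assign[i]
    return assign


def iterated_local_search(n_students, conflicts, cols, iters=200, seed=0):
    """ILS loop: local search, perturb, local search, accept if not worse."""
    rng = random.Random(seed)
    start = list(range(n_students))
    rng.shuffle(start)
    best, best_cost = local_search(start, conflicts, cols)
    for _ in range(iters):
        cand, cand_cost = local_search(perturb(best, rng), conflicts, cols)
        if cand_cost <= best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost


if __name__ == "__main__":
    # Six students on a 2 x 3 grid with two conflicting pairs.
    plan, value = iterated_local_search(6, [(0, 1), (2, 3)], cols=3)
    print(plan, value)
```

Accepting candidates that are no worse than the incumbent lets the search drift across plateaus of equal cost; a stricter or more permissive acceptance rule is an equally plausible design choice.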
LG Apr 7, 2025
A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks
Leonardo Kanashiro Felizardo, Edoardo Fadda, Paolo Brandimarte et al.
This paper presents Post-Decision Proximal Policy Optimization (PDPPO), a novel variation of the leading deep reinforcement learning method, Proximal Policy Optimization (PPO). In PDPPO, the state transition is divided into two steps: a deterministic step that yields the post-decision state and a stochastic step that leads to the next state. Our approach incorporates post-decision states and dual critics to reduce the problem's dimensionality and enhance the accuracy of value function estimation. We illustrate these dynamics with lot-sizing, a mixed-integer programming problem whose objective is to optimize production, delivery fulfillment, and inventory levels under uncertain demand and cost parameters. This paper evaluates the performance of PDPPO across various environments and configurations. Notably, PDPPO with a dual critic architecture achieves nearly double the maximum reward of vanilla PPO in specific scenarios, while requiring fewer episode iterations and learning faster and more consistently across different initializations. On average, PDPPO outperforms PPO in environments with a stochastic component in the state transition. These results support the benefits of using a post-decision state: integrating it into the value function approximation leads to more informed and efficient learning in high-dimensional and stochastic environments.
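The two-step transition described above can be sketched on a toy single-item lot-sizing environment. This is an illustration under stated assumptions, not the authors' environment or PDPPO itself: the class `LotSizingEnv`, its cost parameters, the uniform integer demand, and the lost-sales rule are all hypothetical. It only shows where the deterministic post-decision state sits relative to the stochastic step, which is the point at which a second critic could be evaluated.

```python
import random
from dataclasses import dataclass


@dataclass
class LotSizingEnv:
    """Toy single-item lot-sizing model (all parameters are assumptions)."""
    capacity: int = 20
    setup_cost: float = 5.0
    holding_cost: float = 1.0
    lost_sales_cost: float = 10.0
    max_demand: int = 6

    def post_decision(self, inventory: int, production: int) -> int:
        """Deterministic step: inventory after producing, before demand."""
        return min(inventory + production, self.capacity)

    def stochastic_step(self, post_inventory: int, rng: random.Random):
        """Stochastic step: random demand turns the post-decision state
        into the next state; unmet demand is counted as lost sales."""
        demand = rng.randint(0, self.max_demand)
        next_inventory = max(post_inventory - demand, 0)
        lost = max(demand - post_inventory, 0)
        return next_inventory, lost

    def step(self, inventory: int, production: int, rng: random.Random):
        """Full transition: returns (post-decision state, next state, reward)."""
        post = self.post_decision(inventory, production)
        next_inventory, lost = self.stochastic_step(post, rng)
        reward = -(
            self.setup_cost * (production > 0)
            + self.holding_cost * next_inventory
            + self.lost_sales_cost * lost
        )
        return post, next_inventory, reward


if __name__ == "__main__":
    env = LotSizingEnv()
    rng = random.Random(0)
    inventory = 3
    for _ in range(3):
        post, inventory, reward = env.step(inventory, production=4, rng=rng)
        print(post, inventory, reward)
```

In this sketch the post-decision state depends only on the current state and the action, so a value estimate attached to it does not have to average over the action's deterministic effect, only over the demand noise that follows.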