RO · Mar 10
OA-NBV: Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots
Boxun Hu, Chang Chang, Jiawei Ge et al.
When our view is blocked, we naturally step sideways or lean to see around the obstacle and recover a more informative observation. Enabling robots to make the same kind of viewpoint choice is critical for human-centered operations, including search, triage, and disaster response, where cluttered environments and partial visibility frequently degrade downstream perception. However, many Next-Best-View (NBV) methods primarily optimize generic exploration or long-horizon coverage and do not explicitly target the immediate goal of obtaining a single usable observation of a partially occluded person under real motion constraints. We present Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots (OA-NBV), an occlusion-aware NBV pipeline that autonomously selects the next traversable viewpoint to obtain a more complete view of an occluded human. OA-NBV integrates perception and motion planning by scoring candidate viewpoints with a target-centric visibility model that accounts for occlusion, target scale, and target completeness, while restricting candidates to feasible robot poses. OA-NBV achieves a success rate above 90% in both simulation and real-world trials, whereas baseline NBV methods degrade sharply under occlusion. Beyond success rate, OA-NBV improves observation quality: compared with the strongest baseline, it increases normalized target area by at least 81% and keypoint visibility by at least 58% across settings, making it a drop-in view-selection module for diverse human-centered downstream tasks.
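The abstract names three factors in the viewpoint score (occlusion, target scale, target completeness) and a feasibility restriction, but does not give the scoring formula. The sketch below is a minimal illustration under stated assumptions: a linear weighted sum of three precomputed per-candidate statistics, with unreachable poses filtered out before taking the maximum. The `Candidate` fields, the weights, and the function names are hypothetical and are not the authors' model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """Hypothetical candidate viewpoint with precomputed visibility statistics."""
    name: str
    visible_fraction: float   # assumed: unoccluded share of the target's projected area, in [0, 1]
    target_area_ratio: float  # assumed: target's projected area / image area, in [0, 1]
    keypoint_fraction: float  # assumed: share of body keypoints expected to be visible, in [0, 1]
    reachable: bool           # assumed: a traversable, collision-free path to this pose exists


def score(c: Candidate, w_occ: float = 0.5, w_scale: float = 0.2, w_comp: float = 0.3) -> float:
    """Weighted sum of occlusion, scale, and completeness terms (weights are illustrative)."""
    return w_occ * c.visible_fraction + w_scale * c.target_area_ratio + w_comp * c.keypoint_fraction


def next_best_view(cands: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring reachable candidate, or None if no pose is feasible."""
    feasible = [c for c in cands if c.reachable]  # restrict to feasible robot poses first
    if not feasible:
        return None
    return max(feasible, key=score)


# Hypothetical usage: the unobstructed pose behind the wall scores best but is unreachable.
cands = [
    Candidate("current", 0.40, 0.10, 0.35, True),
    Candidate("step_left", 0.85, 0.12, 0.80, True),
    Candidate("behind_wall", 1.00, 0.20, 1.00, False),
]
print(next_best_view(cands).name)  # step_left
```

Filtering before scoring, rather than penalizing infeasible poses inside the score, means an unreachable viewpoint can never be selected no matter how good its view would be.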
QM · Sep 17, 2024
Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning
Qingqing Wang, Chang Chang
Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming and experience-demanding task that involves a large number of planning objectives. Deep reinforcement learning (DRL) has recently been introduced into the planning of intensity-modulated radiation therapy and brachytherapy for prostate, lung, and cervical cancers. However, existing approaches are built on the Q-learning framework and weighted linear combinations of clinical metrics; they suffer from poor scalability and flexibility and can adjust only a limited number of planning objectives in discrete action spaces. We propose an automatic treatment planning model for proton PBS treatment planning of H&N cancers that uses the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function. Specifically, a set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained with PPO iteratively adjusts the relevant planning objective parameters in a continuous action space and refines the PBS treatment plans using a novel dose distribution-based reward function. Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method generalizes to other treatment sites. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.
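The abstract does not specify the policy parameterization or the reward formula. As a hedged illustration of what PPO in a continuous action space involves, the sketch below computes the standard clipped surrogate objective of PPO, assuming a diagonal Gaussian policy whose action is a vector of adjustments to planning objective parameters. The function names, the three-parameter action, and the advantage value are hypothetical; the dose distribution-based reward that would produce the advantages is not modeled here.

```python
import math
from typing import Sequence


def gaussian_log_prob(a: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Log-density of a diagonal Gaussian policy evaluated at action a."""
    return sum(
        -0.5 * ((ai - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
        for ai, mi, si in zip(a, mean, std)
    )


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A); PPO maximizes this quantity."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        r = math.exp(ln - lo)                      # probability ratio pi_new / pi_old
        clipped = min(max(r, 1 - eps), 1 + eps)    # keep the ratio within [1 - eps, 1 + eps]
        total += min(r * adv, clipped * adv)       # pessimistic (lower) bound of the two terms
    return total / len(advantages)


# Hypothetical usage: one sampled adjustment to three objective parameters.
action = [0.1, -0.05, 0.0]
logp_old = gaussian_log_prob(action, mean=[0.0, 0.0, 0.0], std=[0.1, 0.1, 0.1])
logp_new = gaussian_log_prob(action, mean=[0.05, 0.0, 0.0], std=[0.1, 0.1, 0.1])
print(ppo_clip_objective([logp_new], [logp_old], advantages=[0.7]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is what makes PPO practical for continuous parameter adjustments where each plan evaluation is expensive.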
AI · Aug 14, 2025
A learning-driven automatic planning framework for proton PBS treatments of H&N cancers
Qingqing Wang, Liqiang Xiao, Chang Chang
Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers involves numerous conflicting objectives and requires iterative objective parameter adjustments to balance multiple clinical goals. We propose a learning-driven inverse optimizer and integrate it into a proximal policy optimization (PPO)-based planning framework to automatically generate high-quality plans for patients with diverse treatment requirements. The inverse optimizer is a learning-to-optimize (L2O) method that predicts update steps by learning from task-specific data distributions. For the first time, long-context processing techniques developed for large language models (LLMs) are used to address the scalability limitations of existing L2O methods, enabling simultaneous optimization over a much larger set of variables. The PPO framework acts as an outer-loop virtual planner that autonomously adjusts objective parameters through a policy network, while the inner-loop L2O inverse optimizer computes machine-deliverable spot monitor unit (MU) values from the PPO-refined objectives. Moreover, a Swin UNETR dose predictor is trained with prescription- and beam-specific information to estimate the initial objective parameters. In our experiments, a total of 97 patients with bilateral or ipsilateral H&N cancers are used for training and testing. Compared with second-order gradient-based methods, our L2O optimizer improves the effectiveness and efficiency of the time-consuming inverse optimization by 22.97% and 36.41%, respectively. In conjunction with the PPO-based virtual planner, it generates plans within clinically acceptable times (2.55 hours on average) that show improved or comparable organ-at-risk sparing with superior target coverage compared with human-generated plans.
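The abstract describes the inner loop as computing non-negative spot MU values from the current objectives but does not give the objective function. The sketch below assumes a simplified inverse-optimization problem: a dose-influence matrix `D` (voxels × spots) maps MU values to voxel dose, and the objective is a weighted squared deviation from a per-voxel target dose. Plain projected gradient descent stands in for the update rule; it is neither the paper's learned L2O optimizer nor its second-order baseline. An L2O method would replace the fixed `m - lr * g` step with a step predicted by a trained network. All names and values are hypothetical.

```python
from typing import List, Sequence


def dose(D: Sequence[Sequence[float]], mu: Sequence[float]) -> List[float]:
    """Voxel doses: dose-influence matrix D (voxels x spots) times the spot MU vector."""
    return [sum(dij * mj for dij, mj in zip(row, mu)) for row in D]


def objective(D, mu, target, weights) -> float:
    """Simplified planning objective: weighted squared deviation from per-voxel target dose."""
    return sum(w * (di - ti) ** 2 for di, ti, w in zip(dose(D, mu), target, weights))


def projected_gradient_mu(D, target, weights, steps: int = 500, lr: float = 0.1) -> List[float]:
    """Minimize the objective subject to mu >= 0 (lr must be small enough for D to converge)."""
    n_spots = len(D[0])
    mu = [0.0] * n_spots
    for _ in range(steps):
        resid = [w * (di - ti) for di, ti, w in zip(dose(D, mu), target, weights)]
        # d/d(mu_j) of sum_i w_i (D_i . mu - t_i)^2 equals 2 * sum_i w_i r_i D_ij
        grad = [2 * sum(r * row[j] for r, row in zip(resid, D)) for j in range(n_spots)]
        # gradient step, then projection onto mu >= 0 (negative MU is not deliverable)
        mu = [max(0.0, m - lr * g) for m, g in zip(mu, grad)]
    return mu


# Hypothetical usage: with an identity influence matrix, mu converges to the target dose.
D = [[1.0, 0.0], [0.0, 1.0]]
mu = projected_gradient_mu(D, target=[1.0, 2.0], weights=[1.0, 1.0])
print(mu)  # close to [1.0, 2.0]
```

In this framing, the outer PPO planner would change `target` and `weights` between inner-loop solves, which is why a faster inner optimizer directly shortens total planning time.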