DISeR: Designing Imaging Systems with Reinforcement Learning
This work addresses sub-optimal task performance in imaging systems for applications such as autonomous vehicles. It is an incremental advance that automates system design by jointly optimizing cameras and perception models.
The paper tackles imaging system design by jointly optimizing cameras and perception models. It uses a context-free grammar and reinforcement learning to search over the combinatorial space of configurations. On depth estimation and camera rig design for autonomous vehicles, it reports improved performance, with rigs that outperform industry standards.
Imaging systems consist of cameras to encode visual information about the world and perception models to interpret this encoding. Cameras contain (1) illumination sources, (2) optical elements, and (3) sensors, while perception models use (4) algorithms. Directly searching over all combinations of these four building blocks to design an imaging system is challenging due to the size of the search space. Moreover, cameras and perception models are often designed independently, leading to sub-optimal task performance. In this paper, we formulate these four building blocks of imaging systems as a context-free grammar (CFG), which can be automatically searched over with a learned camera designer to jointly optimize the imaging system with task-specific perception models. By transforming the CFG to a state-action space, we then show how the camera designer can be implemented with reinforcement learning to intelligently search over the combinatorial space of possible imaging system configurations. We demonstrate our approach on two tasks, depth estimation and camera rig design for autonomous vehicles, showing that our method yields rigs that outperform industry-wide standards. We believe that our proposed approach is an important step towards automating imaging system design.
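To make the CFG-to-state-action idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy grammar whose symbols and productions (`GRAMMAR`, `lens_35mm`, `cnn_depth`, and so on) are hypothetical, treats a state as a partial derivation, and treats an action as the choice of production for the leftmost non-terminal. The policies shown are simple stand-ins for the learned camera designer; no reward or training loop is included.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Keys are non-terminals; each value lists the productions that may replace it.
GRAMMAR = {
    "SYSTEM": [["CAMERA", "PERCEPTION"]],
    "CAMERA": [["ILLUMINATION", "OPTICS", "SENSOR"]],
    "ILLUMINATION": [["ambient"], ["structured_light"]],
    "OPTICS": [["lens_35mm"], ["lens_50mm"]],
    "SENSOR": [["rgb"], ["monochrome"]],
    "PERCEPTION": [["cnn_depth"], ["stereo_matching"]],
}


def leftmost_nonterminal(state):
    """Return the index of the first non-terminal in the state, or None."""
    for i, symbol in enumerate(state):
        if symbol in GRAMMAR:
            return i
    return None


def valid_actions(state):
    """Actions are indices of productions for the leftmost non-terminal."""
    i = leftmost_nonterminal(state)
    return [] if i is None else list(range(len(GRAMMAR[state[i]])))


def step(state, action):
    """Apply the chosen production and return the next state (a tuple)."""
    i = leftmost_nonterminal(state)
    expansion = GRAMMAR[state[i]][action]
    return state[:i] + tuple(expansion) + state[i + 1:]


def rollout(policy, start=("SYSTEM",)):
    """Expand non-terminals until only terminals (a full design) remain."""
    state = start
    while valid_actions(state):
        state = step(state, policy(state, valid_actions(state)))
    return state


def first_policy(state, actions):
    """Deterministic stand-in policy: always pick the first production."""
    return actions[0]


def random_policy(state, actions):
    """Random stand-in policy; a learned designer would replace this."""
    return random.choice(actions)


if __name__ == "__main__":
    print(rollout(first_policy))
    print(rollout(random_policy))
```

In a reinforcement learning setting, each completed rollout would be scored by the task performance of the resulting imaging system, and that score would serve as the reward for updating the policy. How the paper defines its states, actions, and rewards in detail is not specified in the text above.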