Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
For robotic odor source localization in turbulent environments, this work provides an interpretable, memory-efficient navigation strategy, but the strategy cannot adapt to the local intermittency level.
Tabular Q-learning with a minimal memory (a clock counting the time since the last whiff) learns an interpretable plume-recovery strategy that combines surging, casting, and a downwind return, and performs well on data from direct numerical simulations of turbulence. However, the agent cannot adapt its strategy to the local intermittency level, which limits robustness; giving the agent more flexibility improves it.
Finding an odor source in a turbulent flow requires turning the history of olfactory observations into a robust navigation strategy. In this work, we use tabular Q-learning to train an olfactory search agent with a minimal memory of past observations: only a running clock since the last whiff. This agent learns an interpretable strategy for recovering the plume that combines well-known behaviors observed in insects: surging, casting, and a return downwind. While it performs well on data from direct numerical simulations of turbulence, the agent is limited by its inability to adapt its strategy to the local intermittency level; we show that providing more flexibility improves robustness.
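
To make the clock-state idea concrete, here is a minimal sketch of how such an agent could be set up. It is an illustration under stated assumptions, not the implementation used in this work: the action set, the clock cap `MAX_CLOCK`, the learning rate, the discount factor, and the exploration rate are all hypothetical choices, and the helper names are invented for this example. The only state the agent carries is the number of steps since the last whiff, and the Q-table is updated with the standard tabular Q-learning rule.

```python
import numpy as np

# Assumed action set and clock cap; neither is taken from the source.
ACTIONS = ("upwind", "downwind", "cross_left", "cross_right")
MAX_CLOCK = 20


def next_clock(clock, whiff, max_clock=MAX_CLOCK):
    """Reset the clock on a whiff; otherwise count up, saturating at max_clock."""
    return 0 if whiff else min(clock + 1, max_clock)


def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one standard tabular Q-learning update to Q in place."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])


def epsilon_greedy(Q, s, rng, eps=0.1):
    """Pick a random action with probability eps, else the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(Q[s].argmax())


# One row per clock value (0..MAX_CLOCK), one column per action.
Q = np.zeros((MAX_CLOCK + 1, len(ACTIONS)))
```

Once trained, a table of this shape can be read row by row: the greedy action at each clock value shows what the agent does as time since the last whiff grows, which is what makes this kind of policy easy to interpret.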