Gaze-contingent decoding of human navigation intention on an autonomous wheelchair platform
This work provides a cognitive-level interface for users of autonomous wheelchairs, reducing the need for continuous manual steering, though it appears incremental as it builds on existing gaze-based control methods.
The paper tackles the problem of decoding human navigation intentions for autonomous wheelchairs from gaze patterns, specifically addressing the Midas Touch Problem, in which not all eye movements indicate intent. The resulting system distinguishes between merely looking at an object and intending to drive to it, enabling the wheelchair to navigate to desired objects while avoiding obstacles.
We have pioneered the Where-You-Look-Is-Where-You-Go approach to controlling mobility platforms, which decodes how users look at the environment to infer where they want to navigate their mobility device. However, only some natural eye movements are relevant for decoding action intention; many are not, which makes decoding challenging (the so-called Midas Touch Problem). Here, we present a new solution that combines (1) deep computer vision to identify which object in the field of view the user is looking at, (2) an analysis of where on the object's bounding box the user is looking, and (3) a simple machine learning classifier that determines whether the overt visual attention on the object is predictive of a navigation intention towards that object. Our decoding system ultimately determines whether the user wants to drive to, e.g., a door or is just looking at it. Crucially, we find that when users look at an object and imagine moving towards it, the eye movements resulting from this motor imagery (akin to neural interfaces) remain decodable. Once a driving intention, and thus also the target location, is detected, our system instructs our autonomous wheelchair platform, the A.Eye-Drive, to navigate to the desired object while avoiding static and moving obstacles. Thus, for navigation purposes, we have realised a cognitive-level human interface: it requires the user only to interact cognitively with the desired goal, not to steer the wheelchair continuously to the target (low-level human interfacing).
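The second and third steps of the pipeline described above (locating gaze within the object's bounding box, then classifying intent) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the features or the classifier, so the feature set (mean and spread of box-relative gaze, fraction of samples inside the box), the nearest-centroid classifier, all names such as `BoundingBox` and `gaze_features`, and the toy gaze traces are assumptions made purely for illustration. The bounding box is assumed to come from an object detector, which is not shown.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class BoundingBox:
    """Axis-aligned box in image pixels (hypothetical detector output)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def normalise_gaze(gaze: Point, box: BoundingBox) -> Point:
    """Map an image-space gaze point to box-relative coordinates.

    (0, 0) is the box's top-left corner and (1, 1) its bottom-right corner;
    values outside [0, 1] mean the gaze fell outside the box.
    """
    gx, gy = gaze
    u = (gx - box.x_min) / (box.x_max - box.x_min)
    v = (gy - box.y_min) / (box.y_max - box.y_min)
    return u, v


def gaze_features(trace: Sequence[Point], box: BoundingBox) -> List[float]:
    """Summarise a gaze trace as a small feature vector (assumed features)."""
    rel = [normalise_gaze(g, box) for g in trace]
    us = [u for u, _ in rel]
    vs = [v for _, v in rel]
    inside = [0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 for u, v in rel]
    return [mean(us), mean(vs), pstdev(us), pstdev(vs), sum(inside) / len(inside)]


def fit_centroids(samples: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> Dict[str, List[float]]:
    """Nearest-centroid 'training': average the feature vectors per label."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


# Toy, made-up traces for illustration only (not data from the paper).
# The choice that "intent" traces are tightly clustered is an arbitrary assumption.
door_box = BoundingBox(x_min=100, y_min=50, x_max=300, y_max=450)
training_traces = [
    ([(198, 398), (202, 402), (200, 400), (201, 399)], "intent"),
    ([(205, 390), (199, 395), (203, 392), (201, 394)], "intent"),
    ([(120, 80), (280, 420), (150, 300), (290, 60)], "look"),
    ([(110, 440), (260, 100), (300, 250), (130, 200)], "look"),
]
centroids = fit_centroids(
    [gaze_features(trace, door_box) for trace, _ in training_traces],
    [label for _, label in training_traces],
)
```

With these toy centroids, `predict(centroids, gaze_features(new_trace, door_box))` returns either `"intent"` or `"look"` for a new trace. In a full system, an `"intent"` decision together with the object's location would be handed to the autonomous navigation stack; a real classifier would be trained on recorded gaze data rather than hand-written traces.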