iMoT: Inertial Motion Transformer for Inertial Navigation
This work addresses inertial navigation for applications such as robotics and wearables. It is an incremental improvement that adds novel components to existing Transformer-based approaches.
The paper tackles accurate positional estimation in inertial navigation by proposing iMoT, a Transformer-based method that retrieves cross-modal information from motion and rotation modalities. The authors report that it significantly outperforms state-of-the-art methods in robustness and accuracy for trajectory reconstruction.
We propose iMoT, an innovative Transformer-based inertial odometry method that retrieves cross-modal information from motion and rotation modalities for accurate positional estimation. Unlike prior work, when encoding the motion context we introduce a Progressive Series Decoupler at the beginning of each encoder layer to highlight critical motion events inherent in acceleration and angular velocity signals. To better aggregate cross-modal interactions, we present Adaptive Positional Encoding, which dynamically modifies positional embeddings to account for temporal discrepancies between modalities. During decoding, we introduce a small set of learnable query motion particles as priors to model motion uncertainties within velocity segments. Each query motion particle is intended to draw cross-modal features dedicated to a specific motion mode; taken together, the particles allow the model to refine its understanding of motion dynamics effectively. Lastly, we design a dynamic scoring mechanism that stabilizes iMoT's optimization by considering all aligned motion particles at the final decoding step, ensuring robust and accurate velocity segment estimation. Extensive evaluations on various inertial datasets demonstrate that iMoT significantly outperforms state-of-the-art methods in both robustness and accuracy of trajectory reconstruction.
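To make the last two steps concrete, the following is a minimal sketch of how several scored motion particles could be fused into one velocity estimate per segment and then integrated into a trajectory. It is an illustration only: the abstract does not specify the scoring rule, so the softmax-weighted average, the 2D velocities, the fixed time step, and the names `softmax`, `fuse_particles`, and `integrate` are all assumptions rather than iMoT's actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_particles(velocities, scores):
    """Score-weighted average of K candidate 2D velocities.

    Assumption: a softmax over per-particle scores stands in for the
    paper's dynamic scoring mechanism, whose exact form is not given.
    """
    weights = softmax(scores)
    vx = sum(w * v[0] for w, v in zip(weights, velocities))
    vy = sum(w * v[1] for w, v in zip(weights, velocities))
    return (vx, vy)


def integrate(velocities, dt, start=(0.0, 0.0)):
    """Dead-reckon positions from per-segment velocities with a fixed step dt."""
    x, y = start
    path = [(x, y)]
    for vx, vy in velocities:
        x += vx * dt
        y += vy * dt
        path.append((x, y))
    return path


# Hypothetical data: two velocity segments, each with K = 2 particles.
segments = [
    ([(1.0, 0.0), (3.0, 0.0)], [0.0, 0.0]),  # equal scores: plain average
    ([(0.0, 2.0), (0.0, 2.0)], [1.5, -0.5]),  # identical particles: scores do not matter
]
fused = [fuse_particles(v, s) for v, s in segments]
trajectory = integrate(fused, dt=0.5)
```

Under these assumptions, a particle with a higher score pulls the fused velocity toward its own hypothesis, while every particle still contributes to the estimate, which loosely mirrors the idea of considering all aligned particles at the final decoding step.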