A Flexible-Frame-Rate Vision-Aided Inertial Object Tracking System for Mobile Devices
This addresses the problem of enabling smooth, real-time AR on mobile devices, which is incremental as it builds on existing visual-inertial methods with architectural improvements.
The paper tackles real-time object pose estimation and tracking for mobile AR by proposing a client-server system using inertial sensors for high-speed tracking and server-side image processing for accuracy, achieving up to 120 FPS with high precision on low-end devices.
Real-time object pose estimation and tracking is challenging but essential for emerging augmented reality (AR) applications. In general, state-of-the-art methods address this problem using deep neural networks which indeed yield satisfactory results. Nevertheless, the high computational cost of these methods makes them unsuitable for mobile devices where real-world applications usually take place. In addition, head-mounted displays such as AR glasses require at least 90~FPS to avoid motion sickness, which further complicates the problem. We propose a flexible-frame-rate object pose estimation and tracking system for mobile devices. It is a monocular visual-inertial-based system with a client-server architecture. Inertial measurement unit (IMU) pose propagation is performed on the client side for high speed tracking, and RGB image-based 3D pose estimation is performed on the server side to obtain accurate poses, after which the pose is sent to the client side for visual-inertial fusion, where we propose a bias self-correction mechanism to reduce drift. We also propose a pose inspection algorithm to detect tracking failures and incorrect pose estimation. Connected by high-speed networking, our system supports flexible frame rates up to 120 FPS and guarantees high precision and real-time tracking on low-end devices. Both simulations and real world experiments show that our method achieves accurate and robust object tracking.