Adaptive Geodesic Conformal Prediction for Egocentric Camera Pose Estimation
For AR and assistive devices requiring reliable uncertainty estimates, this work addresses the conditional coverage gap in conformal prediction for egocentric pose estimation.
Standard conformal prediction (CP) for egocentric camera pose estimation achieves 90% overall coverage but covers only ~60% of the hardest 25% of frames. The proposed DINOv2-Bridge adaptive CP closes this gap, raising hard-frame coverage to ~93% while maintaining overall coverage at the 90% target.
Egocentric pose estimation for Augmented Reality (AR) and assistive devices requires not just accurate predictions but uncertainty regions with coverage guarantees. Conformal prediction (CP) provides such guarantees without retraining. However, we show that standard CP with a single fixed threshold reaches the nominal 90% overall coverage while covering only ~60% of the hardest 25% of frames (the top difficulty quartile, Q4). This ~30 percentage-point conditional coverage gap is consistent across 12 participants, 3 predictors, and 3 horizons (108 evaluations) on EPIC-Fields. We further show that a geodesic SE(3) nonconformity score identifies physically harder frames than Euclidean scoring: the two scores share only 15-26% of their Q4 frames, and geodesic Q4 frames have 2-3x higher ground-truth camera displacement. To close the coverage gap, we propose DINOv2-Bridge adaptive CP, a two-stage difficulty estimator trained on a single source participant that transfers across participants without requiring any images at test time, improving Q4 coverage from ~75% to ~93% while maintaining overall coverage at the 90% target.
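The conditional coverage gap described above can be reproduced in miniature on synthetic data. The sketch below is illustrative only and is not the paper's pipeline: the frames are simulated, the per-frame difficulty `d` is an oracle value standing in for a learned difficulty estimate (it is not the DINOv2-Bridge estimator), and the nonconformity score is a scalar absolute error rather than a geodesic SE(3) distance. All function names are hypothetical. It compares a single fixed split-conformal threshold against a difficulty-normalised threshold, and measures coverage on all frames and on the hardest quartile (Q4).

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for a finite threshold
        return float("inf")
    return sorted(scores)[k - 1]


def coverage(scores, thresholds):
    """Fraction of frames whose score falls within its threshold."""
    return sum(s <= t for s, t in zip(scores, thresholds)) / len(scores)


def simulate(n, rng):
    """Synthetic frames (assumption): difficulty d ~ U(0.2, 2.0), score = |N(0, d)|."""
    frames = []
    for _ in range(n):
        d = rng.uniform(0.2, 2.0)
        frames.append((d, abs(rng.gauss(0.0, d))))
    return frames


rng = random.Random(0)
alpha = 0.1  # target 90% coverage
cal = simulate(4000, rng)   # calibration split
test = simulate(4000, rng)  # held-out test split

# Standard CP: one threshold shared by every frame.
q_fixed = conformal_quantile([s for _, s in cal], alpha)

# Adaptive CP: calibrate on difficulty-normalised scores, then rescale per frame.
q_norm = conformal_quantile([s / d for d, s in cal], alpha)

# Q4: the hardest 25% of test frames, ranked by difficulty.
q4 = sorted(test, key=lambda f: f[0])[3 * len(test) // 4:]


def report(frames):
    """Return (fixed-threshold coverage, adaptive coverage) on the given frames."""
    scores = [s for _, s in frames]
    fixed = coverage(scores, [q_fixed] * len(frames))
    adaptive = coverage(scores, [q_norm * d for d, _ in frames])
    return fixed, adaptive


overall_fixed, overall_adaptive = report(test)
q4_fixed, q4_adaptive = report(q4)
print(f"overall: fixed={overall_fixed:.3f} adaptive={overall_adaptive:.3f}")
print(f"Q4:      fixed={q4_fixed:.3f} adaptive={q4_adaptive:.3f}")
```

In this toy setting the fixed threshold meets the overall target by over-covering easy frames and under-covering hard ones, whereas scaling the threshold by difficulty spreads coverage more evenly across quartiles. The numbers it prints are properties of the simulation, not of EPIC-Fields.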