CV | Aug 2, 2022

T4DT: Tensorizing Time for Learning Temporal 3D Visual Data

Mikhail Usvyatsov, Rafael Ballester-Rippoll, Lina Bashaeva, Konrad Schindler, Gonzalo Ferrer, Ivan Oseledets

arXiv:2208.01421v2 | 3.75 citations | h-index: 78 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the memory bottleneck of processing 4D scenes in computer vision, offering a non-iterative alternative to learning-based methods.

The paper tackles the memory inefficiency of grid representations for time-varying 3D visual data by proposing low-rank tensor compression, which significantly reduces storage while preserving geometric quality.

Whereas raster images dominate in 2D, there is no single dominant representation for processing 3D visual data. Different formats such as point clouds, meshes, or implicit functions each have their strengths and weaknesses. Still, grid representations such as signed distance functions also have attractive properties in 3D. In particular, they offer constant-time random access and are eminently suitable for modern machine learning. Unfortunately, the storage size of a grid grows exponentially with its dimension, so grids often exceed memory limits even at moderate resolution. This work proposes using low-rank tensor formats, including the Tucker, tensor train, and quantics tensor train decompositions, to compress time-varying 3D data. Our method computes, voxelizes, and compresses each frame's truncated signed distance function in turn, then applies tensor rank truncation to condense all frames into a single compressed tensor that represents the entire 4D scene. We show that low-rank tensor compression yields an extremely compact representation for storing and querying time-varying signed distance functions. It significantly reduces the memory footprint of 4D scenes while preserving their geometric quality remarkably well. Unlike existing iterative, learning-based approaches such as DeepSDF and NeRF, our method uses a closed-form algorithm with theoretical guarantees.
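To make the idea of compressing a time-varying signed distance grid with a tensor train more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it runs the standard TT-SVD algorithm directly on a small dense 4D array rather than compressing frame by frame, and it covers neither the Tucker nor the quantics tensor train format. The synthetic moving-sphere scene, the grid size, and all function names (`tt_svd`, `tt_to_full`, `tt_entry`, `moving_sphere_tsdf`) are assumptions made for illustration.

```python
import numpy as np


def tt_svd(tensor, rel_eps=1e-2):
    """Compress a dense tensor into tensor-train cores via sequential truncated SVDs."""
    shape = tensor.shape
    d = len(shape)
    # Per-unfolding threshold chosen so the total relative error stays below rel_eps.
    delta = rel_eps * np.linalg.norm(tensor) / np.sqrt(d - 1)
    cores = []
    rank = 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        # tail[i] is the norm of the singular values s[i:], i.e. the error of keeping i terms.
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
        keep = max(1, int(np.sum(tail > delta)))
        cores.append(u[:, :keep].reshape(rank, shape[k], keep))
        # Carry the remainder forward and fold the next mode into the rows.
        mat = (s[:keep, None] * vt[:keep]).reshape(keep * shape[k + 1], -1)
        rank = keep
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores


def tt_to_full(cores):
    """Contract all cores back into a dense tensor (only sensible for small grids)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))


def tt_entry(cores, index):
    """Evaluate a single entry without decompressing: one small matrix product per mode."""
    vec = np.ones((1,))
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]
    return vec.item()


def moving_sphere_tsdf(res=32, frames=16, radius=0.3, trunc=0.1):
    """Hypothetical toy scene: truncated SDF of a sphere sliding along the x axis."""
    axis = np.linspace(-1.0, 1.0, res)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    out = np.empty((res, res, res, frames))
    for t in range(frames):
        cx = -0.4 + 0.8 * t / (frames - 1)
        sdf = np.sqrt((x - cx) ** 2 + y ** 2 + z ** 2) - radius
        out[..., t] = np.clip(sdf, -trunc, trunc)
    return out


scene = moving_sphere_tsdf()            # dense grid with axes (x, y, z, t)
cores = tt_svd(scene, rel_eps=1e-2)
approx = tt_to_full(cores)
rel_err = np.linalg.norm(approx - scene) / np.linalg.norm(scene)

print("TT ranks:", [c.shape[-1] for c in cores[:-1]])
print("dense entries:", scene.size, "TT parameters:", sum(c.size for c in cores))
print("relative error:", rel_err)
print("value at (5, 6, 7, 3):", tt_entry(cores, (5, 6, 7, 3)))
```

The truncation threshold follows the usual TT-SVD error bound, so the relative reconstruction error stays at or below `rel_eps`. How much storage is saved depends on the ranks the scene needs, which is why the script prints the parameter counts rather than assuming a ratio. `tt_entry` shows why random access stays cheap in this format: a lookup costs one small matrix-vector product per dimension rather than a full decompression.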
