Multidimensional Task Learning: A Unified Tensor Framework for Computer Vision Tasks
This work provides a foundational mathematical framework for understanding and designing computer vision tasks, potentially benefiting researchers and practitioners by offering a more expressive and unified approach.
This paper proposes Multidimensional Task Learning (MTL), a unified tensor-based framework using Generalized Einstein MLPs (GE-MLPs) that process tensors directly via the Einstein product. It shows that common computer vision tasks such as classification, segmentation, and detection are specific dimensional configurations within MTL, and that the framework can express a strictly larger set of tasks than traditional matrix-based methods.
This paper introduces Multidimensional Task Learning (MTL), a unified mathematical framework based on Generalized Einstein MLPs (GE-MLPs) that operate directly on tensors via the Einstein product. We argue that current computer vision task formulations are inherently constrained by matrix-based thinking: standard architectures rely on matrix-valued weights and vector-valued biases, requiring structural flattening that restricts the space of naturally expressible tasks. GE-MLPs lift this constraint by operating with tensor-valued parameters, enabling explicit control over which dimensions are preserved or contracted without information loss. Through rigorous mathematical derivations, we demonstrate that classification, segmentation, and detection are special cases of MTL, differing only in their dimensional configuration within a formally defined task space. We further prove that this task space is strictly larger than what matrix-based formulations can natively express, enabling principled task configurations such as spatiotemporal or cross-modal predictions that require destructive flattening under conventional approaches. This work provides a mathematical foundation for understanding, comparing, and designing computer vision tasks through the lens of tensor algebra.
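To make the contrast between contracting and preserving dimensions concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the Einstein product contracts the trailing modes of one tensor with the leading modes of another, which `np.tensordot` does when given an integer `axes`. The helper name `einstein_product`, the shapes `H`, `W`, `C`, `K`, and the choice of a pixel-shared weight in the segmentation-like case are all illustrative assumptions; the abstract does not specify the exact GE-MLP layer definition.

```python
import numpy as np

def einstein_product(a, b, n):
    """Contract the last n modes of `a` with the first n modes of `b`."""
    return np.tensordot(a, b, axes=n)

rng = np.random.default_rng(0)
H, W, C, K = 4, 5, 3, 7              # hypothetical height, width, channels, classes
x = rng.standard_normal((H, W, C))   # a single image-like input tensor

# Classification-like configuration: contract all input modes (H, W, C),
# leaving only a K-dimensional output. The weight is a 4th-order tensor.
w_cls = rng.standard_normal((K, H, W, C))
b_cls = rng.standard_normal(K)
y_cls = einstein_product(w_cls, x, 3) + b_cls        # shape (K,)

# The same map written the matrix-based way: flatten, then matrix-vector product.
y_flat = w_cls.reshape(K, -1) @ x.reshape(-1) + b_cls

# Segmentation-like configuration: contract only the channel mode and
# preserve (H, W). The bias here is tensor-valued rather than a vector.
w_seg = rng.standard_normal((C, K))
b_seg = rng.standard_normal((H, W, K))
y_seg = einstein_product(x, w_seg, 1) + b_seg        # shape (H, W, K)

print(y_cls.shape, y_seg.shape)
```

In this sketch the two configurations differ only in which modes are contracted and which are preserved. The classification-like case agrees numerically with the flattened matrix form, while the segmentation-like case keeps the spatial modes intact without any reshape.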