CV | AI | Mar 4, 2025

Developing a PET/CT Foundation Model for Cross-Modal Anatomical and Functional Imaging

arXiv:2503.02824v1 | 5 citations | h-index: 32
Originality: Highly original
AI Analysis

This work addresses the need for more robust and generalizable AI tools for cancer diagnosis and monitoring in oncology. It offers a novel method for a known bottleneck rather than an incremental improvement.

The authors address the limited generalizability and robustness of AI-driven PET/CT analysis by proposing FratMAE, a foundation model that integrates anatomical and functional imaging data through cross-attention mechanisms and reports superior performance on downstream tasks.

In oncology, Positron Emission Tomography-Computed Tomography (PET/CT) is widely used in cancer diagnosis, staging, and treatment monitoring, as it combines anatomical details from CT with functional metabolic activity and molecular marker expression information from PET. However, existing artificial intelligence-driven PET/CT analyses rely predominantly on task-specific models trained from scratch or on limited datasets, limiting their generalizability and robustness. To address this, we propose a foundation model approach specifically designed for multimodal PET/CT imaging. We introduce the Cross-Fraternal Twin Masked Autoencoder (FratMAE), a novel framework that effectively integrates whole-body anatomical and functional or molecular information. FratMAE employs separate Vision Transformer (ViT) encoders for PET and CT scans, along with cross-attention decoders that enable synergistic interactions between modalities during masked autoencoder training. Additionally, it incorporates textual metadata to enhance PET representation learning. By pre-training on PET/CT datasets, FratMAE captures intricate cross-modal relationships and global uptake patterns, achieving superior performance on downstream tasks and demonstrating its potential as a generalizable foundation model.
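The two mechanisms named in the abstract, masked-autoencoder patch masking and cross-attention between modalities, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the token counts, the embedding size, the 0.75 mask ratio, the single attention head, and the function names `random_mask` and `cross_attention` are all assumptions chosen for illustration, and random vectors stand in for the outputs of the PET and CT ViT encoders.

```python
import numpy as np


def random_mask(tokens, mask_ratio, rng):
    """MAE-style masking: keep a random subset of patch tokens.

    Returns the kept tokens and the sorted indices of the kept patches.
    """
    n = tokens.shape[0]
    n_keep = int(round(n * (1.0 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    return tokens[keep_idx], keep_idx


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention.

    Queries come from one modality; keys and values come from the other.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical sizes: 16 patches per scan, 8-dimensional embeddings.
rng = np.random.default_rng(0)
n_patches, dim = 16, 8

# Stand-ins for encoder outputs (assumption: real tokens would come from ViTs).
pet_tokens = rng.normal(size=(n_patches, dim))
ct_tokens = rng.normal(size=(n_patches, dim))

# Mask each modality independently (assumed ratio of 0.75).
pet_visible, pet_keep = random_mask(pet_tokens, 0.75, rng)
ct_visible, ct_keep = random_mask(ct_tokens, 0.75, rng)

# PET tokens query the visible CT tokens for anatomical context.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
pet_fused, attn = cross_attention(pet_visible, ct_visible, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over the visible CT patches, so every PET token in `pet_fused` is a weighted mix of CT-derived values. A symmetric call with CT tokens as queries and PET tokens as context would give the opposite direction of exchange.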

