Ruoqi Wang

6.5CVNov 29, 2024Code

GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis

Ruoqi Wang, Haitao Wang, Qiong Luo

Galaxy morphology analysis involves studying galaxies based on their shapes and structures. For such studies, fundamental tasks include identifying and classifying galaxies in astronomical images, as well as retrieving visually or structurally similar galaxies through similarity search. Existing methods either directly train domain-specific foundation models on large, annotated datasets or fine-tune vision foundation models on a smaller set of images. The former is effective but costly, while the latter is more resource-efficient but often yields lower accuracy. To address these challenges, we introduce GalaxAlign, a multimodal approach inspired by how citizen scientists identify galaxies in astronomical images by following textual descriptions and matching schematic symbols. Specifically, GalaxAlign employs a tri-modal alignment framework to align three types of data during fine-tuning: (1) schematic symbols representing galaxy shapes and structures, (2) textual labels for these symbols, and (3) galaxy images. By incorporating multimodal instructions, GalaxAlign eliminates the need for expensive pretraining and enhances the effectiveness of fine-tuning. Experiments on galaxy classification and similarity search demonstrate that our method effectively fine-tunes general pre-trained models for astronomical tasks by incorporating domain-specific multi-modal knowledge. Code is available at https://github.com/RapidsAtHKUST/GalaxAlign.

6.5CVAug 28, 2021

AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data

Ruoqi Wang, Ziwang Huang, Haitao Wang et al.

The use of multi-modal data such as the combination of whole slide images (WSIs) and gene expression data for survival analysis can lead to more accurate survival predictions. Previous multi-modal survival models are not able to efficiently excavate the intrinsic information within each modality. Moreover, previous methods regard the information from different modalities as similarly important so they cannot flexibly utilize the potential connection between the modalities. To address the above problems, we propose a new asymmetrical multi-modal method, termed as AMMASurv. Different from previous works, AMMASurv can effectively utilize the intrinsic information within every modality and flexibly adapts to the modalities of different importance. Encouraging experimental results demonstrate the superiority of our method over other state-of-the-art methods.

Ruoqi Wang

2 Papers