Fine-Tuning Vision-Language Models for Multimodal Polymer Property Prediction
This work addresses polymer property prediction in materials science. It offers a multimodal approach that builds incrementally on existing VLM applications and is more efficient than maintaining a separate model for each property.
The researchers tackled polymer property prediction from multimodal data by fine-tuning Vision-Language Models (VLMs) on a new dataset of instruction-tuning pairs. The resulting models outperform unimodal and baseline approaches and reduce the need for separate models for different properties.
Vision-Language Models (VLMs) have shown strong performance in tasks like visual question answering and multimodal text generation, but their effectiveness in scientific domains such as materials science remains limited. While some machine learning methods have addressed specific challenges in this field, there is still a lack of foundation models designed for broad tasks like polymer property prediction using multimodal data. In this work, we present a multimodal polymer dataset to fine-tune VLMs through instruction-tuning pairs and assess the impact of multimodality on prediction performance. Our models, fine-tuned with low-rank adaptation (LoRA), outperform unimodal and baseline approaches, demonstrating the benefits of multimodal learning. Additionally, this approach reduces the need to train separate models for different properties, lowering deployment and maintenance costs.
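To make the idea of instruction-tuning pairs concrete, the following is a minimal sketch of how one polymer record could be turned into several image-plus-text training pairs, one per property, so that a single model can serve all properties. The abstract does not describe the dataset schema or prompt wording, so every field name (`smiles`, `image_path`, `properties`), the prompt template, and the numeric values below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PolymerRecord:
    # All field names are hypothetical; the source does not specify its schema.
    smiles: str        # text modality: repeat-unit SMILES string
    image_path: str    # image modality: path to a rendered structure image
    properties: dict   # property name -> (value, unit)


def build_instruction_pairs(record):
    """Return one instruction/response pair per property of a record."""
    pairs = []
    for name, (value, unit) in record.properties.items():
        pairs.append({
            "image": record.image_path,
            # Assumed prompt template, not the authors' wording.
            "instruction": (
                f"Given the polymer image and its SMILES string {record.smiles}, "
                f"predict the {name} in {unit}."
            ),
            "response": f"{value:.1f} {unit}",
        })
    return pairs


# Placeholder record: the numbers are dummy values, not measurements.
record = PolymerRecord(
    smiles="*CC*",
    image_path="images/example.png",
    properties={
        "glass transition temperature": (123.4, "K"),
        "density": (1.0, "g/cm^3"),
    },
)
pairs = build_instruction_pairs(record)
for pair in pairs:
    print(pair["instruction"], "->", pair["response"])
```

Because each property becomes its own instruction, the same fine-tuned model can be queried for any property by changing the prompt, which is one plausible reading of how the approach avoids training separate models.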