COMP-PHAug 13, 2020Code
A community-powered search of machine learning strategy space to find NMR property prediction modelsLars A. Bratholm, Will Gerrard, Brandon Anderson et al.
The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
53.7CVMay 9
Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of ReasoningNaeun Lee, Hyunjong Kim, Sunghwan Choi et al.
Despite strong performance of Multimodal Large Language Models (MLLMs) on multimodal tasks, predicting whether and why an image is persuasive remains challenging. We first show that prompting MLLMs to reason before prediction does not consistently help, and can even reduce persuasiveness prediction performance, suggesting that naively generated rationales are unreliable signals for this task. Yet, no established methodology exists for training MLLMs to reason about visual persuasion or evaluating whether their rationales faithfully support their decisions. To address this gap, we show empirically and theoretically that diverse teacher-generated rationales, when used for supervised fine-tuning, improve visual persuasiveness prediction. We further introduce a three-dimensional faithfulness evaluation framework covering rationale-to-decision consistency, rationale-to-image groundedness, and rationale-to-decision sensitivity. Applying this framework shows that prediction performance alone does not guarantee faithful rationales, while rationale-to-decision sensitivity is most aligned with human rationale preferences. These findings motivate faithfulness-aware training objectives and scalable rationale supervision for visual persuasiveness evaluation. Our code and dataset will be made publicly available.