Miller-Index-Based Latent Crystallographic Fracture Plane Reasoning with Vision-Language Models

arXiv:2605.2041678.6

Predicted impact: top 16% in LG (last 90 days) · Originality: Incremental advance

AI Analysis

This work explores an approach for integrating structured physical priors into vision-language models. The advance is rated incremental because it is restricted to a specific domain and to idealized conditions.

The paper investigates whether multimodal large language models can use Miller indices as a latent representation for reasoning about fracture geometry. It finds that the models can reliably infer these indices in idealized settings and can reject the representation when the underlying physics does not support it.

We study whether multimodal large language models (MLLMs) can leverage crystallographic plane indices (Miller indices) as a structured latent representation for reasoning about fracture geometry. We formulate Miller indices $z = (h,k,l)$ as a latent variable governing idealized planar fracture and evaluate two complementary capabilities: (i) latent inference, where the model maps visual observations to plane hypotheses under physically valid conditions, and (ii) latent applicability assessment, where the model determines whether such a representation is meaningful for a given fracture image. Through extensive experiments spanning synthetic data, controlled 2D--3D geometric pairs, and real-world fracture images across multiple material classes -- including ceramics, glass, metals, and concrete -- we show that MLLMs can reliably perform latent inference in idealized settings and, critically, can reject the latent representation when the underlying physics does not support it. These results suggest that MLLMs can act as physics-aware reasoning systems conditioned on structured latent priors, provided that the domain of validity is explicitly modeled.
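To make the latent variable $z = (h,k,l)$ concrete, the following is a minimal sketch of how a Miller index triple determines an idealized plane. It assumes a cubic lattice, where the normal to the plane $(hkl)$ is parallel to the direction $[hkl]$ and the interplanar spacing is $d = a / \sqrt{h^2 + k^2 + l^2}$. The function names and the cubic assumption are illustrative only and are not taken from the paper, which does not specify its geometry code in the abstract.

```python
import math
from typing import Tuple

Miller = Tuple[int, int, int]  # (h, k, l); hypothetical type alias for illustration


def plane_normal(hkl: Miller) -> Tuple[float, float, float]:
    """Unit normal of the (hkl) plane, ASSUMING a cubic lattice."""
    h, k, l = hkl
    if (h, k, l) == (0, 0, 0):
        raise ValueError("(0, 0, 0) does not define a plane")
    norm = math.sqrt(h * h + k * k + l * l)
    return (h / norm, k / norm, l / norm)


def interplanar_angle_deg(p: Miller, q: Miller) -> float:
    """Angle in degrees between two planes, via the angle between their normals."""
    n1, n2 = plane_normal(p), plane_normal(q)
    dot = sum(a * b for a, b in zip(n1, n2))
    dot = max(-1.0, min(1.0, dot))  # clamp against floating-point drift
    return math.degrees(math.acos(dot))


def interplanar_spacing(hkl: Miller, a: float) -> float:
    """Spacing d = a / sqrt(h^2 + k^2 + l^2) for a cubic lattice with parameter a."""
    h, k, l = hkl
    if (h, k, l) == (0, 0, 0):
        raise ValueError("(0, 0, 0) does not define a plane")
    return a / math.sqrt(h * h + k * k + l * l)


if __name__ == "__main__":
    print(round(interplanar_angle_deg((1, 0, 0), (1, 1, 0)), 2))  # 45.0
    print(round(interplanar_angle_deg((1, 0, 0), (1, 1, 1)), 2))  # 54.74
    print(interplanar_spacing((2, 0, 0), a=4.0))                  # 2.0
```

For non-cubic crystal systems the normal and spacing depend on the full lattice metric, so this sketch would need a reciprocal-lattice formulation rather than the simple formulas used here.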
