Yuting Shi

CL
h-index: 3
3 papers
157 citations
Novelty: 60%
AI Score: 32

3 Papers

CL · Nov 1, 2024
Phase Diagram of Vision Large Language Models Inference: A Perspective from Interaction across Image and Instruction

Houjing Wei, Yuting Shi, Naoya Inoue

Vision Large Language Models (VLLMs) usually take a concatenation of image token embeddings and text token embeddings as input and perform causal modeling. However, their internal behaviors remain underexplored, raising the question of how the two types of tokens interact. To investigate this multimodal interaction during model inference, we measure the contextualization among the hidden state vectors of tokens from different modalities. Our experiments uncover four-phase inference dynamics of VLLMs across the depth of Transformer-based LMs: (I) Alignment: in very early layers, contextualization emerges between modalities, suggesting a feature space alignment. (II) Intra-modal Encoding: in early layers, intra-modal contextualization is enhanced while inter-modal interaction is suppressed, suggesting a local encoding within modalities. (III) Inter-modal Encoding: in later layers, contextualization across modalities is enhanced, suggesting a deeper fusion across modalities. (IV) Output Preparation: in very late layers, contextualization is reduced globally, and hidden states are aligned towards the unembedding space.
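The abstract does not say how contextualization is computed, so the following is only a minimal sketch under an assumed proxy: the mean cosine similarity between image-token and text-token hidden states at each layer. The names `layer_profile` and `n_image_tokens`, and the nested-list layout of `hidden_states`, are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_cross_similarity(xs: List[Vector], ys: List[Vector]) -> float:
    """Mean cosine similarity over all (x, y) pairs drawn from the two groups."""
    total = sum(cosine(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def layer_profile(hidden_states: List[List[Vector]], n_image_tokens: int) -> List[float]:
    """Assumed proxy for inter-modal contextualization, one value per layer.

    hidden_states[layer][token] is a hidden state vector; the first
    n_image_tokens tokens are assumed to be image tokens, the rest text tokens.
    """
    profile = []
    for layer in hidden_states:
        image, text = layer[:n_image_tokens], layer[n_image_tokens:]
        profile.append(mean_cross_similarity(image, text))
    return profile
```

Plotting such a profile against layer index is one way a rise and fall of cross-modal similarity over depth could be inspected; the paper's own measure may differ.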

CV · Jan 21, 2021
MoDL-QSM: Model-based Deep Learning for Quantitative Susceptibility Mapping

Ruimin Feng, Jiayi Zhao, He Wang et al.

Quantitative susceptibility mapping (QSM) has demonstrated great potential in quantifying tissue susceptibility in various brain diseases. However, the intrinsically ill-posed inverse problem relating the tissue phase to the underlying susceptibility distribution limits the accuracy of tissue susceptibility quantification. Recently, deep learning has shown promising results in improving accuracy by reducing streaking artifacts. However, there exists a mismatch between the observed phase and the theoretical forward phase estimated from the susceptibility label. In this study, we proposed a model-based deep learning architecture that followed the STI (susceptibility tensor imaging) physical model, referred to as MoDL-QSM. Specifically, MoDL-QSM accounts for the relationship between the STI-derived phase contrast induced by the susceptibility tensor terms (χ13, χ23, χ33) and the acquired single-orientation phase. Convolutional neural networks are embedded into the physical model to learn a regularization term containing prior information. χ33 and the phase induced by the χ13 and χ23 terms were used as the labels for network training. Quantitative evaluation metrics (RMSE, SSIM, and HFEN) were compared with those of recently developed deep learning QSM methods. The results showed that MoDL-QSM achieved superior performance, demonstrating its potential for future applications.
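To illustrate why the inverse problem is ill-posed, here is a small sketch of the standard scalar (isotropic) dipole kernel in k-space, D(k) = 1/3 - kz^2/|k|^2, with the main field assumed along z. This is the usual simplification, not the STI tensor model that MoDL-QSM follows; the function name `dipole_kernel` is chosen for illustration only.

```python
def dipole_kernel(kx: float, ky: float, kz: float) -> float:
    """Scalar QSM dipole kernel in k-space, main field assumed along z.

    In this simplified model the phase spectrum is D(k) times the
    susceptibility spectrum, so recovering susceptibility requires dividing
    by D(k), which is unstable wherever D(k) is close to zero.
    """
    k_squared = kx * kx + ky * ky + kz * kz
    if k_squared == 0.0:
        return 0.0  # common convention: set the undefined DC term to zero
    return 1.0 / 3.0 - (kz * kz) / k_squared
```

The kernel vanishes on the cone where kz^2 = |k|^2 / 3, so the measured phase carries no information about the susceptibility at those frequencies. Naive inversion amplifies noise near that cone, which is the origin of the streaking artifacts mentioned above.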

COMP-PH · Aug 14, 2020
Orbital Graph Convolutional Neural Network for Material Property Prediction

Mohammadreza Karamad, Rishikesh Magar, Yuting Shi et al.

Material representations that are compatible with machine learning models play a key role in developing models that exhibit high accuracy for property prediction. Atomic orbital interactions are one of the important factors that govern the properties of crystalline materials, from which the local chemical environments of atoms are inferred. Therefore, to develop robust machine learning models for material property prediction, it is imperative to include features representing such chemical attributes. Here, we propose the Orbital Graph Convolutional Neural Network (OGCNN), a crystal graph convolutional neural network framework that includes atomic orbital interaction features and learns material properties in a robust way. In addition, we embedded an encoder-decoder network into the OGCNN, enabling it to learn important features among basic atomic (elemental) features, orbital-orbital interactions, and topological features. We examined the performance of this model on a broad range of crystalline material data to predict different properties. We benchmarked the performance of the OGCNN model against that of: 1) the crystal graph convolutional neural network (CGCNN), 2) other state-of-the-art descriptors for material representations, including the Many-body Tensor Representation (MBTR) and the Smooth Overlap of Atomic Positions (SOAP), and 3) other conventional regression machine learning algorithms where different crystal featurization methods have been used. We find that OGCNN significantly outperforms them. The OGCNN model with high predictive accuracy can be used to discover new materials among the immense phase and compound spaces of materials.
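For readers unfamiliar with graph convolutions on crystal graphs, the sketch below shows one generic neighbor-aggregation step with a residual connection. It is a deliberately simplified toy, not the OGCNN or CGCNN update rule: there are no learned weights, gates, or orbital features, and the names `graph_conv_step`, `features`, and `edges` are hypothetical.

```python
from typing import Dict, List, Tuple


def graph_conv_step(
    features: Dict[int, List[float]],
    edges: List[Tuple[int, int, float]],
) -> Dict[int, List[float]]:
    """One residual aggregation step on an undirected, weighted graph.

    features maps an atom index to its feature vector; each edge is
    (i, j, w), where w is an assumed scalar bond weight. Every atom adds
    the mean of its weighted neighbor features to its own features.
    """
    dim = len(next(iter(features.values())))
    sums = {i: [0.0] * dim for i in features}
    counts = {i: 0 for i in features}
    for i, j, w in edges:
        # Undirected edge: pass a message in both directions.
        for src, dst in ((i, j), (j, i)):
            for d in range(dim):
                sums[dst][d] += w * features[src][d]
            counts[dst] += 1
    updated = {}
    for i, own in features.items():
        n = max(counts[i], 1)  # isolated atoms keep their own features
        updated[i] = [own[d] + sums[i][d] / n for d in range(dim)]
    return updated
```

In a learned model, the weighted sum would be replaced by trainable transformations of the concatenated atom and bond features, and, per the abstract, OGCNN additionally feeds in orbital interaction features.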