Min Xia

h-index36

4papers

425citations

Novelty53%

AI Score35

Ranked #105,054 of 194,257 authors (top 54%)#35,186 in CV (top 59%)

4 Papers

16.4CVNov 21, 2024Code

LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval

Weiheng Lu, Jian Li, An Yu et al.

Multimodal Large Language Models (MLLMs) are widely used for visual perception, understanding, and reasoning. However, long video processing and precise moment retrieval remain challenging due to LLMs' limited context size and coarse frame extraction. We propose the Large Language-and-Vision Assistant for Moment Retrieval (LLaVA-MR), which enables accurate moment retrieval and contextual grounding in videos using MLLMs. LLaVA-MR combines Dense Frame and Time Encoding (DFTE) for spatial-temporal feature extraction, Informative Frame Selection (IFS) for capturing brief visual and motion patterns, and Dynamic Token Compression (DTC) to manage LLM context limitations. Evaluations on benchmarks like Charades-STA and QVHighlights demonstrate that LLaVA-MR outperforms 11 state-of-the-art methods, achieving an improvement of 1.82% in R1@0.5 and 1.29% in mAP@0.5 on the QVHighlights dataset. Our implementation will be open-sourced upon acceptance.

1.2SYMay 20, 2024

Spatio-temporal Attention-based Hidden Physics-informed Neural Network for Remaining Useful Life Prediction

Feilong Jiang, Xiaonan Hou, Min Xia

Predicting the Remaining Useful Life (RUL) is essential in Prognostic Health Management (PHM) for industrial systems. Although deep learning approaches have achieved considerable success in predicting RUL, challenges such as low prediction accuracy and interpretability pose significant challenges, hindering their practical implementation. In this work, we introduce a Spatio-temporal Attention-based Hidden Physics-informed Neural Network (STA-HPINN) for RUL prediction, which can utilize the associated physics of the system degradation. The spatio-temporal attention mechanism can extract important features from the input data. With the self-attention mechanism on both the sensor dimension and time step dimension, the proposed model can effectively extract degradation information. The hidden physics-informed neural network is utilized to capture the physics mechanisms that govern the evolution of RUL. With the constraint of physics, the model can achieve higher accuracy and reasonable predictions. The approach is validated on a benchmark dataset, demonstrating exceptional performance when compared to cutting-edge methods, especially in the case of complex conditions.

7.9LGJun 6, 2024

Element-wise Multiplication Based Deeper Physics-Informed Neural Networks

Feilong Jiang, Xiaonan Hou, Min Xia

As a promising framework for resolving partial differential equations (PDEs), Physics-Informed Neural Networks (PINNs) have received widespread attention from industrial and scientific fields. However, lack of expressive ability and initialization pathology issues are found to prevent the application of PINNs in complex PDEs. In this work, we propose Deeper Physics-Informed Neural Network (Deeper-PINN) to resolve these issues. The element-wise multiplication operation is adopted to transform features into high-dimensional, non-linear spaces. Benefiting from element-wise multiplication operation, Deeper-PINNs can alleviate the initialization pathologies of PINNs and enhance the expressive capability of PINNs. The proposed structure is verified on various benchmarks. The results show that Deeper-PINNs can effectively resolve the initialization pathology and exhibit strong expressive ability.

29.9CVApr 22, 2019Code

STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing

Ming Liu, Yukang Ding, Min Xia et al.

Arbitrary attribute editing generally can be tackled by incorporating encoder-decoder and generative adversarial networks. However, the bottleneck layer in encoder-decoder usually gives rise to blurry and low quality editing result. And adding skip connections improves image quality at the cost of weakened attribute manipulation ability. Moreover, existing methods exploit target attribute vector to guide the flexible translation to desired target domain. In this work, we suggest to address these issues from selective transfer perspective. Considering that specific editing task is certainly only related to the changed attributes instead of all target attributes, our model selectively takes the difference between target and source attribute vectors as input. Furthermore, selective transfer units are incorporated with encoder-decoder to adaptively select and modify encoder feature for enhanced attribute editing. Experiments show that our method (i.e., STGAN) simultaneously improves attribute manipulation accuracy as well as perception quality, and performs favorably against state-of-the-arts in arbitrary facial attribute editing and season translation.