Peiyuan Chen

h-index: 3

Papers: 4

Citations: 33

Novelty: 43%

AI Score: 25

Ranked #171,016 of 201,326 authors (top 85%); #52,143 in CV (top 89%)

4 Papers

CV · Aug 14, 2024

Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion

Peiyuan Chen, Zecheng Zhang, Yiping Dong et al.

Visual Question Answering (VQA) is a challenging task that requires systems to provide accurate answers to questions based on image content. Current VQA models struggle with complex questions due to limitations in capturing and integrating multimodal information effectively. To address these challenges, we propose the Rank VQA model, which leverages a ranking-inspired hybrid training strategy to enhance VQA performance. The Rank VQA model integrates high-quality visual features extracted using the Faster R-CNN model and rich semantic text features obtained from a pre-trained BERT model. These features are fused through a sophisticated multimodal fusion technique employing multi-head self-attention mechanisms. Additionally, a ranking learning module is incorporated to optimize the relative ranking of answers, thus improving answer accuracy. The hybrid training strategy combines classification and ranking losses, enhancing the model's generalization ability and robustness across diverse datasets. Experimental results demonstrate the effectiveness of the Rank VQA model. Our model significantly outperforms existing state-of-the-art models on standard VQA datasets, including VQA v2.0 and COCO-QA, in terms of both accuracy and Mean Reciprocal Rank (MRR). The superior performance of Rank VQA is evident in its ability to handle complex questions that require understanding nuanced details and making sophisticated inferences from the image and text. This work highlights the effectiveness of a ranking-based hybrid training strategy in improving VQA performance and lays the groundwork for further research in multimodal learning methods.
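The abstract above describes a hybrid objective that combines a classification loss with a ranking loss, and it reports Mean Reciprocal Rank (MRR). The following is a minimal sketch of how those two pieces could fit together, not the authors' implementation. The pairwise hinge form of the ranking term, the mixing weight `alpha`, the `margin` value, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of answer scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Classification term: negative log-probability of the correct answer."""
    return -math.log(softmax(scores)[target])


def margin_ranking_loss(scores, target, margin=1.0):
    """Ranking term (assumed form): mean hinge loss that pushes the correct
    answer's score at least `margin` above every other candidate."""
    others = [s for i, s in enumerate(scores) if i != target]
    hinges = [max(0.0, margin - (scores[target] - s)) for s in others]
    return sum(hinges) / len(others)


def hybrid_loss(scores, target, alpha=0.5, margin=1.0):
    """Weighted sum of the two terms; `alpha` is a hypothetical mixing weight."""
    return (alpha * cross_entropy(scores, target)
            + (1.0 - alpha) * margin_ranking_loss(scores, target, margin))


def mean_reciprocal_rank(batch_scores, targets):
    """MRR: average of 1 / rank of the correct answer over all questions."""
    total = 0.0
    for scores, target in zip(batch_scores, targets):
        # Rank is 1 plus the number of candidates scored strictly higher.
        rank = 1 + sum(1 for s in scores if s > scores[target])
        total += 1.0 / rank
    return total / len(targets)
```

With this formulation, a prediction that places the correct answer first with a clear margin incurs a lower loss than one that ranks it below a wrong answer, and MRR rewards near-misses (rank 2 scores 0.5) rather than counting them as plain errors.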

LG · Dec 5, 2024

Electronic Health Records-Based Data-Driven Diabetes Knowledge Unveiling and Risk Prognosis

Huadong Pang, Li Zhou, Yiping Dong et al.

In the healthcare sector, the application of deep learning technologies has revolutionized data analysis and disease forecasting. This is particularly evident in the field of diabetes, where the deep analysis of Electronic Health Records (EHR) has unlocked new opportunities for early detection and effective intervention strategies. Our research presents an innovative model that synergizes the capabilities of Bidirectional Long Short-Term Memory Networks-Conditional Random Field (BiLSTM-CRF) with a fusion of XGBoost and Logistic Regression. This model is designed to enhance the accuracy of diabetes risk prediction by conducting an in-depth analysis of electronic medical records data. The first phase of our approach involves employing BiLSTM-CRF to delve into the temporal characteristics and latent patterns present in EHR data. This method effectively uncovers the progression trends of diabetes, which are often hidden in the complex data structures of medical records. The second phase leverages the combined strength of XGBoost and Logistic Regression to classify these extracted features and evaluate associated risks. This dual approach facilitates a more nuanced and precise prediction of diabetes, outperforming traditional models, particularly in handling multifaceted and nonlinear medical datasets. Our research demonstrates a notable advancement in diabetes prediction over traditional methods, showcasing the effectiveness of our combined BiLSTM-CRF, XGBoost, and Logistic Regression model. This study highlights the value of data-driven strategies in clinical decision-making, equipping healthcare professionals with precise tools for early detection and intervention. By enabling personalized treatment and timely care, our approach signifies progress in incorporating advanced analytics in healthcare, potentially improving outcomes for diabetes and other chronic conditions.

CV · Dec 3, 2024

Optimized CNNs for Rapid 3D Point Cloud Object Recognition

Tianyi Lyu, Dian Gu, Peiyuan Chen et al.

This study introduces a method for efficiently detecting objects within 3D point clouds using convolutional neural networks (CNNs). Our approach adopts a unique feature-centric voting mechanism to construct convolutional layers that capitalize on the typical sparsity observed in input data. We explore the trade-off between accuracy and speed across diverse network architectures and advocate for integrating an $\mathcal{L}_1$ penalty on filter activations to augment sparsity within intermediate layers. This research pioneers the proposal of sparse convolutional layers combined with $\mathcal{L}_1$ regularization to effectively handle large-scale 3D data processing. Our method's efficacy is demonstrated on the MVTec 3D-AD object detection benchmark. The Vote3Deep models, with just three layers, outperform the previous state-of-the-art in both laser-only approaches and combined laser-vision methods. Additionally, they maintain competitive processing speeds. This underscores our approach's capability to substantially enhance detection performance while ensuring computational efficiency suitable for real-time applications.
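The two ideas named in the abstract above, feature-centric voting over sparse input and an $\mathcal{L}_1$ penalty on activations, can be illustrated with a one-dimensional toy. This is a sketch under stated assumptions rather than the paper's code: the 1D setting, the dictionary representation of occupied cells, the penalty weight, and all function names are hypothetical. The point is only that iterating over occupied cells and letting each one "vote" into nearby outputs gives the same result as a dense convolution while skipping empty cells.

```python
def sparse_conv1d_by_voting(nonzero, kernel, length):
    """Same-padded cross-correlation computed by voting.

    nonzero: dict mapping occupied input index -> value
    kernel:  odd-length list of filter weights
    length:  size of the (mostly empty) input grid
    """
    half = len(kernel) // 2
    out = [0.0] * length
    for i, x in nonzero.items():
        # Each occupied cell votes into every output whose receptive
        # field covers it; empty cells are never visited.
        for k, w in enumerate(kernel):
            j = i - (k - half)  # output position where tap k sees input i
            if 0 <= j < length:
                out[j] += w * x
    return out


def dense_conv1d(dense, kernel):
    """Reference dense cross-correlation with zero padding."""
    half = len(kernel) // 2
    n = len(dense)
    return [
        sum(w * dense[j + k - half]
            for k, w in enumerate(kernel)
            if 0 <= j + k - half < n)
        for j in range(n)
    ]


def relu(values):
    """ReLU keeps activations non-negative, so many stay exactly zero."""
    return [max(0.0, v) for v in values]


def l1_penalty(activations, weight):
    """L1 regularizer on activations; added to the task loss during
    training, it encourages sparse intermediate layers."""
    return weight * sum(abs(a) for a in activations)


signal = [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, -1.0, 0.0]
occupied = {i: v for i, v in enumerate(signal) if v != 0.0}
kernel = [1.0, 0.5, -1.0]

voted = sparse_conv1d_by_voting(occupied, kernel, len(signal))
reference = dense_conv1d(signal, kernel)
activations = relu(voted)
penalty = l1_penalty(activations, weight=0.01)  # hypothetical weight
```

The voting loop costs work proportional to the number of occupied cells times the kernel size, while the dense version scales with the full grid, which is where the speed advantage on sparse point-cloud data would come from.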

AI · Dec 3, 2024

Construction and optimization of health behavior prediction model for the elderly in smart elderly care

Qian Guo, Peiyuan Chen

With the intensification of global aging, health management of the elderly has become a focus of social attention. This study designs and implements a smart elderly care service model to address issues such as data diversity, health status complexity, long-term dependence and data loss, sudden changes in behavior, and data privacy in the prediction of health behaviors of the elderly. The model achieves accurate prediction and dynamic management of health behaviors of the elderly through modules such as multimodal data fusion, data loss processing, nonlinear prediction, emergency detection, and privacy protection. In the experimental design, based on multi-source data sets and market research results, the model demonstrates excellent performance in health behavior prediction, emergency detection, and personalized services. The experimental results show that the model can effectively improve the accuracy and robustness of health behavior prediction and meet the actual application needs in the field of smart elderly care. In the future, with the integration of more data and further optimization of technology, the model will provide more powerful technical support for smart elderly care services.