CLNov 18, 2023
Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural NetworksPanfeng Li, Mohamed Abouelenien, Rada Mihalcea et al.
Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.
CPAug 13, 2024
Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM ApproachHaowei Ni, Shuchen Meng, Xupeng Chen et al.
Accurate stock market predictions following earnings reports are crucial for investors. Traditional methods, particularly classical machine learning models, struggle with these predictions because they cannot effectively process and interpret extensive textual data contained in earnings reports and often overlook nuances that influence market movements. This paper introduces an advanced approach by employing Large Language Models (LLMs) instruction fine-tuned with a novel combination of instruction-based techniques and quantized low-rank adaptation (QLoRA) compression. Our methodology integrates 'base factors', such as financial metric growth and earnings transcripts, with 'external factors', including recent market indices performances and analyst grades, to create a rich, supervised dataset. This comprehensive dataset enables our models to achieve superior predictive performance in terms of accuracy, weighted F1, and Matthews correlation coefficient (MCC), especially evident in the comparison with benchmarks such as GPT-4. We specifically highlight the efficacy of the llama-3-8b-Instruct-4bit model, which showcases significant improvements over baseline models. The paper also discusses the potential of expanding the output capabilities to include a 'Hold' option and extending the prediction horizon, aiming to accommodate various investment styles and time frames. This study not only demonstrates the power of integrating cutting-edge AI with fine-tuned financial data but also paves the way for future research in enhancing AI-driven financial analysis tools.
CVAug 8, 2024
Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based MethodsYiming Zhou, Zixuan Zeng, Andi Chen et al.
Exploring the capabilities of Neural Radiance Fields (NeRF) and Gaussian-based methods in the context of 3D scene reconstruction, this study contrasts these modern approaches with traditional Simultaneous Localization and Mapping (SLAM) systems. Utilizing datasets such as Replica and ScanNet, we assess performance based on tracking accuracy, mapping fidelity, and view synthesis. Findings reveal that NeRF excels in view synthesis, offering unique capabilities in generating new perspectives from existing data, albeit at slower processing speeds. Conversely, Gaussian-based methods provide rapid processing and significant expressiveness but lack comprehensive scene completion. Enhanced by global optimization and loop closure techniques, newer methods like NICE-SLAM and SplaTAM not only surpass older frameworks such as ORB-SLAM2 in terms of robustness but also demonstrate superior performance in dynamic and complex environments. This comparative analysis bridges theoretical research with practical implications, shedding light on future developments in robust 3D scene reconstruction across various real-world applications.
CLJun 13, 2025Code
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMsChenqian Le, Ziheng Gong, Chihang Wang et al.
Large language models (LLMs) have shown great potential in medical question answering (MedQA), yet adapting them to biomedical reasoning remains challenging due to domain-specific complexity and limited supervision. In this work, we study how prompt design and lightweight fine-tuning affect the performance of open-source LLMs on PubMedQA, a benchmark for multiple-choice biomedical questions. We focus on two widely used prompting strategies - standard instruction prompts and Chain-of-Thought (CoT) prompts - and apply QLoRA for parameter-efficient instruction tuning. Across multiple model families and sizes, our experiments show that CoT prompting alone can improve reasoning in zero-shot settings, while instruction tuning significantly boosts accuracy. However, fine-tuning on CoT prompts does not universally enhance performance and may even degrade it for certain larger models. These findings suggest that reasoning-aware prompts are useful, but their benefits are model- and scale-dependent. Our study offers practical insights into combining prompt engineering with efficient finetuning for medical QA applications.
CVFeb 12
Efficient Segment Anything with Depth-Aware Fusion and Limited Training DataYiming Zhou, Xuenjie Xie, Panfeng Li et al.
Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants reduce computation but still depend on large-scale training. We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. Depth maps are generated with a pretrained estimator and fused mid-level with RGB features through a dedicated depth encoder. Trained on only 11.2k samples (less than 0.1\% of SA-1B), our method achieves higher accuracy than EfficientViT-SAM, showing that depth cues provide strong geometric priors for segmentation.
CVApr 21, 2024
Exploring Diverse Methods in Visual Question AnsweringPanfeng Li, Qikai Yang, Xieming Geng et al.
This study explores innovative methods for improving Visual Question Answering (VQA) using Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms. Leveraging a balanced VQA dataset, we investigate three distinct strategies. Firstly, GAN-based approaches aim to generate answer embeddings conditioned on image and question inputs, showing potential but struggling with more complex tasks. Secondly, autoencoder-based techniques focus on learning optimal embeddings for questions and images, achieving comparable results with GAN due to better ability on complex questions. Lastly, attention mechanisms, incorporating Multimodal Compact Bilinear pooling (MCB), address language priors and attention modeling, albeit with a complexity-performance trade-off. This study underscores the challenges and opportunities in VQA and suggests avenues for future research, including alternative GAN formulations and attentional mechanisms.
CVApr 22, 2024
Regional Style and Color TransferZhicheng Ding, Panfeng Li, Qikai Yang et al.
This paper presents a novel contribution to the field of regional style transfer. Existing methods often suffer from the drawback of applying style homogeneously across the entire image, leading to stylistic inconsistencies or foreground object twisted when applied to image with foreground elements such as person figures. To address this limitation, we propose a new approach that leverages a segmentation network to precisely isolate foreground objects within the input image. Subsequently, style transfer is applied exclusively to the background region. The isolated foreground objects are then carefully reintegrated into the style-transferred background. To enhance the visual coherence between foreground and background, a color transfer step is employed on the foreground elements prior to their rein-corporation. Finally, we utilize feathering techniques to achieve a seamless amalgamation of foreground and background, resulting in a visually unified and aesthetically pleasing final composition. Extensive evaluations demonstrate that our proposed approach yields significantly more natural stylistic transformations compared to conventional methods.
SIApr 22, 2024
A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data AugmentationQikai Yang, Panfeng Li, Xinhe Xu et al.
In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models. However, the development of robust predictive algorithms is often hampered by the limited size and potential bias present in real-world datasets. This study presents and explores a generative augmentation framework of social network advertising data. Our framework explores three generative models for data augmentation - Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Gaussian Mixture Models (GMMs) - to enrich data availability and diversity in the context of social network advertising analytics effectiveness. By performing synthetic extensions of the feature space, we find that through data augmentation, the performance of various classifiers has been quantitatively improved. Furthermore, we compare the relative performance gains brought by each data augmentation technique, providing insights for practitioners to select appropriate techniques to enhance model performance. This paper contributes to the literature by showing that synthetic data augmentation alleviates the limitations imposed by small or imbalanced datasets in the field of social network advertising. At the same time, this article also provides a comparative perspective on the practicality of different data augmentation methods, thereby guiding practitioners to choose appropriate techniques to enhance model performance.
38.7AIApr 30
Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain AdaptationXupeng Chen, Binbin Shi, Chenqian Le et al.
Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is poorly characterized. We audit five recent frontier and grounding-aware VLMs (Gemini~2.5~Pro, GPT-5, o3, GLM-4.5V, Qwen~2.5~VL) on Medical VQA along two trust-relevant axes. Perception: all models localize anatomical and pathological targets poorly -- the best model reaches only 0.23 mean IoU and 19.1% Acc@0.5 -- and exhibit clinically dangerous laterality confusion. Pipeline integration: a self-grounding pipeline, where the same model localizes then answers, degrades VQA accuracy for every model -- driven by both inaccurate localization and format-compliance failures under the two-step prompt (parse failure rises to 70%--99% for Gemini and GPT-5 on VQA-RAD). Replacing predicted boxes with ground-truth annotations recovers and improves VQA accuracy, consistent with the failure residing in the perception module rather than in the decomposition itself. These observational findings identify grounding quality as a primary trustworthiness bottleneck in our SLAKE bounding-box setting. As a complementary fine-tuning follow-up, supervised fine-tuning of Qwen~2.5~VL on combined Med-VQA training data attains the highest reported SLAKE open-ended recall (85.5%) among comparable methods, suggesting that the VQA-level gap is tractable with domain adaptation; whether this also closes the perception/trustworthiness bottleneck is left to future work.
CVApr 14, 2025
Data Augmentation Through Random Style ReplacementQikai Yang, Cheng Ji, Huaiying Luo et al.
In this paper, we introduce a novel data augmentation technique that combines the advantages of style augmentation and random erasing by selectively replacing image subregions with style-transferred patches. Our approach first applies a random style transfer to training images, then randomly substitutes selected areas of these images with patches derived from the style-transferred versions. This method is able to seamlessly accommodate a wide range of existing style transfer algorithms and can be readily integrated into diverse data augmentation pipelines. By incorporating our strategy, the training process becomes more robust and less prone to overfitting. Comparative experiments demonstrate that, relative to previous style augmentation methods, our technique achieves superior performance and faster convergence.
LGOct 25, 2024
Enhancing Exchange Rate Forecasting with Explainable Deep Learning ModelsShuchen Meng, Andi Chen, Chihang Wang et al.
Accurate exchange rate prediction is fundamental to financial stability and international trade, positioning it as a critical focus in economic and financial research. Traditional forecasting models often falter when addressing the inherent complexities and non-linearities of exchange rate data. This study explores the application of advanced deep learning models, including LSTM, CNN, and transformer-based architectures, to enhance the predictive accuracy of the RMB/USD exchange rate. Utilizing 40 features across 6 categories, the analysis identifies TSMixer as the most effective model for this task. A rigorous feature selection process emphasizes the inclusion of key economic indicators, such as China-U.S. trade volumes and exchange rates of other major currencies like the euro-RMB and yen-dollar pairs. The integration of grad-CAM visualization techniques further enhances model interpretability, allowing for clearer identification of the most influential features and bolstering the credibility of the predictions. These findings underscore the pivotal role of fundamental economic data in exchange rate forecasting and highlight the substantial potential of machine learning models to deliver more accurate and reliable predictions, thereby serving as a valuable tool for financial analysis and decision-making.
MED-PHJun 22, 2024
A Review of Electromagnetic Elimination Methods for low-field portable MRI scannerWanyu Bian, Panfeng Li, Mengyao Zheng et al.
This paper analyzes conventional and deep learning methods for eliminating electromagnetic interference (EMI) in MRI systems. We compare traditional analytical and adaptive techniques with advanced deep learning approaches. Key strengths and limitations of each method are highlighted. Recent advancements in active EMI elimination, such as external EMI receiver coils, are discussed alongside deep learning methods, which show superior EMI suppression by leveraging neural networks trained on MRI data. While deep learning improves EMI elimination and diagnostic capabilities, it introduces security and safety concerns, particularly in commercial applications. A balanced approach, integrating conventional reliability with deep learning's advanced capabilities, is proposed for more effective EMI suppression in MRI systems.
LGJun 18, 2024
Time Series Modeling for Heart Rate Prediction: From ARIMA to TransformersHaowei Ni, Shuchen Meng, Xieming Geng et al.
Cardiovascular disease (CVD) is a leading cause of death globally, necessitating precise forecasting models for monitoring vital signs like heart rate, blood pressure, and ECG. Traditional models, such as ARIMA and Prophet, are limited by their need for manual parameter tuning and challenges in handling noisy, sparse, and highly variable medical data. This study investigates advanced deep learning models, including LSTM, and transformer-based architectures, for predicting heart rate time series from the MIT-BIH Database. Results demonstrate that deep learning models, particularly PatchTST, significantly outperform traditional models across multiple metrics, capturing complex patterns and dependencies more effectively. This research underscores the potential of deep learning to enhance patient monitoring and CVD management, suggesting substantial clinical benefits. Future work should extend these findings to larger, more diverse datasets and real-world clinical applications to further validate and optimize model performance.
CVJun 4, 2024
Enhance Image-to-Image Generation with LLaVA-generated PromptsZhicheng Ding, Panfeng Li, Qikai Yang et al.
This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of the Large Language and Vision Assistant (LLaVA). We propose a framework where LLaVA analyzes input images and generates textual descriptions, hereinafter LLaVA-generated prompts. These prompts, along with the original image, are fed into the image-to-image generation pipeline. This enriched representation guides the generation process towards outputs that exhibit a stronger resemblance to the input image. Extensive experiments demonstrate the effectiveness of LLaVA-generated prompts in promoting image similarity. We observe a significant improvement in the visual coherence between the generated and input images compared to traditional methods. Future work will explore fine-tuning LLaVA prompts for increased control over the creative process. By providing more specific details within the prompts, we aim to achieve a delicate balance between faithfulness to the original image and artistic expression in the generated outputs.
CVFeb 2, 2019
Confidence Trigger Detection: Accelerating Real-time Tracking-by-detection SystemsZhicheng Ding, Zhixin Lai, Siyang Li et al.
Real-time object tracking necessitates a delicate balance between speed and accuracy, a challenge exacerbated by the computational demands of deep learning methods. In this paper, we propose Confidence-Triggered Detection (CTD), an innovative approach that strategically bypasses object detection for frames closely resembling intermediate states, leveraging tracker confidence scores. CTD not only enhances tracking speed but also preserves accuracy, surpassing existing tracking algorithms. Through extensive evaluation across various tracker confidence thresholds, we identify an optimal trade-off between tracking speed and accuracy, providing crucial insights for parameter fine-tuning and enhancing CTD's practicality in real-world scenarios. Our experiments across diverse detection models underscore the robustness and versatility of the CTD framework, demonstrating its potential to enable real-time tracking in resource-constrained environments.
CVOct 30, 2018
Contextual Hourglass Network for Semantic Segmentation of High Resolution Aerial ImageryPanfeng Li, Youzuo Lin, Emily Schultz-Fellenz
Semantic segmentation for aerial imagery is a challenging and important problem in remotely sensed imagery analysis. In recent years, with the success of deep learning, various convolutional neural network (CNN) based models have been developed. However, due to the varying sizes of the objects and imbalanced class labels, it can be challenging to obtain accurate pixel-wise semantic segmentation results. To address those challenges, we develop a novel semantic segmentation method and call it Contextual Hourglass Network. In our method, in order to improve the robustness of the prediction, we design a new contextual hourglass module which incorporates attention mechanism on processed low-resolution featuremaps to exploit the contextual semantics. We further exploit the stacked encoder-decoder structure by connecting multiple contextual hourglass modules from end to end. This architecture can effectively extract rich multi-scale features and add more feedback loops for better learning contextual semantics through intermediate supervision. To demonstrate the efficacy of our semantic segmentation method, we test it on Potsdam and Vaihingen datasets. Through the comparisons to other baseline methods, our method yields the best results on overall performance.