CLSep 23, 2024
Enhancing Aspect-based Sentiment Analysis in Tourism Using Large Language Models and Positional InformationChun Xu, Mengmeng Wang, Yan Ren et al.
Aspect-Based Sentiment Analysis (ABSA) in tourism plays a significant role in understanding tourists' evaluations of specific aspects of attractions, which is crucial for driving innovation and development in the tourism industry. However, traditional pipeline models are afflicted by issues such as error propagation and incomplete extraction of sentiment elements. To alleviate this issue, this paper proposes an aspect-based sentiment analysis model, ACOS_LLM, for Aspect-Category-Opinion-Sentiment Quadruple Extraction (ACOSQE). The model comprises two key stages: auxiliary knowledge generation and ACOSQE. Firstly, Adalora is used to fine-tune large language models for generating high-quality auxiliary knowledge. To enhance model efficiency, Sparsegpt is utilized to compress the fine-tuned model to 50% sparsity. Subsequently, Positional information and sequence modeling are employed to achieve the ACOSQE task, with auxiliary knowledge and the original text as inputs. Experiments are conducted on both self-created tourism datasets and publicly available datasets, Rest15 and Rest16. Results demonstrate the model's superior performance, with an F1 improvement of 7.49% compared to other models on the tourism dataset. Additionally, there is an F1 improvement of 0.05% and 1.06% on the Rest15 and Rest16 datasets, respectively.
SDJul 19, 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2Chun Xu, En-Wei Sun
An increasing number of Chinese people are troubled by different degrees of visual impairment, which has made the modal conversion between a single image or video frame in the visual field and the audio expressing the same information a research hotspot. Deep learning technologies such as OCR+Vocoder and Im2Wav enable English audio synthesis or image-to-sound matching in a self-supervised manner. However, the audio data used for training is limited and English is not universal for visually impaired people with different educational levels. Therefore, for the sake of solving the problems of data volume and language applicability to improve the reading efficiency of visually impaired people, a set of image-to-speech framework CLIP-KNN-Fastspeech2 based on the Chinese context was constructed. The framework integrates multiple basic models and adopts the strategy of independent pre-training and joint fine-tuning. First, the Chinese CLIP and Fastspeech2 text-to-speech models were pre-trained on two public datasets, MUGE and Baker, respectively, and their convergence was verified. Subsequently, joint fine-tuning was performed using a self-built Braille image dataset. Experimental results on multiple public datasets such as VGGSound, Flickr8k, ImageHear, and the self-built Braille dataset BIT-DP show that the model has improved objective indicators such as BLEU4,FAD(Fréchet Audio Distance), WER(Word Error Ratio), and even inference speed. This verifies that the constructed model still has the ability to synthesize high-quality speech under limited data, and also proves the effectiveness of the joint training strategy that integrates multiple basic models.
LGSep 21, 2025
Adaptive Overclocking: Dynamic Control of Thinking Path Length via Real-Time Reasoning SignalsShuhao Jiang, Songbo Wang, Yang Qiao et al.
Large Reasoning Models (LRMs) often suffer from computational inefficiency due to overthinking, where a fixed reasoning budget fails to match the varying complexity of tasks. To address this issue, we propose Adaptive Overclocking, a method that makes the overclocking hyperparameter $α$ dynamic and context-aware. Our method adjusts reasoning speed in real time through two complementary signals: (1) token-level model uncertainty for fine-grained step-wise control, and (2) input complexity estimation for informed initialization. We implement this approach with three strategies: Uncertainty-Aware Alpha Scheduling (UA-$α$S), Complexity-Guided Alpha Initialization (CG-$α$I), and a Hybrid Adaptive Control (HAC) that combines both. Experiments on GSM8K, MATH, and SVAMP show that HAC achieves superior accuracy-latency trade-offs, reducing unnecessary computation on simple problems while allocating more resources to challenging ones. By mitigating overthinking, Adaptive Overclocking enhances both efficiency and overall reasoning performance.