CLJul 11, 2024
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQLZhenhe Wu, Zhongqiu Li, Jie Zhang et al.
Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting valuable information for more efficient prompt engineering. Based on above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experiment results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.
31.0CLApr 17
Beyond Static Personas: Situational Personality Steering for Large Language ModelsZesheng Wei, Mengxiang Li, Zilei Wang et al.
Personalized Large Language Models (LLMs) facilitate more natural, human-like interactions in human-centric applications. However, existing personalization methods are constrained by limited controllability and high resource demands. Furthermore, their reliance on static personality modeling restricts adaptability across varying situations. To address these limitations, we first demonstrate the existence of situation-dependency and consistent situation-behavior patterns within LLM personalities through a multi-perspective analysis of persona neurons. Building on these insights, we propose IRIS, a training-free, neuron-based Identify-Retrieve-Steer framework for advanced situational personality steering. Our approach comprises situational persona neuron identification, situation-aware neuron retrieval, and similarity-weighted steering. We empirically validate our framework on PersonalityBench and our newly introduced SPBench, a comprehensive situational personality benchmark. Experimental results show that our method surpasses best-performing baselines, demonstrating IRIS's generalization and robustness to complex, unseen situations and different models architecture.
CLMar 16, 2024
Towards Robustness and Diversity: Continual Learning in Dialog Generation with Text-Mixup and Batch Nuclear-Norm MaximizationZihan Wang, Jiayu Xiao, Mengxiang Li et al.
In our dynamic world where data arrives in a continuous stream, continual learning enables us to incrementally add new tasks/domains without the need to retrain from scratch. A major challenge in continual learning of language model is catastrophic forgetting, the tendency of models to forget knowledge from previously trained tasks/domains when training on new ones. This paper studies dialog generation under the continual learning setting. We propose a novel method that 1) uses \textit{Text-Mixup} as data augmentation to avoid model overfitting on replay memory and 2) leverages Batch-Nuclear Norm Maximization (BNNM) to alleviate the problem of mode collapse. Experiments on a $37$-domain task-oriented dialog dataset and DailyDialog (a $10$-domain chitchat dataset) demonstrate that our proposed approach outperforms the state-of-the-art in continual learning.
CVSep 17, 2025
Morphology-optimized Multi-Scale Fusion: Combining Local Artifacts and Mesoscopic Semantics for Deepfake Detection and LocalizationChao Shuai, Gaojian Wang, Kun Pan et al.
While the pursuit of higher accuracy in deepfake detection remains a central goal, there is an increasing demand for precise localization of manipulated regions. Despite the remarkable progress made in classification-based detection, accurately localizing forged areas remains a significant challenge. A common strategy is to incorporate forged region annotations during model training alongside manipulated images. However, such approaches often neglect the complementary nature of local detail and global semantic context, resulting in suboptimal localization performance. Moreover, an often-overlooked aspect is the fusion strategy between local and global predictions. Naively combining the outputs from both branches can amplify noise and errors, thereby undermining the effectiveness of the localization. To address these issues, we propose a novel approach that independently predicts manipulated regions using both local and global perspectives. We employ morphological operations to fuse the outputs, effectively suppressing noise while enhancing spatial coherence. Extensive experiments reveal the effectiveness of each module in improving the accuracy and robustness of forgery localization.