IR | Apr 17
Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking
Jun Feng, Jiahui Tang, Zhicheng He et al.
Adaptive Retrieval-Augmented Generation aims to mitigate the interference of extraneous noise by dynamically determining the necessity of retrieving supplementary passages. However, as Large Language Models evolve with increasing robustness to noise, the necessity of adaptive retrieval warrants re-evaluation. In this paper, we rethink this necessity and propose AdaRankLLM, a novel adaptive retrieval framework. To effectively verify the necessity of adaptive listwise reranking, we first develop an adaptive ranker employing a zero-shot prompt with a passage dropout mechanism, and compare its generation outcomes against static fixed-depth retrieval strategies. Furthermore, to endow smaller open-source LLMs with this precise listwise ranking and adaptive filtering capability, we introduce a two-stage progressive distillation paradigm enhanced by data sampling and augmentation techniques. Extensive experiments across three datasets and eight LLMs demonstrate that AdaRankLLM consistently achieves optimal performance in most scenarios with significantly reduced context overhead. Crucially, our analysis reveals a role shift in adaptive retrieval: it functions as a critical noise filter for weaker models to overcome their limitations, while serving as a cost-effective efficiency optimizer for stronger reasoning models.
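As a rough illustration of the adaptive ranker with passage dropout described in this abstract, the sketch below parses a listwise ranking in which the ranker may omit passages it considers unhelpful, and contrasts the result with a static fixed-depth cut. The `[3] > [1]` output format and the function names `parse_adaptive_ranking`, `build_context`, and `fixed_depth_context` are assumptions made for this example; the abstract does not specify the prompt or output format used by AdaRankLLM.

```python
import re


def parse_adaptive_ranking(ranker_output, num_passages):
    """Parse a listwise ranking such as "[3] > [1]" into 0-based indices.

    Assumed (hypothetical) format: 1-based identifiers in square brackets.
    Passages the ranker leaves out are treated as dropped; an output with
    no valid identifier keeps no passage at all.
    """
    kept = []
    for token in re.findall(r"\[(\d+)\]", ranker_output):
        idx = int(token) - 1
        # Ignore out-of-range identifiers and repeated identifiers.
        if 0 <= idx < num_passages and idx not in kept:
            kept.append(idx)
    return kept


def build_context(passages, ranker_output):
    """Adaptive strategy: keep only the passages the ranker listed, in its order."""
    return [passages[i] for i in parse_adaptive_ranking(ranker_output, len(passages))]


def fixed_depth_context(passages, k):
    """Static baseline: always keep the first k retrieved passages."""
    return passages[:k]


if __name__ == "__main__":
    retrieved = ["passage A", "passage B", "passage C", "passage D"]
    print(build_context(retrieved, "[3] > [1]"))  # variable-length context
    print(fixed_depth_context(retrieved, 3))      # always three passages
```

Under these assumptions the adaptive context can be shorter than any fixed depth, or empty, which is where the reduced context overhead mentioned in the abstract would come from.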
CV | Jun 3, 2023
Unsupervised Low Light Image Enhancement Using SNR-Aware Swin Transformer
Zhijian Luo, Jiahui Tang, Yueen Hou et al.
Images captured under low-light conditions exhibit unpleasing artifacts, which degrade the performance of feature extraction for many upstream visual tasks. Low-light image enhancement aims to improve brightness and contrast while reducing the noise that corrupts visual quality. Recently, many image restoration methods based on Swin Transformer have been proposed and have achieved impressive performance. However, on the one hand, naively applying Swin Transformer to low-light image enhancement introduces artifacts such as over-exposure, brightness imbalance, and noise corruption. On the other hand, it is impractical to capture pairs of low-light images and their corresponding ground truth, i.e., well-exposed images of the same visual scene. In this paper, we propose a dual-branch network based on Swin Transformer, guided by a signal-to-noise ratio prior map that provides spatially varying information for low-light image enhancement. Moreover, we leverage unsupervised learning to construct an optimization objective based on the Retinex model to guide the training of the proposed network. Experimental results demonstrate that the proposed model is competitive with the baseline models.
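To make the idea of a signal-to-noise ratio prior map concrete, here is a minimal sketch of one common way such a map is estimated from a single image: smooth the grayscale input, treat the smoothed image as signal and the absolute residual as noise, and take their ratio per pixel. This recipe, the box filter, and the names `box_blur` and `snr_map` are assumptions for illustration; the abstract does not state how the paper computes its prior.

```python
def box_blur(img, radius=1):
    """Mean filter over a 2D list of floats, with windows clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - radius), min(h, y + radius + 1))
            cols = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in rows for i in cols]
            out[y][x] = sum(vals) / len(vals)
    return out


def snr_map(gray, radius=1, eps=1e-4):
    """Per-pixel SNR estimate: smoothed signal divided by absolute residual.

    eps avoids division by zero in perfectly flat regions (an assumed constant).
    """
    smooth = box_blur(gray, radius)
    return [
        [smooth[y][x] / (abs(gray[y][x] - smooth[y][x]) + eps)
         for x in range(len(gray[0]))]
        for y in range(len(gray))
    ]


if __name__ == "__main__":
    # A flat 5x5 patch with one noisy pixel in the centre.
    image = [[0.5] * 5 for _ in range(5)]
    image[2][2] = 0.9
    prior = snr_map(image)
    print(prior[0][0] > prior[2][2])  # flat corner scores higher than the noisy centre
```

A map like this varies spatially, so a network can use it to treat clean and noisy regions differently, which matches the role the abstract assigns to the prior.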
IR | Aug 29, 2022
Time-aware Self-Attention Meets Logic Reasoning in Recommender Systems
Zhijian Luo, Zihan Huang, Jiahui Tang et al.
In the age of big data, recommender systems have shown remarkable success as a key means of information filtering in our daily life. Recent years have witnessed the technical development of recommender systems from perception learning to cognitive reasoning, which frames recommendation as a procedure of logical reasoning and has achieved significant improvement. However, the logical statements used in reasoning are implicitly order-invariant and do not consider time information, which plays an important role in many recommendation tasks. Furthermore, a recommendation model that incorporates temporal context tends to be self-attentive, i.e., it automatically focuses more on relevant information and less on irrelevant information. To address these issues, in this paper, we propose a Time-aware Self-Attention with Neural Collaborative Reasoning (TiSANCR) based recommendation model, which integrates temporal patterns and a self-attention mechanism into reasoning-based recommendation. Specifically, temporal patterns, represented by relative time, provide context and auxiliary information to characterize the user's preference in recommendation, while self-attention is leveraged to distill informative patterns and suppress irrelevant ones. Therefore, the fusion of self-attentive temporal information provides a deeper representation of the user's preference. Extensive experiments on benchmark datasets demonstrate that the proposed TiSANCR achieves significant improvement and consistently outperforms state-of-the-art recommendation methods.
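The interplay of relative time and self-attention described in this abstract can be sketched as scaled dot-product attention over a user's interaction history, with each attention logit penalized by the time gap between the two interactions. The linear penalty, the `decay` parameter, and the function names are assumptions chosen for brevity; TiSANCR's actual formulation is not given in the abstract and may well differ, for example by using learned relative-time embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def time_aware_attention(items, timestamps, decay=0.1):
    """Self-attention over item embeddings with a relative-time penalty.

    items: list of equal-length embedding vectors (lists of floats).
    timestamps: interaction time of each item, in arbitrary units.
    decay: assumed hyperparameter scaling the penalty per unit of time gap.
    Returns (outputs, weights), where weights[i][j] is how much item i
    attends to item j.
    """
    dim = len(items[0])
    outputs, weights = [], []
    for query, t_q in zip(items, timestamps):
        logits = []
        for key, t_k in zip(items, timestamps):
            score = sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            logits.append(score - decay * abs(t_q - t_k))
        row = softmax(logits)
        weights.append(row)
        outputs.append(
            [sum(row[j] * items[j][c] for j in range(len(items))) for c in range(dim)]
        )
    return outputs, weights


if __name__ == "__main__":
    history = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    times = [0.0, 1.0, 10.0]
    _, attn = time_aware_attention(history, times)
    print(attn[0])  # attention from the first item decreases with time gap
```

With identical embeddings, the only thing separating the attention weights is the time gap, so interactions that are closer in time receive more weight.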
CV | Jan 11, 2025
Natural Language Supervision for Low-light Image Enhancement
Jiahui Tang, Kaihua Zhou, Zhijian Luo et al.
With the development of deep learning, numerous methods for low-light image enhancement (LLIE) have demonstrated remarkable performance. Mainstream LLIE methods typically learn an end-to-end mapping based on pairs of low-light and normal-light images. However, normal-light images under varying illumination conditions serve as reference images, making it difficult to define a "perfect" reference image. This leads to the challenge of reconciling metric-oriented and visually friendly results. Recently, many cross-modal studies have found that side information from other related modalities can guide visual representation learning. Based on this, we introduce a Natural Language Supervision (NLS) strategy, which learns feature maps from text corresponding to images, offering a general and flexible interface for describing an image under different illumination. However, image distributions conditioned on textual descriptions are highly multimodal, which makes training difficult. To address this issue, we design a Textual Guidance Conditioning Mechanism (TCM) that incorporates the connections between image regions and sentence words, enhancing the ability to capture fine-grained cross-modal cues for images and text. This strategy not only utilizes a wider range of supervision sources, but also provides a new paradigm for LLIE based on visual and textual feature alignment. To effectively identify and merge features from various levels of image and textual information, we design an Information Fusion Attention (IFA) module to enhance different regions at different levels. We integrate the proposed TCM and IFA into a Natural Language Supervision network for LLIE, named NaLSuper. Finally, extensive experiments demonstrate the robustness and superior effectiveness of our proposed NaLSuper.