CLMay 15, 2022
Adaptive Prompt Learning-based Few-Shot Sentiment AnalysisPengfei Zhang, Tingting Chai, Yongdong Xu
In the field of natural language processing, sentiment analysis via deep learning has a excellent performance by using large labeled datasets. Meanwhile, labeled data are insufficient in many sentiment analysis, and obtaining these data is time-consuming and laborious. Prompt learning devotes to resolving the data deficiency by reformulating downstream tasks with the help of prompt. In this way, the appropriate prompt is very important for the performance of the model. This paper proposes an adaptive prompting(AP) construction strategy using seq2seq-attention structure to acquire the semantic information of the input sequence. Then dynamically construct adaptive prompt which can not only improve the quality of the prompt, but also can effectively generalize to other fields by pre-trained prompt which is constructed by existing public labeled data. The experimental results on FewCLUE datasets demonstrate that the proposed method AP can effectively construct appropriate adaptive prompt regardless of the quality of hand-crafted prompt and outperform the state-of-the-art baselines.
CVJul 4, 2025Code
Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional VideosYufan Zhou, Zhaobo Qi, Lingshuai Lin et al.
In this paper, we address the challenge of procedure planning in instructional videos, aiming to generate coherent and task-aligned action sequences from start and end visual observations. Previous work has mainly relied on text-level supervision to bridge the gap between observed states and unobserved actions, but it struggles with capturing intricate temporal relationships among actions. Building on these efforts, we propose the Masked Temporal Interpolation Diffusion (MTID) model that introduces a latent space temporal interpolation module within the diffusion model. This module leverages a learnable interpolation matrix to generate intermediate latent features, thereby augmenting visual supervision with richer mid-state details. By integrating this enriched supervision into the model, we enable end-to-end training tailored to task-specific requirements, significantly enhancing the model's capacity to predict temporally coherent action sequences. Additionally, we introduce an action-aware mask projection mechanism to restrict the action generation space, combined with a task-adaptive masked proximity loss to prioritize more accurate reasoning results close to the given start and end states over those in intermediate steps. Simultaneously, it filters out task-irrelevant action predictions, leading to contextually aware action sequences. Experimental results across three widely used benchmark datasets demonstrate that our MTID achieves promising action planning performance on most metrics. The code is available at https://github.com/WiserZhou/MTID.
IVMay 5, 2025
Multi-View Learning with Context-Guided Receptance for Image DenoisingBinghong Chen, Tingting Chai, Wei Jiang et al.
Image denoising is essential in low-level vision applications such as photography and automated driving. Existing methods struggle with distinguishing complex noise patterns in real-world scenes and consume significant computational resources due to reliance on Transformer-based models. In this work, the Context-guided Receptance Weighted Key-Value (\M) model is proposed, combining enhanced multi-view feature integration with efficient sequence modeling. Our approach introduces the Context-guided Token Shift (CTS) paradigm, which effectively captures local spatial dependencies and enhance the model's ability to model real-world noise distributions. Additionally, the Frequency Mix (FMix) module extracting frequency-domain features is designed to isolate noise in high-frequency spectra, and is integrated with spatial representations through a multi-view learning process. To improve computational efficiency, the Bidirectional WKV (BiWKV) mechanism is adopted, enabling full pixel-sequence interaction with linear complexity while overcoming the causal selection constraints. The model is validated on multiple real-world image denoising datasets, outperforming the existing state-of-the-art methods quantitatively and reducing inference time up to 40\%. Qualitative results further demonstrate the ability of our model to restore fine details in various scenes.
CVNov 28, 2019
Palmprint Recognition in Uncontrolled and Uncooperative EnvironmentWojciech Michal Matkowski, Tingting Chai, Adams Wai Kin Kong
Online palmprint recognition and latent palmprint identification are two branches of palmprint studies. The former uses middle-resolution images collected by a digital camera in a well-controlled or contact-based environment with user cooperation for commercial applications and the latter uses high-resolution latent palmprints collected in crime scenes for forensic investigation. However, these two branches do not cover some palmprint images which have the potential for forensic investigation. Due to the prevalence of smartphone and consumer camera, more evidence is in the form of digital images taken in uncontrolled and uncooperative environment, e.g., child pornographic images and terrorist images, where the criminals commonly hide or cover their face. However, their palms can be observable. To study palmprint identification on images collected in uncontrolled and uncooperative environment, a new palmprint database is established and an end-to-end deep learning algorithm is proposed. The new database named NTU Palmprints from the Internet (NTU-PI-v1) contains 7881 images from 2035 palms collected from the Internet. The proposed algorithm consists of an alignment network and a feature extraction network and is end-to-end trainable. The proposed algorithm is compared with the state-of-the-art online palmprint recognition methods and evaluated on three public contactless palmprint databases, IITD, CASIA, and PolyU and two new databases, NTU-PI-v1 and NTU contactless palmprint database. The experimental results showed that the proposed algorithm outperforms the existing palmprint recognition methods.