Zhuo Tao

CV
h-index18
3papers
10citations
Novelty42%
AI Score24

3 Papers

CVNov 12, 2024
AdaSemiCD: An Adaptive Semi-Supervised Change Detection Method Based on Pseudo-Label Evaluation

Ran Lingyan, Wen Dongcheng, Zhuo Tao et al.

Change Detection (CD) is an essential field in remote sensing, with a primary focus on identifying areas of change in bi-temporal image pairs captured at varying intervals of the same region by a satellite. The data annotation process for the CD task is both time-consuming and labor-intensive. To make better use of the scarce labeled data and abundant unlabeled data, we present an adaptive dynamic semi-supervised learning method, AdaSemiCD, to improve the use of pseudo-labels and optimize the training process. Initially, due to the extreme class imbalance inherent in CD, the model is more inclined to focus on the background class, and it is easy to confuse the boundary of the target object. Considering these two points, we develop a measurable evaluation metric for pseudo-labels that enhances the representation of information entropy by class rebalancing and amplification of confusing areas to give a larger weight to prospects change objects. Subsequently, to enhance the reliability of sample-wise pseudo-labels, we introduce the AdaFusion module, which is capable of dynamically identifying the most uncertain region and substituting it with more trustworthy content. Lastly, to ensure better training stability, we introduce the AdaEMA module, which updates the teacher model using only batches of trusted samples. Experimental results from LEVIR-CD, WHU-CD, and CDD datasets validate the efficacy and universality of our proposed adaptive training framework.

CVMar 22, 2025
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization

Zhuo Tao, Liang Li, Qi Chen et al.

Natural language video localization (NLVL) is a crucial task in video understanding that aims to localize the target moment in videos specified by a given language description. Recently, a point-supervised paradigm has been presented to address this task, requiring only a single annotated frame within the target moment rather than complete temporal boundaries. Compared with the fully-supervised paradigm, it offers a balance between localization accuracy and annotation cost. However, due to the absence of complete annotation, it is challenging to align the video content with language descriptions, consequently hindering accurate moment prediction. To address this problem, we propose a new COllaborative Temporal consistEncy Learning (COTEL) framework that leverages the synergy between saliency detection and moment localization to strengthen the video-language alignment. Specifically, we first design a frame- and a segment-level Temporal Consistency Learning (TCL) module that models semantic alignment across frame saliencies and sentence-moment pairs. Then, we design a cross-consistency guidance scheme, including a Frame-level Consistency Guidance (FCG) and a Segment-level Consistency Guidance (SCG), that enables the two temporal consistency learning paths to reinforce each other mutually. Further, we introduce a Hierarchical Contrastive Alignment Loss (HCAL) to comprehensively align the video and text query. Extensive experiments on two benchmarks demonstrate that our method performs favorably against SoTA approaches. We will release all the source codes.

LGApr 7, 2021
Generating Multi-type Temporal Sequences to Mitigate Class-imbalanced Problem

Lun Jiang, Nima Salehi Sadghiani, Zhuo Tao et al.

From the ad network standpoint, a user's activity is a multi-type sequence of temporal events consisting of event types and time intervals. Understanding user patterns in ad networks has received increasing attention from the machine learning community. Particularly, the problems of fraud detection, Conversion Rate (CVR), and Click-Through Rate (CTR) prediction are of interest. However, the class imbalance between major and minor classes in these tasks can bias a machine learning model leading to poor performance. This study proposes using two multi-type (continuous and discrete) training approaches for GANs to deal with the limitations of traditional GANs in passing the gradient updates for discrete tokens. First, we used the Reinforcement Learning (RL)-based training approach and then, an approximation of the multinomial distribution parameterized in terms of the softmax function (Gumble-Softmax). Our extensive experiments based on synthetic data have shown the trained generator can generate sequences with desired properties measured by multiple criteria.