Jiantao Wu

LG
h-index31
9papers
43citations
Novelty51%
AI Score40

9 Papers

CVAug 22, 2023
Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding

Jiantao Wu, Shentong Mo, Muhammad Awais et al.

Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data. In the realm of computer vision, pretrained vision transformers (ViTs) have played a pivotal role in advancing transfer learning. Nonetheless, the escalating cost of finetuning these large models has posed a challenge due to the explosion of model size. This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks, obviating the need for finetuning, with the intention of emulating human-like capabilities in generalisation and recognition of unseen objects. To this end, we propose an evaluation protocol for zero-shot segmentation based on a prompting patch. Given a point on the target object as a prompt, the algorithm calculates the similarity map between the selected patch and other patches, upon that, a simple thresholding is applied to segment the target. Another evaluation is intra-object and inter-object similarity to gauge discriminatory ability of SSP ViTs. Insights from zero-shot segmentation from prompting and discriminatory abilities of SSP led to the design of a simple SSP approach, termed MMC. This approaches combines Masked image modelling for encouraging similarity of local features, Momentum based self-distillation for transferring semantics from global to local features, and global Contrast for promoting semantics of global features, to enhance discriminative representations of SSP ViTs. Consequently, our proposed method significantly reduces the overlap of intra-object and inter-object similarities, thereby facilitating effective object segmentation within an image. Our experiments reveal that MMC delivers top-tier results in zero-shot semantic segmentation across various datasets.

LGMar 22, 2023Code
Variantional autoencoder with decremental information bottleneck for disentanglement

Jiantao Wu, Shentong Mo, Xiang Yang et al.

One major challenge of disentanglement learning with variational autoencoders is the trade-off between disentanglement and reconstruction fidelity. Previous studies, which increase the information bottleneck during training, tend to lose the constraint of disentanglement, leading to the information diffusion problem. In this paper, we present a novel framework for disentangled representation learning, DeVAE, which utilizes hierarchical latent spaces with decreasing information bottlenecks across these spaces. The key innovation of our approach lies in connecting the hierarchical latent spaces through disentanglement-invariant transformations, allowing the sharing of disentanglement properties among spaces while maintaining an acceptable level of reconstruction performance. We demonstrate the effectiveness of DeVAE in achieving a balance between disentanglement and reconstruction through a series of experiments and ablation studies on dSprites and Shapes3D datasets. Code is available at https://github.com/erow/disentanglement_lib/tree/pytorch#devae.

CVMay 28, 2022
Object-wise Masked Autoencoders for Fast Pre-training

Jiantao Wu, Shentong Mo

Self-supervised pre-training for images without labels has recently achieved promising performance in image classification. The success of transformer-based methods, ViT and MAE, draws the community's attention to the design of backbone architecture and self-supervised task. In this work, we show that current masked image encoding models learn the underlying relationship between all objects in the whole scene, instead of a single object representation. Therefore, those methods bring a lot of compute time for self-supervised pre-training. To solve this issue, we introduce a novel object selection and division strategy to drop non-object patches for learning object-wise representations by selective reconstruction with interested region masks. We refer to this method ObjMAE. Extensive experiments on four commonly-used datasets demonstrate the effectiveness of our model in reducing the compute cost by 72% while achieving competitive performance. Furthermore, we investigate the inter-object and intra-object relationship and find that the latter is crucial for self-supervised pre-training.

AIAug 18, 2022
A semantic web approach to uplift decentralized household energy data

Jiantao Wu, Fabrizio Orlandi, Tarek AlSkaif et al.

In a decentralized household energy system comprised of various devices such as home appliances, electric vehicles, and solar panels, end-users are able to dig deeper into the system's details and further achieve energy sustainability if they are presented with data on the electric energy consumption and production at the granularity of the device. However, many databases in this field are siloed from other domains, including solely information pertaining to energy. This may result in the loss of information (e.g. weather) on each device's energy use. Meanwhile, a large number of these datasets have been extensively used in computational modeling techniques such as machine learning models. While such computational approaches achieve great accuracy and performance by concentrating only on a local view of datasets, model reliability cannot be guaranteed since such models are very vulnerable to data input fluctuations when information omission is taken into account. This article tackles the data isolation issue in the field of smart energy systems by examining Semantic Web methods on top of a household energy system. We offer an ontology-based approach for managing decentralized data at the device-level resolution in a system. As a consequence, the scope of the data associated with each device may easily be expanded in an interoperable manner throughout the Web, and additional information, such as weather, can be obtained from the Web, provided that the data is organized according to W3C standards.

LGJan 22Code
Rethinking Drug-Drug Interaction Modeling as Generalizable Relation Learning

Dong Xu, Jiantao Wu, Qihua Pan et al.

Drug-drug interaction (DDI) prediction is central to drug discovery and clinical development, particularly in the context of increasingly prevalent polypharmacy. Although existing computational methods achieve strong performance on standard benchmarks, they often fail to generalize to realistic deployment scenarios, where most candidate drug pairs involve previously unseen drugs and validated interactions are scarce. We demonstrate that proximity in the embedding spaces of prevailing molecule-centric DDI models does not reliably correspond to interaction labels, and that simply scaling up model capacity therefore fails to improve generalization. To address these limitations, we propose GenRel-DDI, a generalizable relation learning framework that reformulates DDI prediction as a relation-centric learning problem, in which interaction representations are learned independently of drug identities. This relation-level abstraction enables the capture of transferable interaction patterns that generalize to unseen drugs and novel drug pairs. Extensive experiments across multiple benchmark demonstrate that GenRel-DDI consistently and significantly outperforms state-of-the-art methods, with particularly large gains on strict entity-disjoint evaluations, highlighting the effectiveness and practical utility of relation learning for robust DDI prediction. The code is available at https://github.com/SZU-ADDG/GenRel-DDI.

LGMar 31, 2024Code
DailyMAE: Towards Pretraining Masked Autoencoders in One Day

Jiantao Wu, Shentong Mo, Sara Atito et al.

Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representation from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting how models pretrained on extensive datasets can enhance the performance of downstream tasks. However, the high computational demands of pretraining pose significant challenges, particularly within academic environments, thereby impeding the SSL research progress. In this study, we propose efficient training recipes for MIM based SSL that focuses on mitigating data loading bottlenecks and employing progressive training techniques and other tricks to closely maintain pretraining performance. Our library enables the training of a MAE-Base/16 model on the ImageNet 1K dataset for 800 epochs within just 18 hours, using a single machine equipped with 8 A100 GPUs. By achieving speed gains of up to 5.8 times, this work not only demonstrates the feasibility of conducting high-efficiency SSL training but also paves the way for broader accessibility and promotes advancement in SSL research particularly for prototyping and initial testing of SSL ideas. The code is available in https://github.com/erow/FastSSL.

CVOct 23, 2024
Rethinking Positive Pairs in Contrastive Learning

Jiantao Wu, Sara Atito, Zhenhua Feng et al.

The training methods in AI do involve semantically distinct pairs of samples. However, their role typically is to enhance the between class separability. The actual notion of similarity is normally learned from semantically identical pairs. This paper presents SimLAP: a simple framework for learning visual representation from arbitrary pairs. SimLAP explores the possibility of learning similarity from semantically distinct sample pairs. The approach is motivated by the observation that for any pair of classes there exists a subspace in which semantically distinct samples exhibit similarity. This phenomenon can be exploited for a novel method of learning, which optimises the similarity of an arbitrary pair of samples, while simultaneously learning the enabling subspace. The feasibility of the approach will be demonstrated experimentally and its merits discussed.

LGFeb 8, 2021
DEFT: Distilling Entangled Factors by Preventing Information Diffusion

Jiantao Wu, Lin Wang, Bo Yang et al.

Disentanglement is a highly desirable property of representation owing to its similarity to human understanding and reasoning. Many works achieve disentanglement upon information bottlenecks (IB). Despite their elegant mathematical foundations, the IB branch usually exhibits lower performance. In order to provide an insight into the problem, we develop an annealing test to calculate the information freezing point (IFP), which is a transition state to freeze information into the latent variables. We also explore these clues or inductive biases for separating the entangled factors according to the differences in the IFP distributions. We found the existing approaches suffer from the information diffusion problem, according to which the increased information diffuses in all latent variables. Based on this insight, we propose a novel disentanglement framework, termed the distilling entangled factor (DEFT), to address the information diffusion problem by scaling backward information. DEFT applies a multistage training strategy, including multigroup encoders with different learning rates and piecewise disentanglement pressure, to disentangle the factors stage by stage. We evaluate DEFT on three variants of dSprite and SmallNORB, which show low-variance and high-level disentanglement scores. Furthermore, the experiment under the correlative factors shows incapable of TC-based approaches. DEFT also exhibits a competitive performance in the unsupervised setting.

LGOct 17, 2020
Disentangling Action Sequences: Discovering Correlated Samples

Jiantao Wu, Lin Wang

Disentanglement is a highly desirable property of representation due to its similarity with human's understanding and reasoning. This improves interpretability, enables the performance of down-stream tasks, and enables controllable generative models. However, this domain is challenged by the abstract notion and incomplete theories to support unsupervised disentanglement learning. We demonstrate the data itself, such as the orientation of images, plays a crucial role in disentanglement and instead of the factors, and the disentangled representations align the latent variables with the action sequences. We further introduce the concept of disentangling action sequences which facilitates the description of the behaviours of the existing disentangling approaches. An analogy for this process is to discover the commonality between the things and categorizing them. Furthermore, we analyze the inductive biases on the data and find that the latent information thresholds are correlated with the significance of the actions. For the supervised and unsupervised settings, we respectively introduce two methods to measure the thresholds. We further propose a novel framework, fractional variational autoencoder (FVAE), to disentangle the action sequences with different significance step-by-step. Experimental results on dSprites and 3D Chairs show that FVAE improves the stability of disentanglement.