Haozhe Huang

CV
6papers
874citations
Novelty54%
AI Score34

6 Papers

CVJun 29, 2023
Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion

Weide Liu, Xiaoyang Zhong, Jingwen Hou et al.

Multimodal Named Entity Recognition (MNER) is a crucial task for information extraction from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images but are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task, producing trustworthy predictions. Our proposed algorithm models the distribution of each modality as a Normal-inverse Gamma distribution, and fuses them into a unified distribution with an evidential fusion mechanism, enabling hierarchical characterization of uncertainties and promotion of prediction accuracy and trustworthiness. Additionally, we explore the potential of pre-trained large foundation models in MNER and propose an efficient fusion approach that leverages their robust feature representations. Experiments on two datasets demonstrate that our proposed method outperforms the baselines and achieves new state-of-the-art performance.

CVJun 20, 2021Code
Global and Local Contrastive Self-Supervised Learning for Semantic Segmentation of HR Remote Sensing Images

Haifeng Li, Yi Li, Guo Zhang et al.

Supervised learning for semantic segmentation requires a large number of labeled samples, which is difficult to obtain in the field of remote sensing. Self-supervised learning (SSL), can be used to solve such problems by pre-training a general model with a large number of unlabeled images and then fine-tuning it on a downstream task with very few labeled samples. Contrastive learning is a typical method of SSL that can learn general invariant features. However, most existing contrastive learning methods are designed for classification tasks to obtain an image-level representation, which may be suboptimal for semantic segmentation tasks requiring pixel-level discrimination. Therefore, we propose a global style and local matching contrastive learning network (GLCNet) for remote sensing image semantic segmentation. Specifically, 1) the global style contrastive learning module is used to better learn an image-level representation, as we consider that style features can better represent the overall image features. 2) The local features matching contrastive learning module is designed to learn representations of local regions, which is beneficial for semantic segmentation. The experimental results show that our method mostly outperforms SOTA self-supervised methods and the ImageNet pre-training method. Specifically, with 1\% annotation from the original dataset, our approach improves Kappa by 6\% on the ISPRS Potsdam dataset relative to the existing baseline. Moreover, our method outperforms supervised learning methods when there are some differences between the datasets of upstream tasks and downstream tasks. Since SSL could directly learn the essential characteristics of data from unlabeled data, which is easy to obtain in the remote sensing field, this may be of great significance for tasks such as global mapping. The source code is available at https://github.com/GeoX-Lab/G-RSIM.

CVApr 18, 2020Code
DAPnet: A Double Self-attention Convolutional Network for Point Cloud Semantic Labeling

Li Chen, Zewei Xu, Yongjian Fu et al.

Airborne Laser Scanning (ALS) point clouds have complex structures, and their 3D semantic labeling has been a challenging task. It has three problems: (1) the difficulty of classifying point clouds around boundaries of objects from different classes, (2) the diversity of shapes within the same class, and (3) the scale differences between classes. In this study, we propose a novel double self-attention convolutional network called the DAPnet. The double self-attention includes the point attention module (PAM) and the group attention module (GAM). For problem (1), the PAM can effectively assign different weights based on the relevance of point clouds in adjacent areas. Meanwhile, for the problem (2), the GAM enhances the correlation between groups, i.e., grouped features within the same classes. To solve the problem (3), we adopt a multiscale radius to construct the groups and concatenate extracted hierarchical features with the output of the corresponding upsampling process. Under the ISPRS 3D Semantic Labeling Contest dataset, the DAPnet outperforms the benchmark by 85.2\% with an overall accuracy of 90.7\%. By conducting ablation comparisons, we find that the PAM effectively improves the model than the GAM. The incorporation of the double self-attention module has an average of 7\% improvement on the pre-class accuracy. Plus, the DAPnet consumes a similar training time to those without the attention modules for model convergence. The DAPnet can assign different weights to features based on the relevance between point clouds and their neighbors, which effectively improves classification performance. The source codes are available at: https://github.com/RayleighChen/point-attention.

CVMar 7, 2020Code
DASNet: Dual attentive fully convolutional siamese networks for change detection of high resolution satellite images

Jie Chen, Ziyang Yuan, Jian Peng et al.

Change detection is a basic task of remote sensing image processing. The research objective is to identity the change information of interest and filter out the irrelevant change information as interference factors. Recently, the rise of deep learning has provided new tools for change detection, which have yielded impressive results. However, the available methods focus mainly on the difference information between multitemporal remote sensing images and lack robustness to pseudo-change information. To overcome the lack of resistance of current methods to pseudo-changes, in this paper, we propose a new method, namely, dual attentive fully convolutional Siamese networks (DASNet) for change detection in high-resolution images. Through the dual-attention mechanism, long-range dependencies are captured to obtain more discriminant feature representations to enhance the recognition performance of the model. Moreover, the imbalanced sample is a serious problem in change detection, i.e. unchanged samples are much more than changed samples, which is one of the main reasons resulting in pseudo-changes. We put forward the weighted double margin contrastive loss to address this problem by punishing the attention to unchanged feature pairs and increase attention to changed feature pairs. The experimental results of our method on the change detection dataset (CDD) and the building change detection dataset (BCDD) demonstrate that compared with other baseline methods, the proposed method realizes maximum improvements of 2.1\% and 3.6\%, respectively, in the F1 score. Our Pytorch implementation is available at https://github.com/lehaifeng/DASNet.

CVSep 28, 2020
RS-MetaNet: Deep meta metric learning for few-shot remote sensing scene classification

Haifeng Li, Zhenqi Cui, Zhiqing Zhu et al.

Training a modern deep neural network on massive labeled samples is the main paradigm in solving the scene classification problem for remote sensing, but learning from only a few data points remains a challenge. Existing methods for few-shot remote sensing scene classification are performed in a sample-level manner, resulting in easy overfitting of learned features to individual samples and inadequate generalization of learned category segmentation surfaces. To solve this problem, learning should be organized at the task level rather than the sample level. Learning on tasks sampled from a task family can help tune learning algorithms to perform well on new tasks sampled in that family. Therefore, we propose a simple but effective method, called RS-MetaNet, to resolve the issues related to few-shot remote sensing scene classification in the real world. On the one hand, RS-MetaNet raises the level of learning from the sample to the task by organizing training in a meta way, and it learns to learn a metric space that can well classify remote sensing scenes from a series of tasks. We also propose a new loss function, called Balance Loss, which maximizes the generalization ability of the model to new samples by maximizing the distance between different categories, providing the scenes in different categories with better linear segmentation planes while ensuring model fit. The experimental results on three open and challenging remote sensing datasets, UCMerced\_LandUse, NWPU-RESISC45, and Aerial Image Data, demonstrate that our proposed RS-MetaNet method achieves state-of-the-art results in cases where there are only 1-20 labeled samples.

CVJan 27, 2020
Convolution Neural Network Architecture Learning for Remote Sensing Scene Classification

Jie Chen, Haozhe Huang, Jian Peng et al.

Remote sensing image scene classification is a fundamental but challenging task in understanding remote sensing images. Recently, deep learning-based methods, especially convolutional neural network-based (CNN-based) methods have shown enormous potential to understand remote sensing images. CNN-based methods meet with success by utilizing features learned from data rather than features designed manually. The feature-learning procedure of CNN largely depends on the architecture of CNN. However, most of the architectures of CNN used for remote sensing scene classification are still designed by hand which demands a considerable amount of architecture engineering skills and domain knowledge, and it may not play CNN's maximum potential on a special dataset. In this paper, we proposed an automatically architecture learning procedure for remote sensing scene classification. We designed a parameters space in which every set of parameters represents a certain architecture of CNN (i.e., some parameters represent the type of operators used in the architecture such as convolution, pooling, no connection or identity, and the others represent the way how these operators connect). To discover the optimal set of parameters for a given dataset, we introduced a learning strategy which can allow efficient search in the architecture space by means of gradient descent. An architecture generator finally maps the set of parameters into the CNN used in our experiments.