CVApr 28, 2021
LGA-RCNN: Loss-Guided Attention for Object DetectionXin Yi, Jiahao Wu, Bo Ma et al.
Object detection is widely studied in computer vision filed. In recent years, certain representative deep learning based detection methods along with solid benchmarks are proposed, which boosts the development of related researchs. However, existing detection methods still suffer from undesirable performance under challenges such as camouflage, blur, inter-class similarity, intra-class variance and complex environment. To address this issue, we propose LGA-RCNN which utilizes a loss-guided attention (LGA) module to highlight representative region of objects. Then, those highlighted local information are fused with global information for precise classification and localization.
CVApr 19, 2021
Self-Paced Uncertainty Estimation for One-shot Person Re-IdentificationYulin Zhang, Bo Ma, Longyao Liu et al.
The one-shot Person Re-ID scenario faces two kinds of uncertainties when constructing the prediction model from $X$ to $Y$. The first is model uncertainty, which captures the noise of the parameters in DNNs due to a lack of training data. The second is data uncertainty, which can be divided into two sub-types: one is image noise, where severe occlusion and the complex background contain irrelevant information about the identity; the other is label noise, where mislabeled affects visual appearance learning. In this paper, to tackle these issues, we propose a novel Self-Paced Uncertainty Estimation Network (SPUE-Net) for one-shot Person Re-ID. By introducing a self-paced sampling strategy, our method can estimate the pseudo-labels of unlabeled samples iteratively to expand the labeled samples gradually and remove model uncertainty without extra supervision. We divide the pseudo-label samples into two subsets to make the use of training samples more reasonable and effective. In addition, we apply a Co-operative learning method of local uncertainty estimation combined with determinacy estimation to achieve better hidden space feature mining and to improve the precision of selected pseudo-labeled samples, which reduces data uncertainty. Extensive comparative evaluation experiments on video-based and image-based datasets show that SPUE-Net has significant advantages over the state-of-the-art methods.
CVFeb 6, 2021
Two-Step Image Dehazing with Intra-domain and Inter-domain AdaptationXin Yi, Bo Ma, Yulin Zhang et al.
Caused by the difference of data distributions, intra-domain gap and inter-domain gap are widely present in image processing tasks. In the field of image dehazing, certain previous works have paid attention to the inter-domain gap between the synthetic domain and the real domain. However, those methods only establish the connection from the source domain to the target domain without taking into account the large distribution shift within the target domain (intra-domain gap). In this work, we propose a Two-Step Dehazing Network (TSDN) with an intra-domain adaptation and a constrained inter-domain adaptation. First, we subdivide the distributions within the synthetic domain into subsets and mine the optimal subset (easy samples) by loss-based supervision. To alleviate the intra-domain gap of the synthetic domain, we propose an intra-domain adaptation to align distributions of other subsets to the optimal subset by adversarial learning. Finally, we conduct the constrained inter-domain adaptation from the real domain to the optimal subset of the synthetic domain, alleviating the domain shift between domains as well as the distribution shift within the real domain. Extensive experimental results demonstrate that our framework performs favorably against the state-of-the-art algorithms both on the synthetic datasets and the real datasets.
CVNov 30, 2020
AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object DetectionLongyao Liu, Bo Ma, Yulin Zhang et al.
Few-shot object detection (FSOD) aims at learning a detector that can fast adapt to previously unseen objects with scarce annotated examples, which is challenging and demanding. Existing methods solve this problem by performing subtasks of classification and localization utilizing a shared component (e.g., RoI head) in the detector, yet few of them take the distinct preferences of two subtasks towards feature embedding into consideration. In this paper, we carefully analyze the characteristics of FSOD, and present that a general few-shot detector should consider the explicit decomposition of two subtasks, as well as leveraging information from both of them to enhance feature representations. To the end, we propose a simple yet effective Adaptive Fully-Dual Network (AFD-Net). Specifically, we extend Faster R-CNN by introducing Dual Query Encoder and Dual Attention Generator for separate feature extraction, and Dual Aggregator for separate model reweighting. Spontaneously, separate state estimation is achieved by the R-CNN detector. Besides, for the acquisition of enhanced feature representations, we further introduce Adaptive Fusion Mechanism to adaptively perform feature fusion in different subtasks. Extensive experiments on PASCAL VOC and MS COCO in various settings show that, our method achieves new state-of-the-art performance by a large margin, demonstrating its effectiveness and generalization ability.