CV | May 3, 2017 | Code
Unsupervised Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval
Jian Xu, Cunzhao Shi, Chengzuo Qi et al.
In this paper, we propose a simple but effective semantic part-based weighting aggregation (PWA) method for image retrieval. The proposed PWA uses the discriminative filters of deep convolutional layers as part detectors. Moreover, we propose an effective unsupervised strategy to select a subset of part detectors to generate "probabilistic proposals", which highlight certain discriminative parts of objects and suppress background noise. The final global PWA representation is then obtained by aggregating the regional representations weighted by the selected "probabilistic proposals", which correspond to various semantic content. We conduct comprehensive experiments on four standard datasets and show that our unsupervised PWA outperforms the state-of-the-art unsupervised and supervised aggregation methods. Code is available at https://github.com/XJhaoren/PWA.
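The aggregation described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). The function names, the variance-based channel selection, and the sum-to-one normalization of each weight map are assumptions made here for illustration only.

```python
import numpy as np

def select_part_detectors(feature_maps, num_parts):
    """Pick the channels whose pooled response varies most across images.

    The variance criterion is an assumption for this sketch.
    feature_maps has shape (num_images, C, H, W).
    """
    pooled = feature_maps.sum(axis=(2, 3))      # (num_images, C)
    variance = pooled.var(axis=0)               # (C,)
    return np.argsort(variance)[::-1][:num_parts]

def pwa_aggregate(feature_map, part_channels, eps=1e-12):
    """Aggregate one (C, H, W) feature map into a global descriptor.

    Each selected channel acts as a soft spatial weight map; the weighted
    sums of local features are concatenated and L2-normalized.
    """
    num_channels, height, width = feature_map.shape
    flat = feature_map.reshape(num_channels, height * width)   # (C, HW)
    regional = []
    for ch in part_channels:
        weights = np.maximum(flat[ch], 0.0)          # keep non-negative responses
        weights = weights / (weights.sum() + eps)    # assumed normalization
        regional.append(flat @ weights)              # (C,) weighted sum of local features
    descriptor = np.concatenate(regional)            # (len(part_channels) * C,)
    return descriptor / (np.linalg.norm(descriptor) + eps)
```

With `N` selected detectors and `C` channels, the resulting descriptor has `N * C` dimensions before any further dimensionality reduction.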
CV | Feb 28, 2020
DGST: Discriminator Guided Scene Text detector
Jinyuan Zhao, Yanna Wang, Baihua Xiao et al.
The scene text detection task has attracted considerable attention in computer vision because of its wide range of applications. In recent years, many researchers have introduced semantic segmentation methods into scene text detection and achieved promising results. This paper proposes a detector framework based on conditional generative adversarial networks, called DGST (Discriminator Guided Scene Text detector), to improve the segmentation quality of scene text detection. Instead of the binary text score maps generated by some existing semantic-segmentation-based methods, we generate a multi-scale soft text score map that carries more information, represents the text position more reasonably, and solves the problem of text pixel adhesion during text extraction. Experiments on standard datasets demonstrate that the proposed DGST brings noticeable gains and outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 87% on the ICDAR 2015 dataset.
CV | Nov 19, 2018
Adversarial Soft-detection-based Aggregation Network for Image Retrieval
Jian Xu, Chunheng Wang, Cunzhao Shi et al.
In recent years, compact representations based on activations of Convolutional Neural Networks (CNNs) have achieved remarkable performance in image retrieval. However, retrieving an object of interest that occupies only a small part of the whole image remains a challenging problem. It is therefore important to extract discriminative representations that contain regional information about the pivotal small object. In this paper, we propose a novel adversarial soft-detection-based aggregation (ASDA) method for image retrieval that is free from bounding box annotations and is based on an adversarial detector and a soft region proposal layer. Our trainable adversarial detector generates semantic maps using an adversarial erasing strategy to preserve more discriminative and detailed information. Computed from semantic maps corresponding to various discriminative patterns and semantic contents, our soft region proposal can take an arbitrary shape rather than only a rectangle, and it reflects the significance of objects. Aggregation based on the trainable soft region proposal highlights discriminative semantic contents and suppresses background noise. We conduct comprehensive experiments on standard image retrieval datasets. Our weakly supervised ASDA method achieves state-of-the-art performance on most datasets. The results demonstrate that the proposed ASDA method is effective for image retrieval.
CV | Nov 15, 2018
Selective Feature Connection Mechanism: Concatenating Multi-layer CNN Features with a Feature Selector
Chen Du, Chunheng Wang, Yanna Wang et al.
Different layers of deep convolutional neural networks (CNNs) encode information at different levels. High-layer features generally contain more semantic information, and low-layer features contain more detail information. However, low-layer features suffer from background clutter and semantic ambiguity. During visual recognition, combining low-layer and high-layer features plays an important role in context modulation. If the high-layer and low-layer features are combined directly, the detailed information that is introduced may bring background clutter and semantic ambiguity with it. In this paper, we propose a general network architecture, called Selective Feature Connection Mechanism (SFCM), that concatenates CNN features of different layers in a simple and effective way. Low-level features are selectively linked to high-level features through a feature selector that is generated from the high-level features. The proposed connection mechanism can effectively overcome the above-mentioned drawbacks. We demonstrate the effectiveness, superiority, and universal applicability of this method on multiple challenging computer vision tasks, including image classification, scene text detection, and image-to-image translation.
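One way to picture the selective connection is as a gate computed from the high-level features and applied to the low-level features before concatenation. The NumPy sketch below works under stated assumptions and is not the architecture from the paper: the 1x1 projection (`proj_weight`, `proj_bias`), the sigmoid gate, and the requirement that both feature maps share the same spatial size are all hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selective_connection(low, high, proj_weight, proj_bias):
    """Gate low-level features with a selector derived from high-level features.

    low:         (C_low, H, W) low-level feature map
    high:        (C_high, H, W) high-level feature map, same H and W (assumed)
    proj_weight: (C_low, C_high) weights of a hypothetical 1x1 projection
    proj_bias:   (C_low,) bias of that projection
    """
    # A 1x1 convolution is a per-pixel linear map over channels.
    logits = np.einsum("lc,chw->lhw", proj_weight, high) + proj_bias[:, None, None]
    selector = sigmoid(logits)               # values in (0, 1), one per low-level activation
    selected_low = low * selector            # suppress low-level responses the selector rejects
    return np.concatenate([high, selected_low], axis=0)   # (C_high + C_low, H, W)
```

Because the selector depends only on `high`, low-level detail passes through only where the high-level features indicate it is relevant.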
CV | Jun 29, 2018
Excavate Condition-invariant Space by Intrinsic Encoder
Jian Xu, Chunheng Wang, Cunzhao Shi et al.
As humans, we can recognize places across a wide range of changing environmental conditions, such as those caused by weather, seasons, and day-night cycles. We excavate and memorize the stable semantic structure of different places and scenes. For example, we can recognize a tree whether it is bare in winter or lush in summer. Therefore, intrinsic features that correspond to specific semantic contents and are invariant to appearance changes can be employed to significantly improve the performance of long-term place recognition. In this paper, we propose a novel intrinsic encoder that excavates the condition-invariant latent space of different places under drastic appearance changes. Our method excavates the space of intrinsic structure and semantic information with the proposed self-supervised encoder loss. Unlike previous learning-based place recognition methods that need paired training data of each place under appearance changes, we employ a weakly-supervised strategy to use unpaired set-based training data from different environmental conditions. We conduct comprehensive experiments and show that our semi-supervised intrinsic encoder achieves remarkable performance for place recognition under drastic appearance changes. The proposed intrinsic encoder outperforms the state-of-the-art image-level place recognition methods on the standard Nordland benchmark.
CV | Apr 3, 2018
Unsupervised Semantic-based Aggregation of Deep Convolutional Features
Jian Xu, Chunheng Wang, Chengzuo Qi et al.
In this paper, we propose a simple but effective semantic-based aggregation (SBA) method. The proposed SBA uses the discriminative filters of deep convolutional layers as semantic detectors. Moreover, we propose an effective unsupervised strategy to select a subset of semantic detectors to generate "probabilistic proposals", which highlight certain discriminative patterns of objects and suppress background noise. The final global SBA representation is then obtained by aggregating the regional representations weighted by the selected "probabilistic proposals", which correspond to various semantic content. Our unsupervised SBA is easy to generalize and achieves excellent performance on various tasks. We conduct comprehensive experiments and show that our unsupervised SBA outperforms the state-of-the-art unsupervised and supervised aggregation methods on image retrieval, place recognition, and cloud classification.
CV | Jul 14, 2017
Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval
Jian Xu, Chunheng Wang, Chengzuo Qi et al.
Existing manifold learning methods are not appropriate for the image retrieval task, because most of them are unable to process a query image and they incur substantial additional computational cost, especially for large-scale databases. Therefore, we propose the iterative manifold embedding (IME) layer, whose weights are learned off-line by an unsupervised strategy, to explore the intrinsic manifolds from incomplete data. On a large-scale database that contains 27000 images, the IME layer is more than 120 times faster than other manifold learning methods at embedding the original representations at query time. In the off-line learning stage, we iteratively embed the original descriptors of database images, which lie on a manifold in a high-dimensional space, into manifold-based representations to generate the IME representations. From the original descriptors and the IME representations of database images, we estimate the weights of the IME layer by ridge regression. In the on-line retrieval stage, we employ the IME layer to map the original representation of the query image with negligible time cost (2 milliseconds). We experiment on five public standard datasets for image retrieval. The proposed IME layer significantly outperforms related dimension reduction methods and manifold learning methods. Without post-processing, our IME layer improves on the performance of state-of-the-art image retrieval methods that use post-processing on most datasets, and it needs less computational cost.
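The ridge regression step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the regularization strength `reg` are hypothetical, and the iterative embedding that produces the target representations is assumed to have been computed elsewhere. Given original descriptors `X` and their embedded representations `Y`, the closed-form solution is `W = (X^T X + reg * I)^{-1} X^T Y`, and a query is mapped with a single matrix product.

```python
import numpy as np

def fit_ime_layer(original, embedded, reg=1.0):
    """Estimate linear layer weights by ridge regression.

    original: (num_images, d) original descriptors of database images
    embedded: (num_images, d_out) target representations from the
              off-line embedding stage (assumed to be given)
    reg:      ridge regularization strength (hypothetical default)
    """
    dim = original.shape[1]
    gram = original.T @ original + reg * np.eye(dim)
    # Solve the regularized normal equations instead of forming an explicit inverse.
    return np.linalg.solve(gram, original.T @ embedded)   # (d, d_out)

def apply_ime_layer(query, weights):
    """Map a query descriptor (d,) into the learned space with one matrix product."""
    return query @ weights
```

Since the on-line step is a single matrix-vector product, its cost does not depend on the database size, which is in keeping with the small query-time cost reported in the abstract.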