Xinshao Wang

LG
11papers
516citations
Novelty54%
AI Score30

11 Papers

CVNov 16, 2022Code
AdaTriplet-RA: Domain Matching via Adaptive Triplet and Reinforced Attention for Unsupervised Domain Adaptation

Xinyao Shu, Shiyang Yan, Zhenyu Lu et al.

Unsupervised domain adaption (UDA) is a transfer learning task where the data and annotations of the source domain are available but only have access to the unlabeled target data during training. Most previous methods try to minimise the domain gap by performing distribution alignment between the source and target domains, which has a notable limitation, i.e., operating at the domain level, but neglecting the sample-level differences. To mitigate this weakness, we propose to improve the unsupervised domain adaptation task with an inter-domain sample matching scheme. We apply the widely-used and robust Triplet loss to match the inter-domain samples. To reduce the catastrophic effect of the inaccurate pseudo-labels generated during training, we propose a novel uncertainty measurement method to select reliable pseudo-labels automatically and progressively refine them. We apply the advanced discrete relaxation Gumbel Softmax technique to realise an adaptive Topk scheme to fulfil the functionality. In addition, to enable the global ranking optimisation within one batch for the domain matching, the whole model is optimised via a novel reinforced attention mechanism with supervision from the policy gradient algorithm, using the Average Precision (AP) as the reward. Our model (termed \textbf{\textit{AdaTriplet-RA}}) achieves State-of-the-art results on several public benchmark datasets, and its effectiveness is validated via comprehensive ablation studies. Our method improves the accuracy of the baseline by 9.7\% (ResNet-101) and 6.2\% (ResNet-50) on the VisDa dataset and 4.22\% (ResNet-50) on the Domainnet dataset. {The source code is publicly available at \textit{https://github.com/shuxy0120/AdaTriplet-RA}}.

LGJun 30, 2022
ProSelfLC: Progressive Self Label Correction Towards A Low-Temperature Entropy State

Xinshao Wang, Yang Hua, Elyor Kodirov et al.

There is a family of label modification approaches including self and non-self label correction (LC), and output regularisation. They are widely used for training robust deep neural networks (DNNs), but have not been mathematically and thoroughly analysed together. We study them and discover three key issues: (1) We are more interested in adopting Self LC as it leverages its own knowledge and requires no auxiliary models. However, it is unclear how to adaptively trust a learner as the training proceeds. (2) Some methods penalise while the others reward low-entropy (i.e., high-confidence) predictions, prompting us to ask which one is better. (3) Using the standard training setting, a learned model becomes less confident when severe noise exists. Self LC using high-entropy knowledge would generate high-entropy targets. To resolve the issue (1), inspired by a well-accepted finding, i.e., deep neural networks learn meaningful patterns before fitting noise, we propose a novel end-to-end method named ProSelfLC, which is designed according to the learning time and prediction entropy. Concretely, for any data point, we progressively and adaptively trust its predicted probability distribution versus its annotated one if a network has been trained for a relatively long time and the prediction is of low entropy. For the issue (2), the effectiveness of ProSelfLC defends entropy minimisation. By ProSelfLC, we empirically prove that it is more effective to redefine a semantic low-entropy state and optimise the learner toward it. To address the issue (3), we decrease the entropy of self knowledge using a low temperature before exploiting it to correct labels, so that the revised labels redefine low-entropy target probability distributions. We demonstrate the effectiveness of ProSelfLC through extensive experiments in both clean and noisy settings, and on both image and protein datasets.

MLOct 11, 2022
Misspecified Phase Retrieval with Generative Priors

Zhaoqiang Liu, Xinshao Wang, Jiulong Liu · oxford

In this paper, we study phase retrieval under model misspecification and generative priors. In particular, we aim to estimate an $n$-dimensional signal $\mathbf{x}$ from $m$ i.i.d.~realizations of the single index model $y = f(\mathbf{a}^T\mathbf{x})$, where $f$ is an unknown and possibly random nonlinear link function and $\mathbf{a} \in \mathbb{R}^n$ is a standard Gaussian vector. We make the assumption $\mathrm{Cov}[y,(\mathbf{a}^T\mathbf{x})^2] \ne 0$, which corresponds to the misspecified phase retrieval problem. In addition, the underlying signal $\mathbf{x}$ is assumed to lie in the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a two-step approach, for which the first step plays the role of spectral initialization and the second step refines the estimated vector produced by the first step iteratively. We show that both steps enjoy a statistical rate of order $\sqrt{(k\log L)\cdot (\log m)/m}$ under suitable conditions. Experiments on image datasets are performed to demonstrate that our approach performs on par with or even significantly outperforms several competing methods.

LGMay 7, 2020Code
ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks

Xinshao Wang, Yang Hua, Elyor Kodirov et al.

To train robust deep neural networks (DNNs), we systematically study several target modification approaches, which include output regularisation, self and non-self label correction (LC). Two key issues are discovered: (1) Self LC is the most appealing as it exploits its own knowledge and requires no extra models. However, how to automatically decide the trust degree of a learner as training goes is not well answered in the literature? (2) Some methods penalise while the others reward low-entropy predictions, prompting us to ask which one is better? To resolve the first issue, taking two well-accepted propositions--deep neural networks learn meaningful patterns before fitting noise [3] and minimum entropy regularisation principle [10]--we propose a novel end-to-end method named ProSelfLC, which is designed according to learning time and entropy. Specifically, given a data point, we progressively increase trust in its predicted label distribution versus its annotated one if a model has been trained for enough time and the prediction is of low entropy (high confidence). For the second issue, according to ProSelfLC, we empirically prove that it is better to redefine a meaningful low-entropy status and optimise the learner toward it. This serves as a defence of entropy minimisation. We demonstrate the effectiveness of ProSelfLC through extensive experiments in both clean and noisy settings. The source code is available at https://github.com/XinshaoAmosWang/ProSelfLC-CVPR2021. Keywords: entropy minimisation, maximum entropy, confidence penalty, self knowledge distillation, label correction, label noise, semi-supervised learning, output regularisation

LGJun 2, 2021
Not All Knowledge Is Created Equal: Mutual Distillation of Confident Knowledge

Ziyun Li, Xinshao Wang, Di Hu et al.

Mutual knowledge distillation (MKD) improves a model by distilling knowledge from another model. However, \textit{not all knowledge is certain and correct}, especially under adverse conditions. For example, label noise usually leads to less reliable models due to undesired memorization \cite{zhang2017understanding,arpit2017closer}. Wrong knowledge misleads the learning rather than helps. This problem can be handled by two aspects: (i) improving the reliability of a model where the knowledge is from (i.e., knowledge source's reliability); (ii) selecting reliable knowledge for distillation. In the literature, making a model more reliable is widely studied while selective MKD receives little attention. Therefore, we focus on studying selective MKD. Concretely, a generic MKD framework, \underline{C}onfident knowledge selection followed by \underline{M}utual \underline{D}istillation (CMD), is designed. The key component of CMD is a generic knowledge selection formulation, making the selection threshold either static (CMD-S) or progressive (CMD-P). Additionally, CMD covers two special cases: zero-knowledge and all knowledge, leading to a unified MKD framework. Extensive experiments are present to demonstrate the effectiveness of CMD and thoroughly justify the design of CMD. For example, CMD-P obtains new state-of-the-art results in robustness against label noise.

LGNov 22, 2019
Instance Cross Entropy for Deep Metric Learning

Xinshao Wang, Elyor Kodirov, Yang Hua et al.

Loss functions play a crucial role in deep metric learning thus a variety of them have been proposed. Some supervise the learning process by pairwise or tripletwise similarity constraints while others take advantage of structured similarity information among multiple data points. In this work, we approach deep metric learning from a novel perspective. We propose instance cross entropy (ICE) which measures the difference between an estimated instance-level matching distribution and its ground-truth one. ICE has three main appealing properties. Firstly, similar to categorical cross entropy (CCE), ICE has clear probabilistic interpretation and exploits structured semantic similarity information for learning supervision. Secondly, ICE is scalable to infinite training data as it learns on mini-batches iteratively and is independent of the training set size. Thirdly, motivated by our relative weight analysis, seamless sample reweighting is incorporated. It rescales samples' gradients to control the differentiation degree over training examples instead of truncating them by sample mining. In addition to its simplicity and intuitiveness, extensive experiments on three real-world benchmarks demonstrate the superiority of ICE.

CVNov 20, 2019
ID-aware Quality for Set-based Person Re-identification

Xinshao Wang, Elyor Kodirov, Yang Hua et al.

Set-based person re-identification (SReID) is a matching problem that aims to verify whether two sets are of the same identity (ID). Existing SReID models typically generate a feature representation per image and aggregate them to represent the set as a single embedding. However, they can easily be perturbed by noises--perceptually/semantically low quality images--which are inevitable due to imperfect tracking/detection systems, or overfit to trivial images. In this work, we present a novel and simple solution to this problem based on ID-aware quality that measures the perceptual and semantic quality of images guided by their ID information. Specifically, we propose an ID-aware Embedding that consists of two key components: (1) Feature learning attention that aims to learn robust image embeddings by focusing on 'medium' hard images. This way it can prevent overfitting to trivial images, and alleviate the influence of outliers. (2) Feature fusion attention is to fuse image embeddings in the set to obtain the set-level embedding. It ignores noisy information and pays more attention to discriminative images to aggregate more discriminative information. Experimental results on four datasets show that our method outperforms state-of-the-art approaches despite the simplicity of our approach.

LGMay 27, 2019
Derivative Manipulation for General Example Weighting

Xinshao Wang, Elyor Kodirov, Yang Hua et al.

Real-world large-scale datasets usually contain noisy labels and are imbalanced. Therefore, we propose derivative manipulation (DM), a novel and general example weighting approach for training robust deep models under these adverse conditions. DM has two main merits. First, loss function and example weighting are common techniques in the literature. DM reveals their connection (a loss function does example weighting) and is a replacement of both. Second, despite that a loss defines an example weighting scheme by its derivative, in the loss design, we need to consider whether it is differentiable. Instead, DM is more flexible by directly modifying the derivative so that a loss can be a non-elementary format too. Technically, DM defines an emphasis density function by a derivative magnitude function. DM is generic in that diverse weighting schemes can be derived. Extensive experiments on both vision and language tasks prove DM's effectiveness.

LGMar 28, 2019
IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters

Xinshao Wang, Yang Hua, Elyor Kodirov et al.

In this work, we study robust deep learning against abnormal training data from the perspective of example weighting built in empirical loss functions, i.e., gradient magnitude with respect to logits, an angle that is not thoroughly studied so far. Consequently, we have two key findings: (1) Mean Absolute Error (MAE) Does Not Treat Examples Equally. We present new observations and insightful analysis about MAE, which is theoretically proved to be noise-robust. First, we reveal its underfitting problem in practice. Second, we analyse that MAE's noise-robustness is from emphasising on uncertain examples instead of treating training samples equally, as claimed in prior work. (2) The Variance of Gradient Magnitude Matters. We propose an effective and simple solution to enhance MAE's fitting ability while preserving its noise-robustness. Without changing MAE's overall weighting scheme, i.e., what examples get higher weights, we simply change its weighting variance non-linearly so that the impact ratio between two examples are adjusted. Our solution is termed Improved MAE (IMAE). We prove IMAE's effectiveness using extensive experiments: image classification under clean labels, synthetic label noise, and real-world unknown noise.

CVMar 8, 2019
Ranked List Loss for Deep Metric Learning

Xinshao Wang, Yang Hua, Elyor Kodirov et al.

The objective of deep metric learning (DML) is to learn embeddings that can capture semantic similarity and dissimilarity information among data points. Existing pairwise or tripletwise loss functions used in DML are known to suffer from slow convergence due to a large proportion of trivial pairs or triplets as the model improves. To improve this, ranking-motivated structured losses are proposed recently to incorporate multiple examples and exploit the structured information among them. They converge faster and achieve state-of-the-art performance. In this work, we unveil two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. First, given a query, only a fraction of data points is incorporated to build the similarity structure. Consequently, some useful examples are ignored and the structure is less informative. To address this, we propose to build a set-based similarity structure by exploiting all instances in the gallery. The learning setting can be interpreted as few-shot retrieval: given a mini-batch, every example is iteratively used as a query, and the rest ones compose the gallery to search, i.e., the support set in few-shot setting. The rest examples are split into a positive set and a negative set. For every mini-batch, the learning objective of ranked list loss is to make the query closer to the positive set than to the negative set by a margin. Second, previous methods aim to pull positive pairs as close as possible in the embedding space. As a result, the intraclass data distribution tends to be extremely compressed. In contrast, we propose to learn a hypersphere for each class in order to preserve useful similarity structure inside it, which functions as regularisation. Extensive experiments demonstrate the superiority of our proposal by comparing with the state-of-the-art methods.

LGNov 4, 2018
Deep Metric Learning by Online Soft Mining and Class-Aware Attention

Xinshao Wang, Yang Hua, Elyor Kodirov et al.

Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of the sample mining methods, and provide solutions for both of them. First, previous mining methods assign one binary score to each sample, i.e., dropping or keeping it, so they only selects a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intraclass variances by focusing on more similar positives. Second, the existing methods are easily influenced by outliers as they are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA) that assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art.