Jun Shu

LG
h-index10
22papers
623citations
Novelty56%
AI Score59

22 Papers

CVAug 13, 2023Code
Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan

Yongheng Sun, Fan Wang, Jun Shu et al.

Brain tissue segmentation is essential for neuroscience and clinical studies. However, segmentation on longitudinal data is challenging due to dynamic brain changes across the lifespan. Previous researches mainly focus on self-supervision with regularizations and will lose longitudinal generalization when fine-tuning on a specific age group. In this paper, we propose a dual meta-learning paradigm to learn longitudinally consistent representations and persist when fine-tuning. Specifically, we learn a plug-and-play feature extractor to extract longitudinal-consistent anatomical representations by meta-feature learning and a well-initialized task head for fine-tuning by meta-initialization learning. Besides, two class-aware regularizations are proposed to encourage longitudinal consistency. Experimental results on the iSeg2019 and ADNI datasets demonstrate the effectiveness of our method. Our code is available at https://github.com/ladderlab-xjtu/DuMeta.

LGJan 18, 2023
Improve Noise Tolerance of Robust Loss via Noise-Awareness

Kehui Ding, Jun Shu, Deyu Meng et al.

Robust loss minimization is an important strategy for handling robust learning issue on noisy labels. Current approaches for designing robust losses involve the introduction of noise-robust factors, i.e., hyperparameters, to control the trade-off between noise robustness and learnability. However, finding suitable hyperparameters for different datasets with noisy labels is a challenging and time-consuming task. Moreover, existing robust loss methods usually assume that all training samples share common hyperparameters, which are independent of instances. This limits the ability of these methods to distinguish the individual noise properties of different samples and overlooks the varying contributions of diverse training samples in helping models understand underlying patterns. To address above issues, we propose to assemble robust loss with instance-dependent hyperparameters to improve their noise tolerance with theoretical guarantee. To achieve setting such instance-dependent hyperparameters for robust loss, we propose a meta-learning method which is capable of adaptively learning a hyperparameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity). Through mutual amelioration between hyperparameter prediction function and classifier parameters in our method, both of them can be simultaneously finely ameliorated and coordinated to attain solutions with good generalization capability. Four SOTA robust loss functions are attempted to be integrated with our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its noise tolerance and performance.

CVFeb 6Code
DuMeta++: Spatiotemporal Dual Meta-Learning for Generalizable Few-Shot Brain Tissue Segmentation Across Diverse Ages

Yongheng Sun, Jun Shu, Jianhua Ma et al.

Accurate segmentation of brain tissues from MRI scans is critical for neuroscience and clinical applications, but achieving consistent performance across the human lifespan remains challenging due to dynamic, age-related changes in brain appearance and morphology. While prior work has sought to mitigate these shifts by using self-supervised regularization with paired longitudinal data, such data are often unavailable in practice. To address this, we propose \emph{DuMeta++}, a dual meta-learning framework that operates without paired longitudinal data. Our approach integrates: (1) meta-feature learning to extract age-agnostic semantic representations of spatiotemporally evolving brain structures, and (2) meta-initialization learning to enable data-efficient adaptation of the segmentation model. Furthermore, we propose a memory-bank-based class-aware regularization strategy to enforce longitudinal consistency without explicit longitudinal supervision. We theoretically prove the convergence of our DuMeta++, ensuring stability. Experiments on diverse datasets (iSeg-2019, IBIS, OASIS, ADNI) under few-shot settings demonstrate that DuMeta++ outperforms existing methods in cross-age generalization. Code will be available at https://github.com/ladderlab-xjtu/DuMeta++.

CVMar 5, 2024Code
Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud?

Chenqiang Gao, Chuandong Liu, Jun Shu et al.

Current state-of-the-art (SOTA) 3D object detection methods often require a large amount of 3D bounding box annotations for training. However, collecting such large-scale densely-supervised datasets is notoriously costly. To reduce the cumbersome data annotation process, we propose a novel sparsely-annotated framework, in which we just annotate one 3D object per scene. Such a sparse annotation strategy could significantly reduce the heavy annotation burden, while inexact and incomplete sparse supervision may severely deteriorate the detection performance. To address this issue, we develop the SS3D++ method that alternatively improves 3D detector training and confident fully-annotated scene generation in a unified learning scheme. Using sparse annotations as seeds, we progressively generate confident fully-annotated scenes based on designing a missing-annotated instance mining module and reliable background mining module. Our proposed method produces competitive results when compared with SOTA weakly-supervised methods using the same or even more annotation costs. Besides, compared with SOTA fully-supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x less annotation cost, and 90% of their performance on the Waymo dataset with about 15x less annotation cost. The additional unlabeled training scenes could further boost the performance. The code will be available at https://github.com/gaocq/SS3D2.

LGJun 9, 2025Code
Improving Memory Efficiency for Training KANs via Meta Learning

Zhangchi Zhao, Jun Shu, Deyu Meng et al.

Inspired by the Kolmogorov-Arnold representation theorem, KANs offer a novel framework for function approximation by replacing traditional neural network weights with learnable univariate functions. This design demonstrates significant potential as an efficient and interpretable alternative to traditional MLPs. However, KANs are characterized by a substantially larger number of trainable parameters, leading to challenges in memory efficiency and higher training costs compared to MLPs. To address this limitation, we propose to generate weights for KANs via a smaller meta-learner, called MetaKANs. By training KANs and MetaKANs in an end-to-end differentiable manner, MetaKANs achieve comparable or even superior performance while significantly reducing the number of trainable parameters and maintaining promising interpretability. Extensive experiments on diverse benchmark tasks, including symbolic regression, partial differential equation solving, and image classification, demonstrate the effectiveness of MetaKANs in improving parameter efficiency and memory usage. The proposed method provides an alternative technique for training KANs, that allows for greater scalability and extensibility, and narrows the training cost gap with MLPs stated in the original paper of KANs. Our code is available at https://github.com/Murphyzc/MetaKAN.

87.1LGApr 27
A Limit Theory of Foundation Models: A Mathematical Approach to Understanding Emergent Intelligence and Scaling Laws

Jun Shu, Junxiong Jia, Deyu Meng et al.

Emergent intelligence have played a major role in the modern AI development. While existing studies primarily rely on empirical observations to characterize this phenomenon, a rigorous theoretical framework remains underexplored. This study attempts to develop a mathematical approach to formalize emergent intelligence from the perspective of limit theory. Specifically, we introduce a performance function E(N, P, K), dependent on data size N, model size P and training steps K, to quantify intelligence behavior. We posit that intelligence emerges as a transition from finite to effectively infinite knowledge, and thus recast emergent intelligence as existence of the limit $\lim_{N,P,K \to \infty} \mathcal{E}(N,P,K)$, with emergent abilities corresponding to the limiting behavior. This limit theory helps reveal that emergent intelligence originates from the existence of a parameter-limit architecture (referred to as the limit architecture), and that emergent intelligence rationally corresponds to the learning behavior of this limit system. By introducing tools from nonlinear Lipschitz operator theory, we prove that the necessary and sufficient conditions for existence of the limit architecture. Furthermore, we derive the scaling law of foundation models by leveraging tools of Lipschitz operator and covering number. Theoretical results show that: 1) emergent intelligence is governed by three key factors-training steps, data size and the model architecture, where the properties of basic blocks play a crucial role in constructing foundation models; 2) the critical condition Lip(T)=1 for emergent intelligence provides theoretical support for existing findings. 3) emergent intelligence is determined by an infinite-dimensional system, yet can be effectively realized in practice through a finite-dimensional architecture. Our empirical results corroborate these theoretical findings.

LGFeb 20
Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition

Yubo Zhou, Jun Shu, Junmin Liu et al.

Gradient-based hyperparameter optimization (HPO) have emerged recently, leveraging bilevel programming techniques to optimize hyperparameter by estimating hypergradient w.r.t. validation loss. Nevertheless, previous theoretical works mainly focus on reducing the gap between the estimation and ground-truth (i.e., the bias), while ignoring the error due to data distribution (i.e., the variance), which degrades performance. To address this issue, we conduct a bias-variance decomposition for hypergradient estimation error and provide a supplemental detailed analysis of the variance term ignored by previous works. We also present a comprehensive analysis of the error bounds for hypergradient estimation. This facilitates an easy explanation of some phenomena commonly observed in practice, like overfitting to the validation set. Inspired by the derived theories, we propose an ensemble hypergradient strategy to reduce the variance in HPO algorithms effectively. Experimental results on tasks including regularization hyperparameter learning, data hyper-cleaning, and few-shot learning demonstrate that our variance reduction strategy improves hypergradient estimation. To explain the improved performance, we establish a connection between excess error and hypergradient estimation, offering some understanding of empirical observations.

LGFeb 15
KoopGen: Koopman Generator Networks for Representing and Predicting Dynamical Systems with Continuous Spectra

Liangyu Su, Jun Shu, Rui Liu et al.

Representing and predicting high-dimensional and spatiotemporally chaotic dynamical systems remains a fundamental challenge in dynamical systems and machine learning. Although data-driven models can achieve accurate short-term forecasts, they often lack stability, interpretability, and scalability in regimes dominated by broadband or continuous spectra. Koopman-based approaches provide a principled linear perspective on nonlinear dynamics, but existing methods rely on restrictive finite-dimensional assumptions or explicit spectral parameterizations that degrade in high-dimensional settings. Against these issues, we introduce KoopGen, a generator-based neural Koopman framework that models dynamics through a structured, state-dependent representation of Koopman generators. By exploiting the intrinsic Cartesian decomposition into skew-adjoint and self-adjoint components, KoopGen separates conservative transport from irreversible dissipation while enforcing exact operator-theoretic constraints during learning. Across systems ranging from nonlinear oscillators to high-dimensional chaotic and spatiotemporal dynamics, KoopGen improves prediction accuracy and stability, while clarifying which components of continuous-spectrum dynamics admit interpretable and learnable representations.

LGAug 31, 2025
Feed Two Birds with One Scone: Exploiting Function-Space Regularization for Both OOD Robustness and ID Fine-Tuning Performance

Xiang Yuan, Jun Shu, Deyu meng et al.

Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. To remedy this, most robust fine-tuning methods aim to preserve the pretrained weights, features, or logits. However, we find that these methods cannot always improve OOD robustness for different model architectures. This is due to the OOD robustness requiring the model function to produce stable prediction for input information of downstream tasks, while existing methods might serve as a poor proxy for the optimization in the function space. Based on this finding, we propose a novel regularization that constrains the distance of fine-tuning and pre-trained model in the function space with the simulated OOD samples, aiming to preserve the OOD robustness of the pre-trained model. Besides, to further enhance the OOD robustness capability of the fine-tuning model, we introduce an additional consistency regularization to promote stable predictions of perturbed samples. Extensive experiments demonstrate our approach could consistently improve both downstream task ID fine-tuning performance and OOD robustness across a variety of CLIP backbones, outperforming existing regularization-based robust fine-tuning methods.

CVMar 8, 2025
Removing Multiple Hybrid Adverse Weather in Video via a Unified Model

Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.

Videos captured under real-world adverse weather conditions typically suffer from uncertain hybrid weather artifacts with heterogeneous degradation distributions. However, existing algorithms only excel at specific single degradation distributions due to limited adaption capacity and have to deal with different weather degradations with separately trained models, thus may fail to handle real-world stochastic weather scenarios. Besides, the model training is also infeasible due to the lack of paired video data to characterize the coexistence of multiple weather. To ameliorate the aforementioned issue, we propose a novel unified model, dubbed UniWRV, to remove multiple heterogeneous video weather degradations in an all-in-one fashion. Specifically, to tackle degenerate spatial feature heterogeneity, we propose a tailored weather prior guided module that queries exclusive priors for different instances as prompts to steer spatial feature characterization. To tackle degenerate temporal feature heterogeneity, we propose a dynamic routing aggregation module that can automatically select optimal fusion paths for different instances to dynamically integrate temporal features. Additionally, we managed to construct a new synthetic video dataset, termed HWVideo, for learning and benchmarking multiple hybrid adverse weather removal, which contains 15 hybrid weather conditions with a total of 1500 adverse-weather/clean paired video clips. Real-world hybrid weather videos are also collected for evaluating model generalizability. Comprehensive experiments demonstrate that our UniWRV exhibits robust and superior adaptation capability in multiple heterogeneous degradations learning scenarios, including various generic video restoration tasks beyond weather removal.

LGMay 13, 2023
DAC-MR: Data Augmentation Consistency Based Meta-Regularization for Meta-Learning

Jun Shu, Xiang Yuan, Deyu Meng et al.

Meta learning recently has been heavily researched and helped advance the contemporary machine learning. However, achieving well-performing meta-learning model requires a large amount of training tasks with high-quality meta-data representing the underlying task generalization goal, which is sometimes difficult and expensive to obtain for real applications. Current meta-data-driven meta-learning approaches, however, are fairly hard to train satisfactory meta-models with imperfect training tasks. To address this issue, we suggest a meta-knowledge informed meta-learning (MKIML) framework to improve meta-learning by additionally integrating compensated meta-knowledge into meta-learning process. We preliminarily integrate meta-knowledge into meta-objective via using an appropriate meta-regularization (MR) objective to regularize capacity complexity of the meta-model function class to facilitate better generalization on unseen tasks. As a practical implementation, we introduce data augmentation consistency to encode invariance as meta-knowledge for instantiating MR objective, denoted by DAC-MR. The proposed DAC-MR is hopeful to learn well-performing meta-models from training tasks with noisy, sparse or unavailable meta-data. We theoretically demonstrate that DAC-MR can be treated as a proxy meta-objective used to evaluate meta-model without high-quality meta-data. Besides, meta-data-driven meta-loss objective combined with DAC-MR is capable of achieving better meta-level generalization. 10 meta-learning tasks with different network architectures and benchmarks substantiate the capability of our DAC-MR on aiding meta-model learning. Fine performance of DAC-MR are obtained across all settings, and are well-aligned with our theoretical insights. This implies that our DAC-MR is problem-agnostic, and hopeful to be readily applied to extensive meta-learning problems and tasks.

LGFeb 16, 2022
Diagnosing Batch Normalization in Class Incremental Learning

Minghao Zhou, Quanziang Wang, Jun Shu et al.

Extensive researches have applied deep neural networks (DNNs) in class incremental learning (Class-IL). As building blocks of DNNs, batch normalization (BN) standardizes intermediate feature maps and has been widely validated to improve training stability and convergence. However, we claim that the direct use of standard BN in Class-IL models is harmful to both the representation learning and the classifier training, thus exacerbating catastrophic forgetting. In this paper we investigate the influence of BN on Class-IL models by illustrating such BN dilemma. We further propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias. Without inviting extra hyperparameters, we apply BN Tricks to three baseline rehearsal-based methods, ER, DER++ and iCaRL. Through comprehensive experiments conducted on benchmark datasets of Seq-CIFAR-10, Seq-CIFAR-100 and Seq-Tiny-ImageNet, we show that BN Tricks can bring significant performance gains to all adopted baselines, revealing its potential generality along this line of research.

LGFeb 11, 2022
CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning

Jun Shu, Xiang Yuan, Deyu Meng et al.

Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance. Sample re-weighting methods are popularly used to alleviate this data bias issue. Most current methods, however, require to manually pre-specify the weighting schemes as well as their additional hyper-parameters relying on the characteristics of the investigated problem and training data. This makes them fairly hard to be generally applied in practical scenarios, due to their significant complexities and inter-class variations of data bias situations. To address this issue, we propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data. Specifically, by seeing each training class as a separate learning task, our method aims to extract an explicit weighting function with sample loss and task/class feature as input, and sample weight as output, expecting to impose adaptively varying weighting schemes to different sample classes based on their own intrinsic bias characteristics. Synthetic and real data experiments substantiate the capability of our method on achieving proper weighting schemes in various data bias cases, like the class imbalance, feature-independent and dependent label noise scenarios, and more complicated bias scenarios beyond conventional cases. Besides, the task-transferability of the learned weighting scheme is also substantiated, by readily deploying the weighting function learned on relatively smaller-scale CIFAR-10 dataset on much larger-scale full WebVision dataset. A performance gain can be readily achieved compared with previous SOAT ones without additional hyper-parameter tuning and meta gradient descent step. The general availability of our method for multiple robust deep learning issues, including partial-label learning, semi-supervised learning and selective classification, has also been validated.

LGJul 6, 2021
Learning an Explicit Hyperparameter Prediction Function Conditioned on Tasks

Jun Shu, Deyu Meng, Zongben Xu

Meta learning has attracted much attention recently in machine learning community. Contrary to conventional machine learning aiming to learn inherent prediction rules to predict labels for new query data, meta learning aims to learn the learning methodology for machine learning from observed tasks, so as to generalize to new query tasks by leveraging the meta-learned learning methodology. In this study, we interpret such learning methodology as learning an explicit hyper-parameter prediction function shared by all training tasks. Specifically, this function is represented as a parameterized function called meta-learner, mapping from a training/test task to its suitable hyper-parameter setting, extracted from a pre-specified function set called meta learning machine. Such setting guarantees that the meta-learned learning methodology is able to flexibly fit diverse query tasks, instead of only obtaining fixed hyper-parameters by many current meta learning methods, with less adaptability to query task's variations. Such understanding of meta learning also makes it easily succeed from traditional learning theory for analyzing its generalization bounds with general losses/tasks/models. The theory naturally leads to some feasible controlling strategies for ameliorating the quality of the extracted meta-learner, verified to be able to finely ameliorate its generalization capability in some typical meta learning applications, including few-shot regression, few-shot classification and domain generalization.

LGSep 2, 2020
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

Ziyi Yang, Jun Shu, Yong Liang et al.

Current machine learning has made great progress on computer vision and many other fields attributed to the large amount of high-quality training samples, while it does not work very well on genomic data analysis, since they are notoriously known as small data. In our work, we focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients that can guide treatment decisions for a specific individual through training on small data. In fact, doctors and clinicians always address this problem by studying several interrelated clinical variables simultaneously. We attempt to simulate such clinical perspective, and introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks and transfer it to help address new tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification. Observing that gene expression data have specifically high dimensionality and high noise properties compared with image data, we proposed a new extension of it by appending two modules to address these issues. Concretely, we append a feature selection layer to automatically filter out the disease-irrelated genes and incorporate a sample reweighting strategy to adaptively remove noisy data, and meanwhile the extended model is capable of learning from a limited number of training examples and generalize well. Simulations and real gene expression data experiments substantiate the superiority of the proposed method for predicting the subtypes of disease and identifying potential disease-related genes.

CVAug 8, 2020
Meta Feature Modulator for Long-tailed Recognition

Renzhen Wang, Kaiqin Hu, Yanwen Zhu et al.

Deep neural networks often degrade significantly when training data suffer from class imbalance problems. Existing approaches, e.g., re-sampling and re-weighting, commonly address this issue by rearranging the label distribution of training data to train the networks fitting well to the implicit balanced label distribution. However, most of them hinder the representative ability of learned features due to insufficient use of intra/inter-sample information of training data. To address this issue, we propose meta feature modulator (MFM), a meta-learning framework to model the difference between the long-tailed training data and the balanced meta data from the perspective of representation learning. Concretely, we employ learnable hyper-parameters (dubbed modulation parameters) to adaptively scale and shift the intermediate features of classification networks, and the modulation parameters are optimized together with the classification network parameters guided by a small amount of balanced meta data. We further design a modulator network to guide the generation of the modulation parameters, and such a meta-learner can be readily adapted to train the classification network on other long-tailed datasets. Extensive experiments on benchmark vision datasets substantiate the superiority of our approach on long-tailed recognition tasks beyond other state-of-the-art methods.

CVAug 3, 2020
Learning to Purify Noisy Labels via Meta Soft Label Corrector

Yichen Wu, Jun Shu, Qi Xie et al.

Recent deep neural networks (DNNs) can easily overfit to biased training data with noisy labels. Label correction strategy is commonly used to alleviate this issue by designing a method to identity suspected noisy labels and then correct them. Current approaches to correcting corrupted labels usually need certain pre-defined label correction rules or manually preset hyper-parameters. These fixed settings make it hard to apply in practice since the accurate label correction usually related with the concrete problem, training data and the temporal information hidden in dynamic iterations of training process. To address this issue, we propose a meta-learning model which could estimate soft labels through meta-gradient descent step under the guidance of noise-free meta data. By viewing the label correction procedure as a meta-process and using a meta-learner to automatically correct labels, we could adaptively obtain rectified soft labels iteratively according to current training problems without manually preset hyper-parameters. Besides, our method is model-agnostic and we can combine it with any other existing model with ease. Comprehensive experiments substantiate the superiority of our method in both synthetic and real-world problems with noisy labels compared with current SOTA label correction strategies.

LGJul 29, 2020
MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks

Jun Shu, Yanwen Zhu, Qian Zhao et al.

The learning rate (LR) is one of the most important hyper-parameters in stochastic gradient descent (SGD) algorithm for training deep neural networks (DNN). However, current hand-designed LR schedules need to manually pre-specify a fixed form, which limits their ability to adapt practical non-convex optimization problems due to the significant diversification of training dynamics. Meanwhile, it always needs to search proper LR schedules from scratch for new tasks, which, however, are often largely different with task variations, like data modalities, network architectures, or training data capacities. To address this learning-rate-schedule setting issues, we propose to parameterize LR schedules with an explicit mapping formulation, called \textit{MLR-SNet}. The learnable parameterized structure brings more flexibility for MLR-SNet to learn a proper LR schedule to comply with the training dynamics of DNN. Image and text classification benchmark experiments substantiate the capability of our method for achieving proper LR schedules. Moreover, the explicit parameterized structure makes the meta-learned LR schedules capable of being transferable and plug-and-play, which can be easily generalized to new heterogeneous tasks. We transfer our meta-learned MLR-SNet to query tasks like different training epochs, network architectures, data modalities, dataset sizes from the training ones, and achieve comparable or even better performance compared with hand-designed LR schedules specifically designed for the query tasks. The robustness of MLR-SNet is also substantiated when the training data are biased with corrupted noise. We further prove the convergence of the SGD algorithm equipped with LR schedule produced by our MLR-Net, with the convergence rate comparable to the best-known ones of the algorithm for solving the problem.

LGJun 10, 2020
Meta Transition Adaptation for Robust Deep Learning with Noisy Labels

Jun Shu, Qian Zhao, Zongben Xu et al.

To discover intrinsic inter-class transition probabilities underlying data, learning with noise transition has become an important approach for robust deep learning on corrupted labels. Prior methods attempt to achieve such transition knowledge by pre-assuming strongly confident anchor points with 1-probability belonging to a specific class, generally infeasible in practice, or directly jointly estimating the transition matrix and learning the classifier from the noisy samples, always leading to inaccurate estimation misguided by wrong annotation information especially in large noise cases. To alleviate these issues, this study proposes a new meta-transition-learning strategy for the task. Specifically, through the sound guidance of a small set of meta data with clean labels, the noise transition matrix and the classifier parameters can be mutually ameliorated to avoid being trapped by noisy training samples, and without need of any anchor point assumptions. Besides, we prove our method is with statistical consistency guarantee on correctly estimating the desired transition matrix. Extensive synthetic and real experiments validate that our method can more accurately extract the transition matrix, naturally following its more robust performance than prior arts. Its essential relationship with label distribution learning is also discussed, which explains its fine performance even under no-noise scenarios.

LGFeb 16, 2020
Learning Adaptive Loss for Robust Learning with Noisy Labels

Jun Shu, Qian Zhao, Keyu Chen et al.

Robust loss minimization is an important strategy for handling robust learning issue on noisy labels. Current robust loss functions, however, inevitably involve hyperparameter(s) to be tuned, manually or heuristically through cross validation, which makes them fairly hard to be generally applied in practice. Besides, the non-convexity brought by the loss as well as the complicated network architecture makes it easily trapped into an unexpected solution with poor generalization capability. To address above issues, we propose a meta-learning method capable of adaptively learning hyperparameter in robust loss functions. Specifically, through mutual amelioration between robust loss hyperparameter and network parameters in our method, both of them can be simultaneously finely learned and coordinated to attain solutions with good generalization capability. Four kinds of SOTA robust loss functions are attempted to be integrated into our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its accuracy and generalization performance, as compared with conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.

LGFeb 20, 2019
Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting

Jun Shu, Qi Xie, Lixuan Yi et al.

Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. Sample re-weighting strategy is commonly used to alleviate this issue by designing a weighting function mapping from training loss to sample weight, and then iterating between weight recalculating and classifier updating. Current approaches, however, need manually pre-specify the weighting function as well as its additional hyper-parameters. It makes them fairly hard to be generally applied in practice due to the significant variation of proper weighting schemes relying on the investigated problem and training data. To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data. The weighting function is an MLP with one hidden layer, constituting a universal approximator to almost any continuous functions, making the method able to fit a wide range of weighting functions including those assumed in conventional research. Guided by a small amount of unbiased meta-data, the parameters of the weighting function can be finely updated simultaneously with the learning process of the classifiers. Synthetic and real experiments substantiate the capability of our method for achieving proper weighting functions in class imbalance and noisy label cases, fully complying with the common settings in traditional methods, and more complicated scenarios beyond conventional cases. This naturally leads to its better accuracy than other state-of-the-art methods.

LGAug 14, 2018
Small Sample Learning in Big Data Era

Jun Shu, Zongben Xu, Deyu Meng

As a promising area in artificial intelligence, a new learning paradigm, called Small Sample Learning (SSL), has been attracting prominent research attention in the recent years. In this paper, we aim to present a survey to comprehensively introduce the current techniques proposed on this topic. Specifically, current SSL techniques can be mainly divided into two categories. The first category of SSL approaches can be called "concept learning", which emphasizes learning new concepts from only few related observations. The purpose is mainly to simulate human learning behaviors like recognition, generation, imagination, synthesis and analysis. The second category is called "experience learning", which usually co-exists with the large sample learning manner of conventional machine learning. This category mainly focuses on learning with insufficient samples, and can also be called small data learning in some literatures. More extensive surveys on both categories of SSL techniques are introduced and some neuroscience evidences are provided to clarify the rationality of the entire SSL regime, and the relationship with human learning process. Some discussions on the main challenges and possible future research directions along this line are also presented.