Lei Feng

h-index: 35

6 papers

95 citations

Novelty: 63%

AI Score: 37

Ranked #91,836 of 194,257 authors (top 47%) · #20,309 in LG (top 51%)

6 Papers

13.7 · LG · Aug 12, 2023 · Code

Multi-Label Knowledge Distillation

Penghui Yang, Ming-Kun Xie, Chen-Chen Zong et al.

Existing knowledge distillation methods typically transfer the knowledge in the output logits or intermediate feature maps of the teacher network to the student network, an approach that has been very successful in multi-class single-label learning. However, these methods can hardly be extended to multi-label learning, where each instance is associated with multiple semantic labels, because the prediction probabilities do not sum to one and feature maps of the whole example may ignore minor classes. In this paper, we propose a novel multi-label knowledge distillation method. On the one hand, it exploits the informative semantic knowledge in the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method avoids knowledge counteraction among labels and thereby outperforms a range of competing methods. Our code is available at: https://github.com/penghui-yang/L2D
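
The logit-side idea of splitting a multi-label problem into per-label binary problems can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code (that lives in the L2D repository linked in the abstract). Each label's teacher and student logits are mapped to independent Bernoulli probabilities with a sigmoid, compared with a binary KL divergence, and averaged over labels. The names `multilabel_logit_kd` and `bernoulli_kl` and the `temperature` parameter are hypothetical, and the label-wise embedding term is not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(q)), with probabilities clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def multilabel_logit_kd(teacher_logits, student_logits, temperature=2.0):
    """Hypothetical per-label distillation loss: one binary problem per label.

    Each label's logit is softened by `temperature` and mapped to its own
    Bernoulli probability, so label scores are never forced to sum to one.
    """
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("expected two non-empty logit vectors of equal length")
    per_label = [
        bernoulli_kl(sigmoid(t / temperature), sigmoid(s / temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_label) / len(per_label)


# Example: the student agrees with the teacher on labels 0 and 2 but not on label 1.
teacher = [3.0, 2.5, -4.0]
student = [3.0, -1.0, -4.0]
print(multilabel_logit_kd(teacher, student))
```

Because each label is compared on its own, a confident teacher on one label does not take probability mass away from another label, which is what a softmax over all labels would do.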

13.7 · LG · Oct 9, 2023

Binary Classification with Confidence Difference

Wei Wang, Lei Feng, Yuchen Jiang et al.

Recently, learning with soft labels has been shown to achieve better performance than learning with hard labels in terms of model generalization, calibration, and robustness. However, collecting pointwise labeling confidence for all training examples can be challenging and time-consuming in real-world scenarios. This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification. Instead of pointwise labeling confidence, we are given only unlabeled data pairs, each with a confidence difference that specifies the difference between the two instances' probabilities of being positive. We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate. We also introduce a risk correction approach to mitigate overfitting, whose consistency and convergence rate are also proven. Extensive experiments on benchmark datasets and a real-world recommender system dataset validate the effectiveness of the proposed approaches in exploiting the supervision information of the confidence difference.
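
A short derivation shows why unlabeled pairs with confidence differences can still yield an unbiased classification risk. Write pi for the class prior P(y=+1) and c = P(y'=+1 | x') - P(y=+1 | x). Averaging c over x' gives pi - P(y=+1 | x), so P(y=+1 | x) = pi - E_x'[c], and symmetrically P(y'=+1 | x') = pi + E_x[c]. Substituting these into the ordinary risk gives the estimator below. This is a sketch that assumes a known prior (`prior`), the logistic loss, and pairs drawn independently from the marginal. It need not match the paper's exact estimator, the risk correction is not included, and the function names are hypothetical.

```python
import math


def logistic_loss(z, y):
    """log(1 + exp(-y * z)) for a real score z and a label y in {+1, -1}, computed stably."""
    m = -y * z
    if m > 0:
        return m + math.log1p(math.exp(-m))
    return math.log1p(math.exp(m))


def confdiff_risk(f, pairs, prior):
    """Risk estimate from unlabeled pairs (x, x_prime, c).

    c = P(y'=+1 | x') - P(y=+1 | x) is the confidence difference and `prior`
    is the class prior P(y=+1), which this sketch assumes is known.
    """
    total = 0.0
    for x, x_prime, c in pairs:
        zx, zxp = f(x), f(x_prime)
        # First instance: P(y=+1 | x) is replaced by prior - c.
        total += (prior - c) * logistic_loss(zx, +1) + (1 - prior + c) * logistic_loss(zx, -1)
        # Second instance: P(y'=+1 | x') is replaced by prior + c.
        total += (prior + c) * logistic_loss(zxp, +1) + (1 - prior - c) * logistic_loss(zxp, -1)
    return total / (2 * len(pairs))


# Toy check on a hypothetical three-point distribution with known posteriors.
xs = [-1.0, 0.0, 2.0]
posteriors = [0.2, 0.5, 0.9]
prior = sum(posteriors) / len(posteriors)
f = lambda x: 0.7 * x - 0.1  # an arbitrary linear scorer

# Enumerating all ordered pairs gives the exact expectation over independent pairs.
pairs = [(xs[i], xs[j], posteriors[j] - posteriors[i]) for i in range(3) for j in range(3)]
supervised = sum(
    p * logistic_loss(f(x), +1) + (1 - p) * logistic_loss(f(x), -1)
    for x, p in zip(xs, posteriors)
) / len(xs)
print(confdiff_risk(f, pairs, prior), supervised)  # agree up to rounding
```

For an individual pair, weights such as prior - c can be negative, so the empirical risk can dip below zero on a finite sample; that is presumably the kind of behavior the abstract's risk correction approach targets.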

21.7 · LG · Nov 2, 2023

In Defense of Softmax Parametrization for Calibrated and Consistent Learning to Defer

Yuzhou Cao, Hussein Mozannar, Lei Feng et al.

Enabling machine learning classifiers to defer their decisions to a downstream expert when the expert is more accurate can improve safety and performance. This objective can be achieved with the learning-to-defer framework, which aims to jointly learn how to classify and how to defer to the expert. Recent studies have shown theoretically that popular softmax-parameterized estimators for learning to defer give unbounded estimates of the likelihood of deferring, which makes them uncalibrated. However, it remains unknown whether this is caused by the widely used softmax parameterization, and whether a softmax-based estimator can be found that is both statistically consistent and yields valid probability estimates. In this work, we first show that the miscalibrated and unbounded estimator in prior literature stems from the symmetric nature of the surrogate losses used, not from softmax. We then propose a novel statistically consistent, asymmetric softmax-based surrogate loss that produces valid estimates without the issue of unboundedness. We further analyze the non-asymptotic properties of our method and empirically validate its performance and calibration on benchmark datasets.
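
To see where an unbounded estimate can come from, consider the softmax cross-entropy surrogate used in prior learning-to-defer work, with K class outputs plus one defer output. The loss is -log g_y(x) - 1[m = y] log g_defer(x), where g is the softmax over all K+1 outputs and m is the expert's prediction. Its population minimizer satisfies g_defer = a / (1 + a) with a = P(m = y | x), so the implied estimate of the expert's accuracy is g_defer / (1 - g_defer), which exceeds 1 whenever g_defer > 1/2. The sketch below illustrates only this prior surrogate. The paper's asymmetric softmax surrogate is not reproduced, and the function names are hypothetical.

```python
import math


def softmax(u):
    m = max(u)
    exps = [math.exp(v - m) for v in u]
    total = sum(exps)
    return [e / total for e in exps]


def defer_ce_surrogate(u, y, expert_pred):
    """Cross-entropy defer surrogate: K class scores plus a final defer score in u.

    y is the true class index and expert_pred is the expert's prediction.
    """
    g = softmax(u)
    loss = -math.log(g[y])
    if expert_pred == y:
        # The defer output is rewarded only when the expert would have been right.
        loss -= math.log(g[-1])
    return loss


def implied_expert_accuracy(u):
    """Estimate of P(expert correct | x) implied by the minimizer: g_defer / (1 - g_defer)."""
    g_defer = softmax(u)[-1]
    return g_defer / (1.0 - g_defer)


print(implied_expert_accuracy([0.0, 0.0, 0.0]))  # about 0.5: a valid probability
print(implied_expert_accuracy([0.0, 0.0, 3.0]))  # about 10: not a probability
```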

17.6 · LG · Feb 6, 2024

Does confidence calibration improve conformal prediction?

Huajun Xi, Jianguo Huang, Kangdao Liu et al.

Conformal prediction is an emerging technique for uncertainty quantification that constructs prediction sets guaranteed to contain the true label with a predefined probability. Previous works often employ temperature scaling to calibrate classifiers, assuming that confidence calibration benefits conformal prediction. However, the specific impact of confidence calibration on conformal prediction remains underexplored. In this work, we make two key discoveries about the impact of confidence calibration methods on adaptive conformal prediction. First, we empirically show that current confidence calibration methods (e.g., temperature scaling) typically lead to larger prediction sets in adaptive conformal prediction. Second, by investigating the role of the temperature value, we observe that high-confidence predictions can enhance the efficiency of adaptive conformal prediction. Theoretically, we prove that predictions with higher confidence result in smaller prediction sets in expectation. This finding implies that the rescaling parameters in these calibration methods, when optimized with cross-entropy loss, might counteract the goal of generating efficient prediction sets. To address this issue, we propose Conformal Temperature Scaling (ConfTS), a variant of temperature scaling with a novel loss function designed to enhance the efficiency of prediction sets. This approach can be extended to optimize the parameters of other post-hoc confidence calibration methods. Extensive experiments demonstrate that our method improves existing adaptive conformal prediction methods in classification tasks, especially with LLMs.
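
The adaptive conformal pipeline discussed above can be sketched with the non-randomized Adaptive Prediction Sets (APS) score, assuming APS as the adaptive score and split conformal calibration. The score of a label is the total softmax mass of all labels ranked at or above it. The threshold is a finite-sample-corrected quantile of calibration scores, and the prediction set keeps every label whose score is within the threshold. Temperature enters through the `temperature` argument of `softmax`. The ConfTS loss for tuning that temperature is not reproduced, and the function names and toy data are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aps_score(probs, label):
    """Non-randomized APS score: mass of all labels ranked at or above `label`."""
    order = sorted(range(len(probs)), key=lambda k: -probs[k])
    mass = 0.0
    for k in order:
        mass += probs[k]
        if k == label:
            return mass
    raise ValueError("label out of range")


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: every label is kept
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, threshold):
    """All labels whose APS score does not exceed the calibrated threshold."""
    return [k for k in range(len(probs)) if aps_score(probs, k) <= threshold]


# Hypothetical calibration data: (logits, true label) pairs for a 3-class model.
calibration = [
    ([2.0, 0.5, -1.0], 0), ([0.2, 1.5, 0.1], 1), ([1.0, 1.2, 0.3], 0),
    ([-0.5, 0.0, 2.2], 2), ([1.8, 1.6, -0.2], 1), ([0.3, -0.4, 1.1], 2),
    ([2.5, 0.1, 0.0], 0), ([0.9, 1.0, 1.1], 1), ([-1.0, 2.0, 0.5], 1),
]
alpha, temperature = 0.2, 1.0
scores = [aps_score(softmax(z, temperature), y) for z, y in calibration]
q = conformal_threshold(scores, alpha)
print(prediction_set(softmax([1.5, 1.0, -0.5], temperature), q))
```

With this deterministic score an empty set is possible when even the top label's mass exceeds the threshold; randomized APS variants avoid that and are omitted here for brevity.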

15.7 · LG · Feb 2, 2024 · Code

A General Framework for Learning from Weak Supervision

Hao Chen, Jindong Wang, Lei Feng et al. · pku

Weakly supervised learning generally faces two challenges that hinder practical deployment: limited applicability across scenarios with diverse forms of weak supervision, and limited scalability due to the complexity of existing algorithms. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation that accommodates various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an algorithm that significantly reduces the computational demands of EM by using a Non-deterministic Finite Automaton (NFA) together with a forward-backward algorithm, reducing the time complexity from the quadratic or factorial complexity often required by existing solutions to linear. The problem of learning from arbitrary weak supervision is thereby converted into modeling each form of weak supervision as an NFA. GLWS not only enhances the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.
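
In an EM scheme of this kind, the E-step needs per-instance label posteriors given the weak supervision. For one aggregate-statistic case, a bag of n instances known to contain exactly k positives, an automaton whose states are the running number of positives (0 to k) lets a forward-backward pass compute every posterior in O(nk) time, linear in n for fixed k, instead of enumerating all 2^n labelings. This automaton happens to be deterministic, which is a special case of an NFA. The sketch assumes the model's per-instance probabilities are independent given the inputs; it illustrates the idea and is not the GLWS implementation, and `count_posteriors` is a hypothetical name.

```python
def count_posteriors(p, k):
    """P(y_i = 1 | exactly k positives in the bag) for independent Bernoulli(p_i) labels."""
    n = len(p)
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, len(p)]")
    # fwd[i][c]: probability that the first i labels contain exactly c positives.
    fwd = [[0.0] * (k + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n):
        for c in range(k + 1):
            fwd[i + 1][c] += fwd[i][c] * (1.0 - p[i])  # transition on y_i = 0
            if c < k:
                fwd[i + 1][c + 1] += fwd[i][c] * p[i]  # transition on y_i = 1
    # bwd[i][c]: probability that labels i..n-1 add exactly k - c positives.
    bwd = [[0.0] * (k + 1) for _ in range(n + 1)]
    bwd[n][k] = 1.0
    for i in range(n - 1, -1, -1):
        for c in range(k + 1):
            stay = (1.0 - p[i]) * bwd[i + 1][c]
            step = p[i] * bwd[i + 1][c + 1] if c < k else 0.0
            bwd[i][c] = stay + step
    z = fwd[n][k]  # probability of the observed count under the model
    if z == 0.0:
        raise ValueError("the observed count has zero probability under p")
    # Posterior of y_i = 1: sum over paths that take the "positive" edge at step i.
    return [
        sum(fwd[i][c] * p[i] * bwd[i + 1][c + 1] for c in range(k)) / z
        for i in range(n)
    ]


# Hypothetical bag of four instances known to contain exactly two positives.
model_probs = [0.9, 0.2, 0.6, 0.4]
posteriors = count_posteriors(model_probs, k=2)
print([round(v, 3) for v in posteriors])
print(sum(posteriors))  # equals k up to rounding
```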

1.2 · MED-PH · Apr 3, 2020

Predicting the risk of pancreatic cancer with a CT-based ensemble AI algorithm

Chenjie Zhou, MD; Jianhua Ma, PhD; Xiaoping Xu, MD; et al.

Objectives: Pancreatic cancer is a lethal disease that is hard to diagnose and usually results in poor prognosis and high mortality. Developing an artificial intelligence (AI) algorithm that accurately and universally predicts the early cancer risk of all kinds of pancreatic cancer is therefore extremely important. We propose an ensemble AI algorithm that universally predicts the cancer risk of all kinds of pancreatic lesions from noncontrast CT.

Methods: Our algorithm uses the evidence reasoning (ER) technique to combine the radiomics method with a support tensor machine (STM), constructing a binary classifier called RadSTM-ER. RadSTM-ER combines the handcrafted features used in radiomics with features learned automatically by the STM from the CT images to better characterize lesions. The patient cohort consisted of 135 patients with pathological diagnosis results, 97 of whom had malignant lesions. Twenty-seven patients were randomly selected as independent test samples, and the remaining patients were used in a 5-fold cross-validation experiment to determine the hyperparameters, select the optimal handcrafted features, and train the model.

Results: On the independent test set, RadSTM-ER achieved an area under the receiver operating characteristic curve of 0.8951, an accuracy of 85.19%, a sensitivity of 88.89%, a specificity of 77.78%, a positive predictive value of 88.89%, and a negative predictive value of 77.78%.

Conclusions: These results are better than the diagnostic performance of the five radiologists in the experiment and of four conventional AI algorithms, providing initial evidence of the potential of noncontrast CT-based RadSTM-ER for cancer risk prediction in all kinds of pancreatic lesions.
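
The five threshold-based figures in the Results follow from a 2×2 confusion matrix on the 27 test patients. The abstract does not report the counts. Assuming integer counts, the only split consistent with all five percentages is 18 malignant and 9 benign test cases, with 16 true positives, 2 false negatives, 7 true negatives, and 2 false positives. The sketch below recomputes the metrics from those inferred counts; `diagnostic_metrics` is a hypothetical helper, and the AUC cannot be recovered this way because it depends on the continuous scores.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Counts inferred from the reported percentages (an assumption, not stated in the abstract).
metrics = diagnostic_metrics(tp=16, fp=2, tn=7, fn=2)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
# accuracy: 85.19%
# sensitivity: 88.89%
# specificity: 77.78%
# ppv: 88.89%
# npv: 77.78%
```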