Markus Kängsepp

LG
h-index: 3
6 papers
551 citations
Novelty: 37%
AI Score: 37

6 Papers

CV, Nov 8, 2022
Calibrated Perception Uncertainty Across Objects and Regions in Bird's-Eye-View

Markus Kängsepp, Meelis Kull

In driving scenarios with poor visibility or occlusions, it is important that the autonomous vehicle takes all uncertainties into account when making driving decisions, including the choice of a safe speed. Grid-based perception outputs, such as occupancy grids, and object-based outputs, such as lists of detected objects, must therefore be accompanied by well-calibrated uncertainty estimates. We highlight limitations in the state of the art and propose a more complete set of uncertainties to be reported, in particular including undetected-object-ahead probabilities. We suggest a novel way to obtain these probabilistic outputs from bird's-eye-view probabilistic semantic segmentation, using the FIERY model as an example. We demonstrate that the obtained probabilities are not calibrated out of the box and propose methods to achieve well-calibrated uncertainties.

LG, Mar 16, 2022
On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers

Markus Kängsepp, Kaspar Valk, Meelis Kull

Every uncalibrated classifier has a corresponding true calibration map that calibrates its confidence. Deviations of this idealistic map from the identity map reveal miscalibration. Such calibration errors can be reduced with many post-hoc calibration methods which fit some family of calibration maps on a validation dataset. In contrast, evaluation of calibration with the expected calibration error (ECE) on the test set does not explicitly involve fitting. However, as we demonstrate, ECE can still be viewed as if fitting a family of functions on the test data. This motivates the fit-on-the-test view on evaluation: first, approximate a calibration map on the test data, and second, quantify its distance from the identity. Exploiting this view allows us to unlock missed opportunities: (1) use the plethora of post-hoc calibration methods for evaluating calibration; (2) tune the number of bins in ECE with cross-validation. Furthermore, we introduce: (3) benchmarking on pseudo-real data where the true calibration map can be estimated very precisely; and (4) novel calibration and evaluation methods using new calibration map families PL and PL3.
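As a rough illustration of the fit-on-the-test view described in this abstract, the sketch below computes a standard equal-width-bin ECE and also returns the per-bin accuracies, which can be read as a piecewise-constant calibration map fitted on the test data. The function name, the default of 15 bins, and the right-closed binning are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-bin ECE (illustrative sketch, not the paper's code).

    Returns the ECE and, for every sample, the accuracy of its bin,
    i.e. a piecewise-constant calibration map fitted on this data.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; bins are right-closed: (lo, hi].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    fitted = np.zeros_like(confidences)
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        bin_accuracy = correct[mask].mean()
        bin_confidence = confidences[mask].mean()
        fitted[mask] = bin_accuracy
        # Weight the gap by the fraction of samples in the bin.
        ece += mask.mean() * abs(bin_accuracy - bin_confidence)
    return ece, fitted


if __name__ == "__main__":
    conf = [0.95, 0.95, 0.95, 0.95, 0.65, 0.65]
    hits = [1, 1, 1, 0, 1, 0]
    ece, fitted_map = expected_calibration_error(conf, hits, n_bins=10)
    print(ece, fitted_map)
```

Under this reading, choosing `n_bins` amounts to choosing the capacity of the fitted function family, which is why the abstract suggests tuning it with cross-validation.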

CL, Jan 12
The Confidence Trap: Gender Bias and Predictive Certainty in LLMs

Ahmed Sabir, Markus Kängsepp, Rajesh Sharma

The increased use of Large Language Models (LLMs) in sensitive domains has led to growing interest in how their confidence scores correspond to fairness and bias. This study examines the alignment between LLM-predicted confidence and human-annotated bias judgments. Focusing on gender bias, the research investigates the calibration of predicted probabilities in contexts involving gendered pronoun resolution. The goal is to evaluate whether calibration metrics based on predicted confidence scores effectively capture fairness-related disparities in LLMs. The results show that, among the six state-of-the-art models, Gemma-2 demonstrates the worst calibration according to the gender bias benchmark. The primary contribution of this work is a fairness-aware evaluation of LLMs' confidence calibration, offering guidance for ethical deployment. In addition, we introduce a new calibration metric, Gender-ECE, designed to measure gender disparities in resolution tasks.

LG, Mar 28, 2020
Correlated daily time series and forecasting in the M4 competition

Anti Ingel, Novin Shahroudi, Markus Kängsepp et al.

We participated in the M4 competition for time series forecasting and describe here our methods for forecasting daily time series. We used an ensemble of five statistical forecasting methods and a method that we refer to as the correlator. Our retrospective analysis using the ground truth values published by the M4 organisers after the competition demonstrates that the correlator was responsible for most of our gains over the naive constant forecasting method. We identify data leakage as one reason for its success, partly due to test data selected from different time intervals, and partly due to quality issues in the original time series. We suggest that future forecasting competitions should provide actual dates for the time series so that some of those leakages could be avoided by the participants.

LG, Oct 28, 2019
Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration

Meelis Kull, Miquel Perello-Nieto, Markus Kängsepp et al.

Class probabilities predicted by most multiclass classifiers are uncalibrated, often tending towards over-confidence. With neural networks, calibration can be improved by temperature scaling, a method that learns a single corrective multiplicative factor for the inputs to the last softmax layer. For non-neural models, existing methods apply binary calibration in a pairwise or one-vs-rest fashion. We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification. It is easily implemented with neural nets since it is equivalent to log-transforming the uncalibrated probabilities, followed by one linear layer and softmax. Experiments demonstrate improved probabilistic predictions according to multiple measures (confidence-ECE, classwise-ECE, log-loss, Brier score) across a wide range of datasets and classifiers. Parameters of the learned Dirichlet calibration map provide insights into the biases of the uncalibrated model.
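The form of the map stated in this abstract (log-transform, one linear layer, softmax) can be sketched in a few lines of NumPy. This only applies a map with given parameters; fitting `W` and `b` on validation data is not shown. The function names, the `eps` clipping constant, and the example parameter values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dirichlet_calibration_map(probs, W, b, eps=1e-12):
    """Apply softmax(W @ log(p) + b) to each row p of `probs`.

    probs: (n_samples, n_classes) uncalibrated probabilities
    W:     (n_classes, n_classes) linear-layer weights (assumed already fitted)
    b:     (n_classes,) bias (assumed already fitted)
    eps:   clipping constant to avoid log(0); an assumption of this sketch
    """
    log_p = np.log(np.clip(probs, eps, 1.0))
    return softmax(log_p @ W.T + b)


if __name__ == "__main__":
    logits = np.array([[2.0, 1.0, 0.0]])
    p = softmax(logits)
    k = p.shape[1]

    # Identity weights and zero bias leave the probabilities unchanged.
    print(dirichlet_calibration_map(p, np.eye(k), np.zeros(k)))

    # W = I / T with zero bias reproduces temperature scaling with T.
    T = 2.0
    print(dirichlet_calibration_map(p, np.eye(k) / T, np.zeros(k)))
    print(softmax(logits / T))
```

The last two printed rows agree because the log of a softmax output equals the logits up to a per-row constant, which the softmax ignores. Temperature scaling is therefore the special case of this map with `W` a scaled identity and `b` zero.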

NC, Aug 24, 2015
Change Blindness in 3D Virtual Reality

Madis Vasser, Markus Kängsepp, Jaan Aru

In the present change blindness study, subjects explored stereoscopic three-dimensional (3D) environments through a virtual reality (VR) headset. A novel method that tracked the subjects' head movements was used to induce changes in the scene whenever the changing object was out of the field of view. The effect of change location (foreground or background in 3D depth) on change blindness was investigated. Two experiments were conducted, one in the lab (n = 50) and the other online (n = 25). Up to 25% of the changes were undetected, and the mean overall search time was 27 seconds in the lab study. Results indicated significantly lower change detection success and more change cycles if the changes occurred in the background, with no differences in overall search times. The results confirm findings from previous studies and extend them to 3D environments. The study also demonstrates the feasibility of online VR experiments.