LG, Jun 4
CaliDist: Calibrating Large Language Models via Behavioral Robustness to Distraction
Mohammad Anas Jawad, Cornelia Caragea
Existing calibration methods for Large Language Models (LLMs) often overlook a critical dimension of trustworthiness: a model's behavioral robustness to irrelevant or misleading information. In this paper, we argue that a model's true confidence should reflect its stability under cognitive pressure. We introduce CaliDist, a novel post-hoc calibration approach that directly measures and penalizes a model's susceptibility to distraction. CaliDist quantifies how an LLM's predictions and uncertainty change when its input prompt is perturbed with semantic distractors. The resulting stability signal (or lack thereof) is then used to adaptively scale the model's initial confidence score. Our extensive experiments on seven Natural Language Understanding classification benchmarks using six distinct LLMs show that CaliDist consistently achieves lower Expected Calibration Error (ECE) and Brier Score than strong baselines. Remarkably, our method reduces the ECE from 23% to 7% on average, a relative improvement of 70%, demonstrating that behavioral stability is a powerful signal for calibration. We make our code and datasets available at github.com/m-anas-j/CaliDist.
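For readers unfamiliar with the two metrics named in the abstract, the following is a minimal Python sketch of ECE (equal-width bins) and the Brier score computed on top-label confidence. The third function, `stability_scaled_confidence`, is a purely hypothetical illustration of scaling a confidence by agreement under distraction; the abstract does not give CaliDist's scaling rule, so this should not be read as the authors' method. All function and parameter names are assumptions.

```python
from typing import Hashable, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: weighted mean of |avg confidence - accuracy|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def brier_score(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean squared gap between top-label confidence and the 0/1 outcome.

    Assumption: the binary (top-label) form; a multi-class Brier score
    would sum over all class probabilities instead.
    """
    return sum(
        (c - (1.0 if ok else 0.0)) ** 2 for c, ok in zip(confidences, correct)
    ) / len(confidences)


def stability_scaled_confidence(
    confidence: float,
    original_label: Hashable,
    distracted_labels: Sequence[Hashable],
) -> float:
    """HYPOTHETICAL: scale confidence by the fraction of distracted prompts
    whose prediction matches the original one. Not the paper's formula."""
    if not distracted_labels:
        return confidence
    agreement = sum(
        1 for label in distracted_labels if label == original_label
    ) / len(distracted_labels)
    return confidence * agreement


if __name__ == "__main__":
    confs = [0.9, 0.9, 0.6, 0.6]
    hits = [True, True, True, False]
    print(expected_calibration_error(confs, hits))
    print(brier_score(confs, hits))
    print(stability_scaled_confidence(0.9, "pos", ["pos", "pos", "neg", "pos"]))
```

Lower values are better for both metrics. In this toy scaling rule, a prediction that flips under one of four distractors keeps three quarters of its original confidence.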
HC, Sep 8, 2021
Renovo: Sensor-Based Visual Assistive Technology for Physiotherapists in the Rehabilitation of Stroke Patients with Upper Limb Motor Impairments
Mohammad Ridwan Kabir, Mohammad Ishrak Abedin, Mohaimin Ehsan et al.
Stroke patients with upper limb motor impairments are re-acclimated to their corresponding motor functionalities through therapeutic interventions. Physiotherapists typically assess these functionalities using various qualitative protocols. However, such assessments are often biased and prone to errors, reducing rehabilitation efficacy. Therefore, real-time visualization and quantitative analysis of performance metrics, such as range of motion, repetition rate, velocity, etc., are crucial for accurate progress assessment. This study introduces Renovo, a working prototype of a wearable motion sensor-based assistive technology that assists physiotherapists with real-time visualization of these metrics. We also propose a novel mathematical framework for generating quantitative performance scores without relying on any machine learning model. We present the results of a three-week pilot study involving 16 stroke patients with upper limb disabilities, evaluated across three successive sessions at one-week intervals by both Renovo and physiotherapists (N=5). Results suggest that while the expertise of a physiotherapist is irreplaceable, Renovo can assist in the decision-making process by providing valuable quantitative information.