CV | AI | Feb 10, 2025

Conformal Predictions for Human Action Recognition with Vision-Language Models

arXiv:2502.06631v2 | h-index: 2 | 2025 IEEE International Conference on Image Processing Workshops (ICIPW)
Originality: Incremental advance
AI Analysis

This research addresses the problem of reliable human action recognition for human-in-the-loop systems, which is crucial for high-stakes, real-world applications where AI must collaborate with human decision-makers.

This work applies Conformal Prediction to make human action recognition systems more reliable, significantly reducing the average number of candidate classes. It does so without modifying the underlying Vision-Language Model. The resulting distributions often have long tails, however, which can limit practical utility; the authors propose tuning the softmax temperature to mitigate this.

Human-in-the-Loop (HITL) systems are essential in high-stakes, real-world applications where AI must collaborate with human decision-makers. This work investigates how Conformal Prediction (CP) techniques, which provide rigorous coverage guarantees, can enhance the reliability of state-of-the-art human action recognition (HAR) systems built upon Vision-Language Models (VLMs). We demonstrate that CP can significantly reduce the average number of candidate classes without modifying the underlying VLM. However, these reductions often result in distributions with long tails which can hinder their practical utility. To mitigate this, we propose tuning the temperature of the softmax prediction, without using additional calibration data. This work contributes to ongoing efforts for multi-modal human-AI interaction in dynamic real-world environments.
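
The combination of conformal prediction and softmax temperature described above can be illustrated with a minimal sketch. It assumes standard split conformal prediction with the score 1 - p(true class), which may differ from the exact procedure used in the paper. The function names, the miscoverage level `alpha`, and the synthetic logits are illustrative assumptions, and the temperature is set by hand here rather than tuned as the authors propose.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def calibrate_threshold(cal_logits, cal_labels, alpha=0.1, temperature=1.0):
    """Split-conformal threshold on the score 1 - p(true class)."""
    cal_labels = np.asarray(cal_labels)
    probs = softmax(cal_logits, temperature)
    n = len(cal_labels)
    scores = 1.0 - probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every class must be kept
    return np.sort(scores)[k - 1]

def prediction_sets(test_logits, qhat, temperature=1.0):
    """Boolean mask (samples x classes); True marks a candidate class."""
    probs = softmax(test_logits, temperature)
    return (1.0 - probs) <= qhat

# Synthetic stand-in for VLM logits (an assumption, not data from the paper).
rng = np.random.default_rng(0)
n_cal, n_test, n_classes = 500, 200, 10
labels = rng.integers(0, n_classes, size=n_cal + n_test)
logits = rng.normal(size=(n_cal + n_test, n_classes))
logits[np.arange(n_cal + n_test), labels] += 2.0  # make the true class more likely

for T in (1.0, 0.5):
    # The same temperature must be used for calibration and for prediction.
    qhat = calibrate_threshold(logits[:n_cal], labels[:n_cal], alpha=0.1, temperature=T)
    sets = prediction_sets(logits[n_cal:], qhat, temperature=T)
    print(f"T={T}: mean set size = {sets.sum(axis=1).mean():.2f}")
```

The coverage guarantee comes from the calibration step and holds for any fixed temperature. The temperature only changes the shape of the set-size distribution, which is the quantity the abstract says the authors aim to improve.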

