37.9 CL May 25
MiRD: Reliable Set-Valued Prediction for Open-Ended Question Answering via Miscoverage Risk Decomposition
Anqi Hu, Zhiyuan Wang, Zijun Jia et al.
Reliable set-valued prediction provides a principled way to mitigate hallucinations in open-ended question answering (QA), yet existing conformal approaches typically rely on a fragile premise: finite sampling must already produce at least one admissible candidate, or calibration examples violating this condition are discarded. In this paper, we introduce MiRD, a two-stage framework that decomposes overall miscoverage into sampling failure and conditional selection failure. In Stage I, MiRD establishes an expectation-level marginal upper bound on the probability that finite sampling produces no admissible answer under a fixed budget. In Stage II, conditioned on sampling success, MiRD calibrates a conformal selection threshold using admission-correlated nonconformity scores defined over the full calibration set, thereby preserving calibration-set integrity. Across three open-ended QA datasets and eight models, MiRD controls sampling risk, conditional selection risk, and overall miscoverage, while yielding tighter first-stage bounds than PAC-style alternatives and more adaptive prediction sets than successful-only calibration.
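One way to read the two-stage decomposition described above is through the law of total probability. The symbols here are assumptions introduced for illustration, not the paper's notation: $E$ is the event that finite sampling yields no admissible answer, $M$ is the event that the prediction set misses every admissible answer (so $E \subseteq M$), and $\delta$, $\alpha$ are hypothetical Stage I and Stage II risk levels. The paper's own bound is stated at the expectation level and may take a different form.

```latex
\begin{align*}
\Pr(M) &= \Pr(E) + \Pr(E^{c})\,\Pr(M \mid E^{c}) \\
       &\le \delta + (1-\delta)\,\alpha
        \qquad \text{if } \Pr(E) \le \delta \text{ and } \Pr(M \mid E^{c}) \le \alpha \\
       &\le \delta + \alpha .
\end{align*}
```

The middle step uses the fact that $p + (1-p)\,q$ is non-decreasing in both $p$ and $q$ on $[0,1]$, so bounding each stage separately bounds the overall miscoverage.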
24.5 CL Mar 24
Set-Valued Prediction for Large Language Models with Feasibility-Aware Coverage Guarantees
Ye Li, Anqi Hu, Yuanchang Ye et al.
Large language models (LLMs) inherently operate over a large generation space, yet conventional usage typically reports the most likely generation (MLG) as a point prediction, which underestimates the model's capability: although the top-ranked response can be incorrect, valid answers may still exist within the broader output space and can potentially be discovered through repeated sampling. This observation motivates moving from point prediction to set-valued prediction, where the model produces a set of candidate responses rather than a single MLG. In this paper, we propose a principled framework for set-valued prediction, which provides feasibility-aware coverage guarantees. We show that, given the finite-sampling nature of LLM generation, coverage is not always achievable: even with multiple samplings, LLMs may fail to yield an acceptable response for certain questions within the sampled candidate set. To address this, we establish a minimum achievable risk level (MRL), below which statistical coverage guarantees cannot be satisfied. Building on this insight, we then develop a data-driven calibration procedure that constructs prediction sets from sampled responses by estimating a rigorous threshold, ensuring that the resulting set contains a correct answer with a desired probability whenever the target risk level is feasible. Extensive experiments on six language generation tasks with five LLMs demonstrate both the statistical validity and the predictive efficiency of our framework.
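The feasibility idea above can be illustrated with a minimal sketch. It is not the paper's procedure: it swaps in a generic conformal-risk-control style rule, and every name (`min_achievable_risk`, `calibrate_threshold`, the `(score, is_acceptable)` candidate format) is hypothetical. Each calibration question is assumed to come with sampled candidates, a confidence score per candidate, and a correctness label.

```python
from typing import List, Optional, Tuple

# Assumed format: one (confidence score, is_acceptable) pair per sampled response.
Candidate = Tuple[float, bool]


def misses(cands: List[Candidate], t: float) -> bool:
    """True if the set {candidates with score >= t} has no acceptable answer."""
    return not any(ok and s >= t for s, ok in cands)


def min_achievable_risk(calib: List[List[Candidate]]) -> float:
    """Fraction of questions where no sampled candidate is acceptable at all.

    Even the full candidate set cannot cover these questions, so the
    empirical miscoverage can never fall below this value.
    """
    return sum(1 for c in calib if not any(ok for _, ok in c)) / len(calib)


def calibrate_threshold(calib: List[List[Candidate]], alpha: float) -> Optional[float]:
    """Largest threshold whose finite-sample-corrected risk is <= alpha.

    Returns None when the target risk level is infeasible for this data.
    """
    n = len(calib)
    thresholds = sorted({s for c in calib for s, _ in c}, reverse=True)
    for t in thresholds:  # from smallest prediction sets to largest
        risk = sum(misses(c, t) for c in calib) / n
        if (n * risk + 1) / (n + 1) <= alpha:
            return t
    return None


# Toy calibration data (made up for illustration only).
calib = [
    [(0.9, True), (0.2, False)],
    [(0.8, False), (0.4, True)],
    [(0.7, False), (0.3, False)],  # sampling never found an acceptable answer
    [(0.6, True)],
]
print(min_achievable_risk(calib))
print(calibrate_threshold(calib, alpha=0.5))
print(calibrate_threshold(calib, alpha=0.3))
```

Because the miss rate can only shrink as the threshold is lowered, scanning thresholds from high to low and stopping at the first feasible one returns the smallest sets that still meet the target. A `None` result signals that the requested risk level lies below what the sampled candidates can support.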
CV Apr 26, 2021
Analyzing Green View Index and Green View Index best path using Google Street View and deep learning
Jiahao Zhang, Anqi Hu
As an important part of urban landscape research, the study of street-level greenery improves understanding of a city's green space and contributes to better planning and design of the urban living environment. Planning the best path through urban greenery is an effective way to make the most of that greenery, benefiting the physical and mental health of urban residents and the route planning of visitors. In this paper, we use Google Street View (GSV) to obtain street view images of Osaka City. A semantic segmentation model is adopted to segment the street view images and analyze the Green View Index (GVI) of Osaka City. Based on the GVI, we use an adjacency matrix and the Floyd-Warshall algorithm to calculate the GVI best path, overcoming the limitations of ArcGIS software. Our analysis not only allows the calculation of specific routes for the GVI best paths but also realizes the visualization and integration of neighborhood urban greenery. By summarizing all the data, we can form both an intuitive impression and an objective analysis of the street-level greenery in the research area. Based on this, urban residents and visitors can make the most of the available natural resources for a better life. The dataset and code are available at https://github.com/Jackieam/GVI-Best-Path.