GTApr 24, 2022
Facility Location Games Beyond Single-Peakedness: the Entrance Fee ModelMengfan Ma, Mingyu Xiao, Tian Bai et al.
The facility location game has been studied extensively in mechanism design. In the classical model, each agent's cost is solely determined by her distance to the nearest facility. In this paper, we introduce a novel model where each facility charges an entrance fee. Thus, the cost of each agent is determined by both the distance to the facility and the entrance fee of the facility. In our model, the entrance fee function is allowed to be an arbitrary function, causing agents' preferences may no longer be single-peaked anymore: This departure from the classical model introduces additional challenges. We systematically delve into the intricacies of the model, designing strategyproof mechanisms with favorable approximation ratios. Additionally, we complement these ratios with nearly-tight impossibility results. Specifically, for one-facility and two-facility games, we provide upper and lower bounds for the approximation ratios given by deterministic and randomized mechanisms with respect to utilitarian and egalitarian objectives.
95.6MEMar 25
Conformal Selective Prediction with General Risk ControlTian Bai, Ying Jin
In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error control over cases where the model is trusted. We propose Selective Conformal Risk control with E-values (SCoRE), a new framework for deriving such decisions for any trained model and any user-defined, bounded and continuously-valued risk. SCoRE offers two types of guarantees on the risk among ``positive'' cases in which the system opts to trust the model. Built upon conformal inference and hypothesis testing ideas, SCoRE first constructs a class of (generalized) e-values, which are non-negative random variables whose product with the unknown risk has expectation no greater than one. Such a property is ensured by data exchangeability without requiring any modeling assumptions. Passing these e-values on to hypothesis testing procedures, we yield the binary trust decisions with finite-sample error control. SCoRE avoids the need of uniform concentration, and can be readily extended to settings with distribution shifts. We evaluate the proposed methods with simulations and demonstrate their efficacy through applications to error management in drug discovery, health risk prediction, and large language models.
CVJan 30
ExpAlign: Expectation-Guided Vision-Language Alignment for Open-Vocabulary GroundingJunyi Hu, Tian Bai, Fengyi Wu et al.
Open-vocabulary grounding requires accurate vision-language alignment under weak supervision, yet existing methods either rely on global sentence embeddings that lack fine-grained expressiveness or introduce token-level alignment with explicit supervision or heavy cross-attention designs. We propose ExpAlign, a theoretically grounded vision-language alignment framework built on a principled multiple instance learning formulation. ExpAlign introduces an Expectation Alignment Head that performs attention-based soft MIL pooling over token-region similarities, enabling implicit token and instance selection without additional annotations. To further stabilize alignment learning, we develop an energy-based multi-scale consistency regularization scheme, including a Top-K multi-positive contrastive objective and a Geometry-Aware Consistency Objective derived from a Lagrangian-constrained free-energy minimization. Extensive experiments show that ExpAlign consistently improves open-vocabulary detection and zero-shot instance segmentation, particularly on long-tail categories. Most notably, it achieves 36.2 AP$_r$ on the LVIS minival split, outperforming other state-of-the-art methods at comparable model scale, while remaining lightweight and inference-efficient.
95.8CCMay 12
Feedback Set Problems on Bounded-Degree (Planar) GraphsTian Bai, Yixin Cao, Mingyu Xiao
The feedback set problems are about removing the minimum number of vertices or edges from a graph to break all its cycles. Much effort has gone into understanding their complexity on planar graphs as well as on graphs of bounded degree. We obtain a complete complexity classification for these problems on bounded-degree digraphs, including the planar case. In particular, we show that both problems are $\NP$-complete on digraphs of maximum degree three, while on planar digraphs the feedback vertex set problem is polynomial-time solvable when each vertex has either indegree at most one or outdegree at most one, and $\NP$-complete otherwise. We also give tight degree bounds for the connected feedback vertex set problem on undirected graphs, both planar and non-planar. We close the paper with a historical account of results for feedback vertex set on undirected graphs of bounded degree.
CLJan 16
Multi-Stage Patient Role-Playing Framework for Realistic Clinical InteractionsShijie Jiang, Zefan Zhang, Kehua Zhu et al.
The simulation of realistic clinical interactions plays a pivotal role in advancing clinical Large Language Models (LLMs) and supporting medical diagnostic education. Existing approaches and benchmarks rely on generic or LLM-generated dialogue data, which limits the authenticity and diversity of doctor-patient interactions. In this work, we propose the first Chinese patient simulation dataset (Ch-PatientSim), constructed from realistic clinical interaction scenarios to comprehensively evaluate the performance of models in emulating patient behavior. Patients are simulated based on a five-dimensional persona structure. To address issues of the persona class imbalance, a portion of the dataset is augmented using few-shot generation, followed by manual verification. We evaluate various state-of-the-art LLMs and find that most produce overly formal responses that lack individual personality. To address this limitation, we propose a training-free Multi-Stage Patient Role-Playing (MSPRP) framework, which decomposes interactions into three stages to ensure both personalization and realism in model responses. Experimental results demonstrate that our approach significantly improves model performance across multiple dimensions of patient simulation.
74.5DSApr 28
Clustering Permutations under the Ulam Metric: A Parameterized Complexity StudyTian Bai, Fedor V. Fomin, Petr A. Golovach et al.
Rank aggregation seeks a representative permutation for a collection of rankings and plays a central role in areas such as social choice, information retrieval, and computational biology. Two fundamental aggregation tasks are the center and median problems, which minimize the maximum and the total distance to the input permutations, respectively. While these problems are well understood under Kendall's tau and related distances, their parameterized complexity under the Ulam metric, an edit-distance-based metric on permutations, has remained largely unexplored. In this work, we initiate a systematic study of the parameterized complexity of rank aggregation under the Ulam metric. We consider both the center and median problems, as well as their generalizations to the $k$-center and $k$-median clustering settings, parameterized by the number of centers $k$ and the distance budget $d$ (corresponding to the maximum distance for center variants and the total distance for median variants). Both problems are known to be NP-hard already for $k=1$. We show that the Ulam $k$-center problem remains NP-hard when $d=1$, but is fixed-parameter tractable when parameterized by $k + d$. Our algorithm is based on a novel local-search framework tailored to the non-local nature of Ulam distances. We complement this by proving that no polynomial kernel exists for the $k+d$ parameterization unless NP $\subseteq$ coNP/poly. For the Ulam $k$-median problem parameterized by the total distance $d$, we establish W[1]-hardness and provide an XP algorithm. We also provide a polynomial kernel for the parameter $k + d$, which in turn yields a fixed-parameter tractable algorithm.
CVJan 15
VERHallu: Evaluating and Mitigating Event Relation Hallucination in Video Large Language ModelsZefan Zhang, Kehua Zhu, Shijie Jiang et al.
Video Large Language Models (VideoLLMs) exhibit various types of hallucinations. Existing research has primarily focused on hallucinations involving the presence of events, objects, and scenes in videos, while largely neglecting event relation hallucination. In this paper, we introduce a novel benchmark for evaluating the Video Event Relation Hallucination, named VERHallu. This benchmark focuses on causal, temporal, and subevent relations between events, encompassing three types of tasks: relation classification, question answering, and counterfactual question answering, for a comprehensive evaluation of event relation hallucination. Additionally, it features counterintuitive video scenarios that deviate from typical pretraining distributions, with each sample accompanied by human-annotated candidates covering both vision-language and pure language biases. Our analysis reveals that current state-of-the-art VideoLLMs struggle with dense-event relation reasoning, often relying on prior knowledge due to insufficient use of frame-level cues. Although these models demonstrate strong grounding capabilities for key events, they often overlook the surrounding subevents, leading to an incomplete and inaccurate understanding of event relations. To tackle this, we propose a Key-Frame Propagating (KFP) strategy, which reallocates frame-level attention within intermediate layers to enhance multi-event understanding. Experiments show it effectively mitigates the event relation hallucination without affecting inference speed.
MENov 27, 2024
Optimized Conformal Selection: Powerful Selective Inference After Conformity Score OptimizationTian Bai, Ying Jin
Model selection/optimization in conformal inference is challenging, since it may break the exchangeability between labeled and unlabeled data. We study this problem in the context of conformal selection, which uses conformal p-values to select ``interesting'' instances with large unobserved labels from a pool of unlabeled data, while controlling the FDR in finite sample. For validity, existing solutions require the model choice to be independent of the data used to construct the p-values and calibrate the selection set. However, when presented with many model choices and limited labeled data, it is desirable to (i) select the best model in a data-driven manner, and (ii) mitigate power loss due to sample splitting. This paper presents OptCS, a general framework that allows valid statistical testing (selection) after flexible data-driven model optimization. We introduce general conditions under which OptCS constructs valid conformal p-values despite substantial data reuse and handles complex p-value dependencies to maintain finite-sample FDR control via a novel multiple testing procedure. We instantiate this general recipe to propose three FDR-controlling procedures, each optimizing the models differently: (i) selecting the most powerful one among multiple pre-trained candidate models, (ii) using all data for model fitting without sample splitting, and (iii) combining full-sample model fitting and selection. We demonstrate the efficacy of our methods via simulation studies and real applications in drug discovery and alignment of large language models in radiology report generation.
MEMay 1, 2025
Multivariate Conformal SelectionTian Bai, Yue Zhao, Xiang Yu et al.
Selecting high-quality candidates from large datasets is critical in applications such as drug discovery, precision medicine, and alignment of large language models (LLMs). While Conformal Selection (CS) provides rigorous uncertainty quantification, it is limited to univariate responses and scalar criteria. To address this issue, we propose Multivariate Conformal Selection (mCS), a generalization of CS designed for multivariate response settings. Our method introduces regional monotonicity and employs multivariate nonconformity scores to construct conformal p-values, enabling finite-sample False Discovery Rate (FDR) control. We present two variants: mCS-dist, using distance-based scores, and mCS-learn, which learns optimal scores via differentiable optimization. Experiments on simulated and real-world datasets demonstrate that mCS significantly improves selection power while maintaining FDR control, establishing it as a robust framework for multivariate selection tasks.
CLApr 23, 2025
Text-to-TrajVis: Enabling Trajectory Data Visualizations from Natural Language QuestionsTian Bai, Huiyan Ying, Kailong Suo et al.
This paper introduces the Text-to-TrajVis task, which aims to transform natural language questions into trajectory data visualizations, facilitating the development of natural language interfaces for trajectory visualization systems. As this is a novel task, there is currently no relevant dataset available in the community. To address this gap, we first devised a new visualization language called Trajectory Visualization Language (TVL) to facilitate querying trajectory data and generating visualizations. Building on this foundation, we further proposed a dataset construction method that integrates Large Language Models (LLMs) with human efforts to create high-quality data. Specifically, we first generate TVLs using a comprehensive and systematic process, and then label each TVL with corresponding natural language questions using LLMs. This process results in the creation of the first large-scale Text-to-TrajVis dataset, named TrajVL, which contains 18,140 (question, TVL) pairs. Based on this dataset, we systematically evaluated the performance of multiple LLMs (GPT, Qwen, Llama, etc.) on this task. The experimental results demonstrate that this task is both feasible and highly challenging and merits further exploration within the research community.
CVMay 19, 2025
Pyramid Sparse Transformer: Enhancing Multi-Scale Feature Fusion with Dynamic Token SelectionJunyi Hu, Tian Bai, Fengyi Wu et al.
Feature fusion is critical for high-performance vision models but often incurs prohibitive complexity. However, prevailing attention-based fusion methods often involve significant computational complexity and implementation challenges, limiting their efficiency in resource-constrained environments. To address these issues, we introduce the Pyramid Sparse Transformer (PST), a lightweight, plug-and-play module that integrates coarse-to-fine token selection and shared attention parameters to reduce computation while preserving spatial detail. PST can be trained using only coarse attention and seamlessly activated at inference for further accuracy gains without retraining. When added to state-of-the-art real-time detection models, such as YOLOv11-N/S/M, PST yields mAP improvements of 0.9%, 0.5%, and 0.4% on MS COCO with minimal latency impact. Likewise, embedding PST into ResNet-18/50/101 as backbones, boosts ImageNet top-1 accuracy by 6.5%, 1.7%, and 1.0%, respectively. These results demonstrate PST's effectiveness as a simple, hardware-friendly enhancement for both detection and classification tasks.