GT · Jun 2
Competitive Information Design in Sequential Search
Zhicheng Du, Hu Fu, Ying Qin et al.
Advertisements often strategically disclose information to consumers who make decisions on further information acquisition and eventual purchase. Anderson and Renault (2006) model this problem using an information design framework, where the advertiser acts as a sender and the consumer as a receiver. We extend this model to a competitive setting with horizontally differentiated senders competing for a unit-demand receiver. Under costly inspection, the receiver's optimal sequential search action is given by Weitzman's Index Algorithm. We give a method, based on duality arguments, to verify whether a sender's given information strategy constitutes a best response against his competitors (other senders). We establish the existence of an equilibrium in the game among senders when the prior distributions have no mass; we also illustrate that such equilibria may exhibit intricate behaviors. Finally, we meticulously characterize symmetric equilibria played by the senders for cases when the prior distributions have monotone increasing densities, while offering economic intuitions behind the insightful equilibrium structure.
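As a reading aid for the abstract above, here is a minimal sketch of Weitzman's index rule in the textbook "Pandora's box" setting. It is not the paper's model: the discrete value distributions, the `outside_option` default of 0, and the function names `reservation_value` and `weitzman_search` are all illustrative assumptions. Each box gets a reservation value $\sigma$ solving $E[\max(V-\sigma,0)]=c$; boxes are inspected in decreasing order of $\sigma$, and search stops once the best value seen is at least the largest remaining index.

```python
def reservation_value(values, probs, cost, tol=1e-9):
    """Solve E[max(V - sigma, 0)] = cost for sigma by bisection (cost > 0)."""
    def expected_excess(sigma):
        return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))

    # expected_excess is continuous and non-increasing in sigma:
    # at lo it is at least cost, at hi it is 0 < cost.
    lo, hi = min(values) - cost, max(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def weitzman_search(boxes, realized, outside_option=0.0):
    """Run the index rule on one realization.

    boxes:    list of (values, probs, cost) triples (hypothetical input format)
    realized: the value actually found in each box if it is inspected
    Returns (best value kept, total inspection cost paid, inspected box ids).
    """
    indices = [reservation_value(v, p, c) for v, p, c in boxes]
    order = sorted(range(len(boxes)), key=lambda i: indices[i], reverse=True)

    best, paid, inspected = outside_option, 0.0, []
    for i in order:
        if best >= indices[i]:  # no remaining box is worth its inspection cost
            break
        paid += boxes[i][2]
        inspected.append(i)
        best = max(best, realized[i])
    return best, paid, inspected


# Two hypothetical boxes: indices are 8 and 5 respectively.
boxes = [([0, 10], [0.5, 0.5], 1.0), ([4, 6], [0.5, 0.5], 0.5)]
result = weitzman_search(boxes, realized=[10, 4])
```

In this example the first box has the higher index, so it is opened first; when it yields 10, the search stops without paying to open the second box.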
IR · Jun 1
Time-Aware Diffusion based on Preference Disentanglement for Generative Recommendation
Bangguo Zhu, Peng Huo, Yuanbo Zhao et al.
Recently, Generative Recommenders (GRs) have emerged as a transformative recommendation paradigm by replacing traditional item IDs with semantic indices (SIDs). Owing to the exceptional generative capabilities of diffusion models, a few pioneering works explore developing GRs with diffusion architectures as the backbone. However, a fatal limitation of existing diffusion-based GRs is that the diffusion process applies uniformly to all items within the historical interactions. In contrast, the user preference is shaped by multifaceted time-evolving factors and thus exhibits a non-stationary distribution in the temporal aspect. To bridge this gap, this study proposes a novel GR framework, named TDPM, by designing the time-aware diffusion on SID tokens. Specifically, TDPM explicitly integrates the impact of time-evolving user preferences into the diffusion process. In detail, the user preference is disentangled into (i) the period preference, which remains consistent over a long time-span, and (ii) the point preference, which is triggered by recent focal events. Extensive experiments on three public real-world datasets demonstrate the significant superiority of TDPM over the state-of-the-art baselines. TDPM achieves average improvements of up to 29.21% and 25.45% in terms of HR@20 and NDCG@20, respectively. The ablation study further underscores the necessity of time-aware token diffusion in diffusion-based GRs.
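For readers unfamiliar with the two metrics quoted above, the following sketch shows how HR@K and NDCG@K are commonly computed. It assumes a leave-one-out protocol with a single held-out target item per user, which the abstract does not state; the function names and toy data are illustrative only.

```python
from math import log2


def hit_rate_at_k(ranked_items, target, k):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """With a single relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-indexed position of the target
    return 1.0 / log2(rank + 1)


def evaluate(recommendations, targets, k=20):
    """Average HR@k and NDCG@k over users (one ranked list and one target each)."""
    n = len(targets)
    hr = sum(hit_rate_at_k(r, t, k) for r, t in zip(recommendations, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(recommendations, targets)) / n
    return hr, ndcg


# Toy example: the first user's target is ranked 2nd, the second user's is missed.
hr, ndcg = evaluate([[5, 3, 9], [1, 2, 3]], targets=[3, 7], k=2)
```

HR@K only records whether the target was retrieved, while NDCG@K also rewards placing it nearer the top of the list.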
CV · May 27
Auditing Training-Free 3D Shape Retrieval with Diffused Geodesic Moments
Zhicheng Du, Changyue Liu, Wenji Xi et al.
Reported retrieval scores for training-free shape descriptors conflate local signal design, normalization, aggregation, codebook fitting, and metric choices, making isolated component evaluation difficult. This paper reframes descriptor evaluation as a {\em protocol audit}. We introduce Diffused Geodesic Moments (DGM), a seed-conditioned descriptor that computes sparse implicit heat responses, converts them to distance-like fields, and summarizes each vertex by low-order moments across seeds and scales. DGM is used both as a practical non-spectral baseline and as an instrument for isolating protocol effects. On the registered FAUST benchmark split (FAUST-Reg) and the TOSCA shape collection, aggregation-matched experiments show that an independent Geometric Moment Shape Descriptor baseline built on Heat Kernel Signature features (GMSD-HKS) obtains the highest scores in this implementation ($0.621/0.820$ and $0.865/0.963$ mean average precision (mAP)/top-1), Wave Kernel Signature (WKS) remains a strong classical signal, and DGM is useful mainly when sparse solves, non-spectral deployment, or symmetry-informative seed frames are priorities. The broader finding is methodological: the input field and aggregation protocol can dominate the moment formula. The paper contributes a reproducible protocol-cascade analysis, a cross-shape alignment diagnostic for functional-map compatibility, and concrete recommendations for designing and reporting training-free shape descriptors.
LG · Sep 18, 2023
GAME: Generalized deep learning model towards multimodal data integration for early screening of adolescent mental disorders
Zhicheng Du, Chenyao Jiang, Xi Yuan et al.
The timely identification of mental disorders in adolescents is a global public health challenge. Because these disorders are complex and subtle, a single factor is rarely enough to detect an abnormality. Additionally, generalized multimodal Computer-Aided Screening (CAS) systems with interactive robots for adolescent mental disorders are not available. Here, we design an Android application with mini-games and chat recording, deployed in a portable robot, to screen 3,783 middle school students and construct a multimodal screening dataset that includes facial images, physiological signs, voice recordings, and textual transcripts. We develop a model called GAME (Generalized Model with Attention and Multimodal EmbraceNet) with a novel attention mechanism that integrates cross-modal features into the model. GAME evaluates adolescent mental conditions with high accuracy (73.34%-92.77%) and F1-Score (71.32%-91.06%). We find that each modality contributes dynamically to the screening of mental disorders and of comorbidities among them, indicating the feasibility of an explainable model. This study provides a system capable of acquiring multimodal information and constructs a generalized multimodal integration algorithm with novel attention mechanisms for the early screening of adolescent mental disorders.
CO · May 26
Prime Certificates for Exact Vertex-Coprime Ramsey Numbers
Zhicheng Du, Wenji Xi, Zhuo Deng et al.
Let $G_n$ be the coprime graph on $\{1,\ldots,n\}$. We prove that the mixed vertex-coloring coprime Ramsey number satisfies \[ \Rcop(k_1,\ldots,k_c)=p_{\sum_{i=1}^c(k_i-1)}, \] where $p_m$ is the $m$-th prime. The proof is elementary: the prime clique $\{1\}\cup\{p\le n:p\text{ prime}\}$ gives the upper bound by pigeonhole, while a prime-bin partition gives the matching lower bound by coloring each composite with a bin containing one of its prime divisors. We reserve $\Rcop$ for this vertex-coloring parameter; the edge-coloring parameter on the same host graph is denoted $\Redge$. The same certificate viewpoint yields three extensions: a support-disjointness generalization, a polynomial-time certificate-extraction primitive, and an exact reduction of the edge-coloring variant to classical Ramsey numbers: $\Redge(k_1,\ldots,k_c)=p_{\Rcl(k_1,\ldots,k_c)-1}$. These two formulas are rank transfers from the same clique-label certificate. We also prove that the balanced two-color diagonal threshold equals the unrestricted threshold $p_{2k-2}$ for all $k\ge2$, via a deterministic prime-bin split requiring only the weak inequality $2p_m<p_{2m}<3p_m$; for fixed $c$, a Hall argument plus a standard Selberg--Delange estimate gives eventual multicolor balanced certificates.
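The vertex-coloring formula above is easy to evaluate and to sanity-check on tiny cases. The sketch below is illustrative only: it assumes that $\Rcop(k_1,\ldots,k_c)$ is the least $n$ such that every $c$-coloring of $\{1,\ldots,n\}$ contains, for some color $i$, $k_i$ pairwise coprime vertices of color $i$ (the abstract does not spell out the definition), and it assumes every $k_i\ge2$. The names `nth_prime`, `rcop_formula`, and `rcop_bruteforce` are hypothetical helpers, and the brute force is exponential in $n$, so it is only meant for very small parameters.

```python
from itertools import combinations, product
from math import gcd


def nth_prime(m):
    """Return the m-th prime (p_1 = 2) by trial division; fine for small m."""
    if m < 1:
        raise ValueError("m must be at least 1")
    count, candidate = 0, 1
    while count < m:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate


def rcop_formula(ks):
    """Evaluate p_m with m = sum(k_i - 1), as stated in the abstract."""
    return nth_prime(sum(k - 1 for k in ks))


def has_coprime_clique(vertices, k):
    """True if some k of the given vertices are pairwise coprime."""
    return any(
        all(gcd(a, b) == 1 for a, b in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )


def rcop_bruteforce(ks, n_max=8):
    """Least n <= n_max at which every coloring is forced (assumed definition)."""
    c = len(ks)
    for n in range(1, n_max + 1):
        forced = True
        for coloring in product(range(c), repeat=n):
            classes = [
                [v for v in range(1, n + 1) if coloring[v - 1] == i]
                for i in range(c)
            ]
            if not any(has_coprime_clique(classes[i], ks[i]) for i in range(c)):
                forced = False  # this coloring avoids every target clique
                break
        if forced:
            return n
    return None
```

For example, `rcop_formula((2, 3))` and `rcop_bruteforce((2, 3))` both give 5: with $n=4$ the coloring $\{2,4\}$ versus $\{1,3\}$ avoids both targets, while at $n=5$ the clique $\{1,2,3,5\}$ forces one by pigeonhole.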
CV · Aug 31, 2023
Prompt-enhanced Hierarchical Transformer Elevating Cardiopulmonary Resuscitation Instruction via Temporal Action Segmentation
Yang Liu, Xiaoyun Zhong, Shiyao Zhai et al.
The vast majority of people who suffer unexpected cardiac arrest receive cardiopulmonary resuscitation (CPR) from passersby in a desperate attempt to restore life, but these efforts often prove fruitless because the rescuers are not properly qualified. Fortunately, many studies show that disciplined training helps raise the success rate of resuscitation, and such training stands to benefit from the integration of novel techniques. To this end, we collect a custom CPR video dataset in which trainees independently perform resuscitation on mannequins in adherence to approved guidelines, and we devise an auxiliary toolbox that uses modern deep learning methods to assist in supervising trainees and correcting issues that arise along the way. We treat this problem as a temporal action segmentation (TAS) task in computer vision, which aims to segment an untrimmed video at the frame level. We propose a Prompt-enhanced hierarchical Transformer (PhiTrans) that integrates three modules: a textual prompt-based Video Features Extractor (VFE), a transformer-based Action Segmentation Executor (ASE), and a regression-based Prediction Refinement Calibrator (PRC). The model backbone was first developed on three established public TAS datasets (GTEA, 50Salads, and Breakfast), which informed the design of the segmentation pipeline for the CPR dataset. Overall, we explore a feasible pipeline that improves the quality of CPR instruction via action segmentation combined with current deep learning techniques. Experiments support our implementation, with multiple metrics surpassing 91.0%.
CL · Oct 18, 2024 · Code
SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent
Jiarui Ji, Yang Li, Hongtao Liu et al.
Public scarce resource allocation plays a crucial role in economics, as it directly influences efficiency and equity in society. Traditional approaches, whether based on theoretical models, empirical studies, or simulation, are limited by idealized assumptions of complete information and individual rationality, as well as by the scarcity of available data. In this work, we propose an innovative framework, SRAP-Agent (Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent), which integrates Large Language Models (LLMs) into economic simulations, aiming to bridge the gap between theoretical models and real-world dynamics. Using public housing allocation scenarios as a case study, we conduct extensive policy simulation experiments to verify the feasibility and effectiveness of SRAP-Agent and employ the Policy Optimization Algorithm with certain optimization objectives. The source code can be found at https://github.com/jijiarui-cather/SRAPAgent_Framework
CV · Jul 7, 2025 · Code
Hear-Your-Click: Interactive Object-Specific Video-to-Audio Generation
Yingshan Liang, Keyu Fan, Zhicheng Du et al.
Video-to-audio (V2A) generation shows great potential in fields such as film production. Despite significant advances, current V2A methods relying on global video information struggle with complex scenes and generating audio tailored to specific objects. To address these limitations, we introduce Hear-Your-Click, an interactive V2A framework enabling users to generate sounds for specific objects by clicking on the frame. To achieve this, we propose Object-aware Contrastive Audio-Visual Fine-tuning (OCAV) with a Mask-guided Visual Encoder (MVE) to obtain object-level visual features aligned with audio. Furthermore, we tailor two data augmentation strategies, Random Video Stitching (RVS) and Mask-guided Loudness Modulation (MLM), to enhance the model's sensitivity to segmented objects. To measure audio-visual correspondence, we designed a new evaluation metric, the CAV score. Extensive experiments demonstrate that our framework offers more precise control and improves generation performance across various metrics. Project Page: https://github.com/SynapGrid/Hear-Your-Click
TH · May 5
Going Public: Communication in Collective Decisions
Zhicheng Du, Yingkai Li, Boli Xu
A principal and $n\ge 2$ agents can launch a project if the principal proposes it and at least $k$ agents accept. Their individual payoffs from the project depend on an ex ante unknown state. The principal can conduct a test to learn about the state and then communicate her findings to the agents via cheap talk. This paper focuses on comparing two communication regimes: public and private messaging. We show that public messaging is weakly dominant: any outcome implementable under private messaging can also be implemented under public messaging. Moreover, in a canonical environment with linear payoffs, we characterize the principal's optimal test in each regime and show that public messaging can be strictly dominant if and only if there exist two agents who are the principal's conflicting allies.
AI · Mar 23, 2024
LAMPER: LanguAge Model and Prompt EngineeRing for zero-shot time series classification
Zhicheng Du, Zhaotian Xie, Yan Tong et al.
This study constructs the LanguAge Model with Prompt EngineeRing (LAMPER) framework, designed to systematically evaluate the adaptability of pre-trained language models (PLMs) in accommodating diverse prompts and their integration in zero-shot time series (TS) classification. We deploy LAMPER in experimental assessments using 128 univariate TS datasets sourced from the UCR archive. Our findings indicate that the feature representation capacity of LAMPER is influenced by the maximum input token threshold imposed by PLMs.
CV · Mar 23, 2024
Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content
Zhicheng Du, Zhaotian Xie, Huazhang Ying et al.
This study explores the ability of Image Captioning (IC) models to decode masked visual content sourced from diverse datasets. Our findings reveal the IC model's capability to generate captions from masked images, closely resembling the original content. Notably, even in the presence of masks, the model adeptly crafts descriptive textual information that goes beyond what is observable in the original image-generated captions. While the decoding performance of the IC model experiences a decline with an increase in the masked region's area, the model still performs well when important regions of the image are not masked at high coverage.
CV · Sep 22, 2025
MAJORScore: A Novel Metric for Evaluating Multimodal Relevance via Joint Representation
Zhicheng Du, Qingyang Shi, Jiasheng Lu et al.
Multimodal relevance metrics usually rely on the embeddings of contrastive learning models pretrained on bimodal data (e.g., CLIP) to evaluate the correlation between cross-modal data. However, the commonly used evaluation metrics are only suitable for analyzing the association between two modalities, which greatly limits the evaluation of multimodal similarity. Herein, we propose MAJORScore, the first evaluation metric for the relevance of multiple modalities ($N$ modalities, $N\ge3$) based on multimodal joint representation. Because a multimodal joint representation integrates multiple modalities into the same latent space, it can represent different modalities on one scale, providing support for fair relevance scoring. Extensive experiments show that, compared to existing methods, MAJORScore increases by 26.03%-64.29% for consistent modalities and decreases by 13.28%-20.54% for inconsistent ones. MAJORScore serves as a more reliable metric for evaluating similarity on large-scale multimodal datasets and for evaluating multimodal model performance.
AI · Aug 16, 2025
CHBench: A Cognitive Hierarchy Benchmark for Evaluating Strategic Reasoning Capability of LLMs
Hongtao Liu, Zhicheng Du, Zihe Wang et al.
Game-playing ability serves as an indicator for evaluating the strategic reasoning capability of large language models (LLMs). However, most existing studies rely on utility performance metrics, which are not robust enough owing to variations in opponent behavior and game structure. To address this limitation, we propose the Cognitive Hierarchy Benchmark (CHBench), a novel evaluation framework inspired by the cognitive hierarchy models from behavioral economics. We hypothesize that agents have bounded rationality: different agents behave at varying reasoning depths/levels. We evaluate LLMs' strategic reasoning through a three-phase systematic framework, utilizing behavioral data from six state-of-the-art LLMs across fifteen carefully selected normal-form games. Experiments show that LLMs exhibit consistent strategic reasoning levels across diverse opponents, confirming the framework's robustness and generalization capability. We also analyze the effects of two key mechanisms (Chat Mechanism and Memory Mechanism) on strategic reasoning performance. Results indicate that the Chat Mechanism significantly degrades strategic reasoning, whereas the Memory Mechanism enhances it. These insights position CHBench as a promising tool for evaluating LLM capabilities, with significant potential for future research and practical applications.
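To make the notion of reasoning levels concrete, here is a small sketch of one classic cognitive hierarchy model, the Poisson-CH model of Camerer, Ho and Chong. It is not CHBench's procedure, which the abstract does not detail. The sketch assumes a symmetric two-player normal-form game given by the row player's payoff matrix, a level-0 player who mixes uniformly, and a level-$k$ player who best responds to lower levels weighted by a truncated Poisson distribution with mean `tau`; the function name, the default `tau=1.5`, and the lowest-index tie-breaking are illustrative choices.

```python
from math import exp, factorial


def poisson_ch_strategies(payoff, tau=1.5, max_level=3):
    """Return one mixed strategy per level 0..max_level.

    payoff[i][j] is the row player's payoff for action i against action j
    (the game is assumed symmetric, so one matrix describes both players).
    """
    n = len(payoff)
    # Poisson weights over levels; level k renormalizes over levels 0..k-1.
    weights = [exp(-tau) * tau ** h / factorial(h) for h in range(max_level + 1)]

    strategies = [[1.0 / n] * n]  # level 0 randomizes uniformly
    for k in range(1, max_level + 1):
        total = sum(weights[:k])
        # Belief about the opponent's action, mixing all lower levels.
        belief = [
            sum(weights[h] / total * strategies[h][j] for h in range(k))
            for j in range(n)
        ]
        expected = [
            sum(payoff[i][j] * belief[j] for j in range(n)) for i in range(n)
        ]
        best = max(range(n), key=lambda i: expected[i])  # ties go to lowest index
        strategies.append([1.0 if i == best else 0.0 for i in range(n)])
    return strategies


# Prisoner's dilemma with actions (cooperate, defect): defecting is dominant,
# so every level above 0 defects.
levels = poisson_ch_strategies([[3, 0], [5, 1]])
```

Comparing an agent's observed action frequencies with these level-wise predictions is one way to estimate its reasoning depth, which is the kind of behavior-based measure the abstract contrasts with utility-based metrics.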