Zihe Wang

GT
h-index: 37
17 papers
272 citations
Novelty: 50%
AI Score: 56

17 Papers

GT Jun 2
Second-Best Bilateral Trade is $1/2$ Efficient

Zhengyang Liu, Ying Qin, Zeyu Ren et al.

The landmark Myerson-Satterthwaite Theorem establishes a fundamental impossibility in bilateral trade: no Bayesian incentive-compatible mechanism can simultaneously achieve ex-post efficiency, individual rationality, and strong budget balance. We resolve a long-standing open question regarding the efficiency loss imposed by these constraints. Specifically, we prove that the Bayesian-optimal (second-best) mechanism always captures at least half of the first-best gains from trade ($\mathrm{SB}\ge\frac{1}{2}\mathrm{FB}$). This result is tight, definitively closing the gap between the previously best-known bounds of $0.317$ and $0.736$.
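As a rough numerical illustration of the first-best versus second-best comparison, the sketch below evaluates both quantities for one textbook instance: buyer value and seller cost drawn independently from U[0, 1]. It assumes the classical result for this instance, which is not stated in the abstract, that the second-best mechanism trades exactly when the value exceeds the cost by at least 1/4. The function name and grid size are hypothetical.

```python
def gains_from_trade(threshold: float, n: int = 400) -> float:
    """Midpoint-rule estimate of E[(v - c) * 1{v - c >= threshold}]
    for independent v, c ~ U[0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        for j in range(n):
            c = (j + 0.5) * h
            if v - c >= threshold:
                total += v - c
    return total * h * h


# First best: trade whenever the buyer's value exceeds the seller's cost.
first_best = gains_from_trade(0.0)
# Second best for the uniform instance (assumed): trade iff v - c >= 1/4.
second_best = gains_from_trade(0.25)
ratio = second_best / first_best

print(f"FB ~ {first_best:.4f}, SB ~ {second_best:.4f}, SB/FB ~ {ratio:.3f}")
```

For this instance the exact values are FB = 1/6 and SB = 9/64, so the ratio is 27/32, comfortably above the worst-case bound of 1/2 stated in the abstract.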

GT Jun 2
Competitive Information Design in Sequential Search

Zhicheng Du, Hu Fu, Ying Qin et al.

Advertisements often strategically disclose information to consumers who make decisions on further information acquisition and eventual purchase. Anderson and Renault (2006) model this problem using an information design framework, where the advertiser acts as a sender and the consumer as a receiver. We extend this model to a competitive setting with horizontally differentiated senders competing for a unit-demand receiver. Under costly inspection, the receiver's optimal sequential search action is given by Weitzman's Index Algorithm. We give a method, based on duality arguments, to verify whether a sender's given information strategy constitutes a best response against his competitors (other senders). We establish the existence of an equilibrium in the game among senders when the prior distributions have no mass; we also illustrate that such equilibria may exhibit intricate behaviors. Finally, we meticulously characterize symmetric equilibria played by the senders for cases when the prior distributions have monotone increasing densities, while offering economic intuitions behind the insightful equilibrium structure.
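Weitzman's index for a single option can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes a discrete value distribution and a strictly positive inspection cost, and finds the reservation value z that solves E[max(X - z, 0)] = cost by bisection. The function name and the example numbers are hypothetical.

```python
def reservation_value(values, probs, cost, tol=1e-9):
    """Weitzman index of one option: the z with E[max(X - z, 0)] == cost.

    Assumes a discrete distribution (values, probs) and cost > 0.
    """
    def expected_excess(z):
        return sum(p * max(x - z, 0.0) for x, p in zip(values, probs))

    # expected_excess is non-increasing in z, is at least cost at lo,
    # and equals 0 at hi, so the root lies in [lo, hi].
    lo, hi = min(values) - cost, max(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Value is 0 or 1 with equal probability; inspecting costs 0.1.
z = reservation_value([0.0, 1.0], [0.5, 0.5], cost=0.1)
print(round(z, 6))
```

Under the index rule, the receiver inspects options in decreasing order of this index and stops once the best value found so far exceeds every remaining index.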

CV Sep 12, 2023
Enhancing multimodal cooperation via sample-level modality valuation

Yake Wei, Ruoxuan Feng, Zihe Wang et al.

One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However, most models suffer from unsatisfactory multimodal cooperation and cannot jointly utilize all modalities well. Some methods have been proposed to identify and enhance the worse-learnt modality, but they often fail to provide fine-grained, sample-level observation of multimodal cooperation with theoretical support. Hence, it is essential to reasonably observe and improve the fine-grained cooperation between modalities, especially in realistic scenarios where the modality discrepancy can vary across samples. To this end, we introduce a sample-level modality valuation metric to evaluate the contribution of each modality for each sample. Via modality valuation, we observe that modality discrepancy can indeed differ at the sample level, beyond the global contribution discrepancy at the dataset level. We further analyze this issue and improve cooperation between modalities at the sample level by enhancing the discriminative ability of low-contributing modalities in a targeted manner. Overall, our methods reasonably observe the fine-grained uni-modal contribution and achieve considerable improvement. The source code and dataset are available at https://github.com/GeWu-Lab/Valuate-and-Enhance-Multimodal-Cooperation.

CV Jul 23, 2024
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs

Jihyung Kil, Zheda Mai, Justin Lee et al.

The ability to compare objects, scenes, or situations is crucial for effective decision-making and problem-solving in everyday life. For instance, comparing the freshness of apples enables better choices during grocery shopping, while comparing sofa designs helps optimize the aesthetics of our living space. Despite its significance, the comparative capability is largely unexplored in artificial general intelligence (AGI). In this paper, we introduce MLLM-CompBench, a benchmark designed to evaluate the comparative reasoning capability of multimodal large language models (MLLMs). MLLM-CompBench mines and pairs images through visually oriented questions covering eight dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. We curate a collection of around 40K image pairs using metadata from diverse vision datasets and CLIP similarity scores. These image pairs span a broad array of visual domains, including animals, fashion, sports, and both outdoor and indoor scenes. The questions are carefully crafted to discern relative characteristics between two images and are labeled by human annotators for accuracy and relevance. We use MLLM-CompBench to evaluate recent MLLMs, including GPT-4V(ision), Gemini-Pro, and LLaVA-1.6. Our results reveal notable shortcomings in their comparative abilities. We believe MLLM-CompBench not only sheds light on these limitations but also establishes a solid foundation for future enhancements in the comparative capability of MLLMs.

GT May 10
Pacing Equilibria in Second-Price Auctions with Few Goods

Yiyang Huang, Yonglei Yan, Zihe Wang et al.

In this paper, we investigate the computation of second-price pacing equilibria (SPPEs), a foundational model in online advertising auctions. We present a polynomial-time algorithm for computing exact SPPEs in instances with a constant number of goods. Our core technique maps buyers' pacing multipliers to the highest bids on each good, effectively partitioning the parameter space into a set of distinct geometric cells. By enumerating these cells, we fix the relative ordering of the bids and reduce the problem of equilibrium computation to a linear feasibility program. Finally, we demonstrate that this tractability extends to large-scale markets with an arbitrary number of goods, provided the goods can be aggregated into a constant number of valuation types.

AI Mar 17
ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation

Zihe Wang, Yihuan Wang, Haiyang Yu, Zhiyong Cui et al.

Current expressway operation relies on rule-based and isolated models, which limits the ability to jointly analyze knowledge across different systems. Meanwhile, Large Language Models (LLMs) are increasingly applied in intelligent transportation, advancing traffic models from algorithmic to cognitive intelligence. However, general LLMs are unable to effectively understand the regulations and causal relationships of events in unconventional expressway scenarios. Therefore, this paper constructs a pre-trained multimodal large language model (MLLM) for expressways, ExpressMind, which serves as the cognitive core for intelligent expressway operations. To overcome data scarcity, this paper constructs the industry's first full-stack expressway dataset, encompassing traffic knowledge texts, emergency reasoning chains, and annotated video events. This paper proposes a dual-layer LLM pre-training paradigm based on self-supervised training and unsupervised learning. Additionally, this study introduces a Graph-Augmented RAG framework to dynamically index the expressway knowledge base. To enhance reasoning for expressway incident response strategies, we develop an RL-aligned Chain-of-Thought (RL-CoT) mechanism that enforces consistency between model reasoning and expert problem-solving heuristics for incident handling. Finally, ExpressMind integrates a cross-modal encoder to align the dynamic feature sequences under the visual and textual channels, enabling it to understand traffic scenes in both video and image modalities. Extensive experiments on our newly released multi-modal expressway benchmark demonstrate that ExpressMind comprehensively outperforms existing baselines in event detection, safety response generation, and complex traffic analysis. The code and data are available at: https://wanderhee.github.io/ExpressMind/.

LG Mar 1, 2024
A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications

Jiaqi Han, Jiacheng Cen, Liming Wu et al.

Geometric graphs are a special kind of graph with geometric features, which are vital for modeling many scientific problems. Unlike generic graphs, geometric graphs often exhibit physical symmetries of translations, rotations, and reflections, so current Graph Neural Networks (GNNs) process them ineffectively. To address this issue, researchers have proposed a variety of geometric GNNs equipped with invariant/equivariant properties to better characterize the geometry and topology of geometric graphs. Given the current progress in this field, it is imperative to conduct a comprehensive survey of data structures, models, and applications related to geometric GNNs. In this paper, based on the necessary but concise mathematical preliminaries, we formalize the geometric graph as the data structure, on top of which we provide a unified view of existing models from the geometric message passing perspective. Additionally, we summarize the applications as well as the related datasets to facilitate later research on methodology development and experimental evaluation. We also discuss the challenges and potential future directions of geometric GNNs at the end of this survey.

AI Jan 29
Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning

Xixian Yong, Peilin Sun, Zihe Wang et al.

Effective urban planning is crucial for enhancing residents' quality of life and ensuring societal stability, playing a pivotal role in the sustainable development of cities. Current planning methods either rely heavily on human experts, which is time-consuming and labor-intensive, or utilize deep learning algorithms, which often limit stakeholder involvement. To bridge these gaps, we propose Intelli-Planner, a novel framework integrating Deep Reinforcement Learning (DRL) with large language models (LLMs) to facilitate participatory and customized planning scheme generation. Intelli-Planner utilizes demographic data, geographic data, and planning preferences to determine high-level planning requirements and demands for each functional type. During training, a knowledge enhancement module is employed to enhance the decision-making capability of the policy network. Additionally, we establish a multi-dimensional evaluation system and employ LLM-based stakeholders for satisfaction scoring. Experimental validation across diverse urban settings shows that Intelli-Planner surpasses traditional baselines and achieves comparable performance to state-of-the-art DRL-based methods in objective metrics, while enhancing stakeholder satisfaction and convergence speed. These findings underscore the effectiveness and superiority of our framework, highlighting the potential for integrating the latest advancements in LLMs with DRL approaches to revolutionize tasks related to functional area planning.

LG Oct 15, 2024
Are High-Degree Representations Really Unnecessary in Equivariant Graph Neural Networks?

Jiacheng Cen, Anyi Li, Ning Lin et al.

Equivariant Graph Neural Networks (GNNs) that incorporate E(3) symmetry have achieved significant success in various scientific applications. As one of the most successful models, EGNN leverages a simple scalarization technique to perform equivariant message passing over only Cartesian vectors (i.e., 1st-degree steerable vectors), enjoying greater efficiency and efficacy compared to equivariant GNNs using higher-degree steerable vectors. This success suggests that higher-degree representations might be unnecessary. In this paper, we disprove this hypothesis by exploring the expressivity of equivariant GNNs on symmetric structures, including $k$-fold rotations and regular polyhedra. We theoretically demonstrate that equivariant GNNs will always degenerate to a zero function if the degree of the output representations is fixed to 1 or other specific values. Based on this theoretical insight, we propose HEGNN, a high-degree version of EGNN to increase the expressivity by incorporating high-degree steerable vectors while maintaining EGNN's efficiency through the scalarization trick. Our extensive experiments demonstrate that HEGNN not only aligns with our theoretical analyses on toy datasets consisting of symmetric structures, but also shows substantial improvements on more complicated datasets such as $N$-body and MD17. Our theoretical findings and empirical results potentially open up new possibilities for the research of equivariant GNNs.

GT Apr 27
Private Private Information in Second-Price Auction

Boyu Liu, Wei Tang, Zihe Wang et al.

Classic results show that even an arbitrarily small correlation across bidders' information can enable full surplus extraction in auctions and related mechanism design settings. Motivated by this fragility, we study the information independence in a second-price auction when the seller commits to a private private information structure, meaning bidders' signals are independent ex ante, while bidders share a symmetric and arbitrarily correlated prior distribution over their valuations. We first show that the seller optimal efficient outcome with full surplus extraction can always be implemented by a private private information structure that admits a Bayes Nash equilibrium. However, this equilibrium may not be stable. We then further construct a private private information structure that achieves revenue arbitrarily close to maximum welfare while admitting a strict equilibrium. At the same time, we establish an impossibility result: under private private information, in general, bidder surplus cannot achieve maximal welfare exactly, and we characterize necessary and sufficient conditions on the prior distribution under which bidder surplus can be made arbitrarily close to maximal welfare. We then explore which other efficient outcomes are achievable under private private information. Finally, moving beyond private private information, we provide a complete characterization of the achievable pairs (bidder surplus, seller revenue) under general information structures.

CV Jun 10, 2025
AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

Zheda Mai, Arpita Chowdhury, Zihe Wang et al.

The rise of vision foundation models (VFMs) calls for systematic evaluation. A common approach pairs VFMs with large language models (LLMs) as general-purpose heads, followed by evaluation on broad Visual Question Answering (VQA) benchmarks. However, this protocol has two key blind spots: (i) the instruction tuning data may not align with VQA test distributions, meaning a wrong prediction can stem from such data mismatch rather than a VFM's visual shortcomings; (ii) VQA benchmarks often require multiple visual abilities, making it hard to tell whether errors stem from lacking all required abilities or just a single critical one. To address these gaps, we introduce AVA-Bench, the first benchmark that explicitly disentangles 14 Atomic Visual Abilities (AVAs) -- foundational skills like localization, depth estimation, and spatial understanding that collectively support complex visual reasoning tasks. By decoupling AVAs and matching training and test distributions within each, AVA-Bench pinpoints exactly where a VFM excels or falters. Applying AVA-Bench to leading VFMs thus reveals distinctive "ability fingerprints," turning VFM selection from educated guesswork into principled engineering. Notably, we find that a 0.5B LLM yields similar VFM rankings to a 7B LLM while cutting GPU hours by 8x, enabling more efficient evaluation. By offering a comprehensive and transparent benchmark, we hope AVA-Bench lays the foundation for the next generation of VFMs.

LG Oct 15, 2025
Universally Invariant Learning in Equivariant GNNs

Jiacheng Cen, Anyi Li, Ning Lin et al.

Equivariant Graph Neural Networks (GNNs) have demonstrated significant success across various applications. To achieve completeness -- that is, the universal approximation property over the space of equivariant functions -- the network must effectively capture the intricate multi-body interactions among different nodes. Prior methods attain this via deeper architectures, augmented body orders, or increased degrees of steerable features, often at high computational cost and without polynomial-time solutions. In this work, we present a theoretically grounded framework for constructing complete equivariant GNNs that is both efficient and practical. We prove that a complete equivariant GNN can be achieved through two key components: 1) a complete scalar function, referred to as the canonical form of the geometric graph; and 2) a full-rank steerable basis set. Leveraging this finding, we propose an efficient algorithm for constructing complete equivariant GNNs based on two common models: EGNN and TFN. Empirical results show that our model achieves superior completeness and excellent performance with only a few layers, thereby significantly reducing computational overhead while maintaining strong practical efficacy.

AI Aug 16, 2025
CHBench: A Cognitive Hierarchy Benchmark for Evaluating Strategic Reasoning Capability of LLMs

Hongtao Liu, Zhicheng Du, Zihe Wang et al.

Game-playing ability serves as an indicator for evaluating the strategic reasoning capability of large language models (LLMs). However, most existing studies rely on utility performance metrics, which are not robust enough due to variations in opponent behavior and game structure. To address this limitation, we propose the Cognitive Hierarchy Benchmark (CHBench), a novel evaluation framework inspired by the cognitive hierarchy models from behavioral economics. We hypothesize that agents have bounded rationality -- different agents behave at varying reasoning depths/levels. We evaluate LLMs' strategic reasoning through a three-phase systematic framework, utilizing behavioral data from six state-of-the-art LLMs across fifteen carefully selected normal-form games. Experiments show that LLMs exhibit consistent strategic reasoning levels across diverse opponents, confirming the framework's robustness and generalization capability. We also analyze the effects of two key mechanisms (Chat Mechanism and Memory Mechanism) on strategic reasoning performance. Results indicate that the Chat Mechanism significantly degrades strategic reasoning, whereas the Memory Mechanism enhances it. These insights position CHBench as a promising tool for evaluating LLM capabilities, with significant potential for future research and practical applications.
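The notion of reasoning depth can be illustrated with level-k reasoning, a simpler relative of the cognitive hierarchy models mentioned above. The sketch below is not CHBench's pipeline: it assumes a level-0 player mixes uniformly and a level-k player best-responds to a level-(k-1) opponent. The undercutting game, the bonus of 3, and all names are hypothetical.

```python
def best_response(payoff, opponent_mix):
    """Index of the action with the highest expected payoff against a mixed opponent."""
    expected = [sum(u * q for u, q in zip(row, opponent_mix)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)


def level_k_action(own, other, k):
    """Pure action of a level-k player, k >= 1.

    own[a][b]:   my payoff when I play a and the opponent plays b.
    other[b][a]: the opponent's payoff when they play b and I play a.
    """
    n_opp = len(other)
    if k == 1:
        # Level 1 best-responds to a uniformly random level-0 opponent.
        opponent_mix = [1.0 / n_opp] * n_opp
    else:
        opp_action = level_k_action(other, own, k - 1)
        opponent_mix = [1.0 if b == opp_action else 0.0 for b in range(n_opp)]
    return best_response(own, opponent_mix)


# Symmetric undercutting game: claim 1..4, receive the claim, plus a bonus
# of 3 for claiming exactly one less than the opponent.
claims = [1, 2, 3, 4]
payoff = [[claims[a] + (3 if a + 1 == b else 0) for b in range(4)] for a in range(4)]

levels = [level_k_action(payoff, payoff, k) for k in (1, 2, 3)]
print([claims[a] for a in levels])
```

In this toy game each additional level of reasoning undercuts the previous one by one step, so observed actions map back to a reasoning depth.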

CV Jan 12, 2025
Static Segmentation by Tracking: A Label-Efficient Approach for Fine-Grained Specimen Image Segmentation

Zhenyang Feng, Zihe Wang, Jianyang Gu et al.

We study image segmentation in the biological domain, particularly trait segmentation from specimen images (e.g., butterfly wing stripes, beetle elytra). This fine-grained task is crucial for understanding the biology of organisms, but it traditionally requires manually annotating segmentation masks for hundreds of images per species, making it highly labor-intensive. To address this challenge, we propose a label-efficient approach, Static Segmentation by Tracking (SST), based on a key insight: while specimens of the same species exhibit natural variation, the traits of interest show up consistently. This motivates us to concatenate specimen images into a "pseudo-video" and reframe trait segmentation as a tracking problem. Specifically, SST generates masks for unlabeled images by propagating annotated or predicted masks from the "pseudo-preceding" images. Built upon recent video segmentation models, such as Segment Anything Model 2, SST achieves high-quality trait segmentation with only one labeled image per species, marking a breakthrough in specimen image analysis. To further enhance segmentation quality, we introduce a cycle-consistent loss for fine-tuning, again requiring only one labeled image. Additionally, we demonstrate the broader potential of SST, including one-shot instance segmentation in natural images and trait-based image retrieval.

CR Aug 28, 2021
Identifying Ransomware Actors in the Bitcoin Network

Siddhartha Dalal, Zihe Wang, Siddhanth Sabharwal

Due to the pseudo-anonymity of the Bitcoin network, users can hide behind their bitcoin addresses, which can be generated in unlimited quantity, on the fly, without any formal links between them. Thus, the network is being used for payment transfer by the actors involved in ransomware and other illegal activities. The other activity we consider is gambling, since gambling is often used for transferring illegal funds. The question addressed here is: given temporally limited graphs of Bitcoin transactions, to what extent can one identify common patterns associated with these fraudulent activities and apply them to find other ransomware actors? The problem is rather complex, given that thousands of addresses can belong to the same actor without any obvious links between them or any common pattern of behavior. The main contribution of this paper is to introduce and apply new algorithms for local clustering and supervised graph machine learning for identifying malicious actors. We show that very local subgraphs of known actors are sufficient to differentiate between ransomware, random, and gambling actors with 85% prediction accuracy on the test data set.

QUANT-PH Jan 28, 2021
Practical distributed quantum information processing with LOCCNet

Xuanqiang Zhao, Benchi Zhao, Zihe Wang et al.

Distributed quantum information processing is essential for building quantum networks and enabling more extensive quantum computations. In this regime, several spatially separated parties share a multipartite quantum system, and the most natural set of operations is Local Operations and Classical Communication (LOCC). As a pivotal part of quantum information theory and practice, LOCC has led to many vital protocols such as quantum teleportation. However, designing practical LOCC protocols is challenging due to LOCC's intractable structure and limitations set by near-term quantum devices. Here we introduce LOCCNet, a machine learning framework facilitating protocol design and optimization for distributed quantum information processing tasks. As applications, we explore various quantum information tasks such as entanglement distillation, quantum state discrimination, and quantum channel simulation. We discover protocols with evident improvements, in particular, for entanglement distillation with quantum states of interest in quantum information. Our approach opens up new opportunities for exploring entanglement and its applications with machine learning, which will potentially sharpen our understanding of the power and limitations of LOCC. An implementation of LOCCNet is available in Paddle Quantum, a quantum machine learning Python package based on the PaddlePaddle deep learning platform.

GT Jan 28, 2020
Bounded Incentives in Manipulating the Probabilistic Serial Rule

Zihe Wang, Zhide Wei, Jie Zhang

The Probabilistic Serial mechanism is well-known for its desirable fairness and efficiency properties. It is one of the most prominent protocols for the random assignment problem. However, Probabilistic Serial is not incentive-compatible, so these desirable properties only hold for the agents' declared preferences, rather than their genuine preferences. A substantial utility gain through strategic behavior would prompt self-interested agents to manipulate the mechanism and would subvert the very foundation of adopting the mechanism in practice. In this paper, we characterize the extent to which an individual agent can increase its utility by strategic manipulation. We show that the incentive ratio of the mechanism is $\frac{3}{2}$. That is, no agent can misreport its preferences such that its utility becomes more than 1.5 times what it obtains when it reports truthfully. This ratio is a worst-case guarantee, obtained by allowing an agent to have complete information about other agents' reports and to figure out the best response strategy even if it is computationally intractable in general. To complement this worst-case study, we further evaluate an agent's utility gain on average by experiments. The experiments show that an agent's incentive to manipulate the rule is very limited. These results shed some light on the robustness of Probabilistic Serial against strategic manipulation, which is one step further than knowing that it is not incentive-compatible.
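For readers unfamiliar with the rule, Probabilistic Serial is commonly described as a simultaneous eating procedure: every agent consumes its most preferred remaining item at unit speed until one unit of time has passed. The sketch below follows that standard description and is not taken from the paper; the function name and the three-agent example are hypothetical, and each item is assumed to have unit supply.

```python
from fractions import Fraction


def probabilistic_serial(preferences):
    """Simultaneous eating: preferences[i] ranks items for agent i, best first.

    Returns one dict per agent mapping each item to the fraction received.
    """
    items = sorted({item for ranking in preferences for item in ranking})
    remaining = {item: Fraction(1) for item in items}
    shares = [{item: Fraction(0) for item in items} for _ in preferences]
    time_left = Fraction(1)
    while time_left > 0:
        # Each agent targets its best item that still has supply.
        targets = {}
        for agent, ranking in enumerate(preferences):
            for item in ranking:
                if remaining[item] > 0:
                    targets[agent] = item
                    break
        if not targets:
            break
        eaters = {}
        for item in targets.values():
            eaters[item] = eaters.get(item, 0) + 1
        # Advance until some targeted item runs out or time expires.
        step = min([time_left] + [remaining[i] / n for i, n in eaters.items()])
        for agent, item in targets.items():
            shares[agent][item] += step
        for item, n in eaters.items():
            remaining[item] -= step * n
        time_left -= step
    return shares


shares = probabilistic_serial([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
for agent, share in enumerate(shares):
    print(agent, {item: str(q) for item, q in share.items()})
```

The incentive ratio in the abstract compares an agent's utility from the best misreported ranking with its utility from the truthful ranking under this procedure.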