Shiyu Chen

AI
h-index34
10papers
150citations
Novelty45%
AI Score54

10 Papers

AIMay 27
COOP$^2$: Defining, Observing, and Repairing Cooperation in LLM Multi-Agent Systems

Hanqing Yang, Narjes Nourzad, Shiyu Chen et al.

Many complex tasks require extended effort, diverse capabilities, or coordinated actions beyond what a single agent can provide. However, simply adding more agents does not guarantee better performance, as effective cooperation depends on how agents interact with each other and with task structure to satisfy evolving constraints over time. This challenge is amplified for LLM-based multi-agent systems (LLM-MAS): plans, messages, and revisions occur in natural language, whereas task progress depends on grounded environment actions. Current evaluations mostly treat cooperation as an implicit ingredient of final task success, leaving both cooperation and the effect of multi-agent interaction on task dynamics difficult to study. We introduce COOP$^2$, an evaluation framework that grounds high-level agent cooperation dynamics in LLM-MAS within task progress in the environment. COOP$^2$ then defines cooperative tasks with verifiable cooperative requirements, allowing us to analyze how cooperation unfolds over time with respect to task progress, as well as where and why cooperation breaks down. Building on this framework, we develop COOP$^2$-Repair, which predicts constraint failures from group plans and opens targeted repair channels for guided revisions. Across two environments and three communication structures, COOP$^2$-Repair improves task success and constraint satisfaction while exposing the additional decision overhead and communication load required for repair. The project web page can be found at: https://happyeureka.github.io/coop2.

CVSep 13, 2022
CPnP: Consistent Pose Estimator for Perspective-n-Point Problem with Bias Elimination

Guangyang Zeng, Shiyu Chen, Biqiang Mu et al.

The Perspective-n-Point (PnP) problem has been widely studied in both computer vision and photogrammetry societies. With the development of feature extraction techniques, a large number of feature points might be available in a single shot. It is promising to devise a consistent estimator, i.e., the estimate can converge to the true camera pose as the number of points increases. To this end, we propose a consistent PnP solver, named \emph{CPnP}, with bias elimination. Specifically, linear equations are constructed from the original projection model via measurement model modification and variable elimination, based on which a closed-form least-squares solution is obtained. We then analyze and subtract the asymptotic bias of this solution, resulting in a consistent estimate. Additionally, Gauss-Newton (GN) iterations are executed to refine the consistent solution. Our proposed estimator is efficient in terms of computations -- it has $O(n)$ computational complexity. Experimental tests on both synthetic data and real images show that our proposed estimator is superior to some well-known ones for images with dense visual features, in terms of estimation precision and computing time.

IRMay 27
Do Agents Need Semantic Metadata? A Comparative Study in Agentic Data Retrieval

Shiyu Chen, Tarfah Alrashed, Alon Halevy et al.

In the era of autonomous agents, machine-actionable data is critical for data-driven workflows. For more than a decade, semantic metadata like schema.org has anchored the FAIR principles (Findable, Accessible, Interoperable, and Reusable) for machine-actionable data and enabled discovery tools like Google Dataset Search. However, the rise of Large Language Models (LLMs) capable of navigating the unstructured web raises a fundamental question: Is semantic metadata still necessary for agentic data discovery, or can agents reliably retrieve actionable data directly from the web? We present a comparative analysis of agentic data retrieval across two distinct environments: a Baseline Agent searching billions of open-web documents, and a Semantic Agent leveraging a corpus of 90 million datasets using schema.org. We deploy an "LLM-as-a-judge" evaluation pipeline, mapped directly to the FAIR principles, to assess the semantic relevance, data accessibility, and computational utility of the retrieved data. Our results reveal a clear divergence. The Semantic Agent excels at retrieving actionable data, achieving a 44.9% higher precision for metadata-rich registries and a 46.6% higher precision for pages with machine-readable downloads among its returned results. Conversely, the Baseline Agent frequently suffers "Last-Mile Utility" failures, retrieving prose-heavy pages (20.1% of results) and portal landing pages (8.5%) rather than actual data pages. While the Baseline Agent achieves higher coverage by answering 40% more questions, the Semantic Agent delivers greater accuracy, achieving 65.7% higher overall precision in retrieving FAIR-compliant datasets. We conclude that while unstructured retrieval supports broad exploratory tasks, structured ecosystems remain the indispensable foundation for reliable, execution-oriented autonomous workflows.

AINov 6, 2025
DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration

Narjes Nourzad, Hanqing Yang, Shiyu Chen et al.

Cooperative multi-agent planning requires agents to make joint decisions with partial information and limited communication. Coordination at the trajectory level often fails, as small deviations in timing or movement cascade into conflicts. Symbolic planning mitigates this challenge by raising the level of abstraction and providing a minimal vocabulary of actions that enable synchronization and collective progress. We present DR. WELL, a decentralized neurosymbolic framework for cooperative multi-agent planning. Cooperation unfolds through a two-phase negotiation protocol: agents first propose candidate roles with reasoning and then commit to a joint allocation under consensus and environment constraints. After commitment, each agent independently generates and executes a symbolic plan for its role without revealing detailed trajectories. Plans are grounded in execution outcomes via a shared world model that encodes the current state and is updated as agents act. By reasoning over symbolic plans rather than raw trajectories, DR. WELL avoids brittle step-level alignment and enables higher-level operations that are reusable, synchronizable, and interpretable. Experiments on cooperative block-push tasks show that agents adapt across episodes, with the dynamic world model capturing reusable patterns and improving task completion rates and efficiency. Experiments on cooperative block-push tasks show that our dynamic world model improves task completion and efficiency through negotiation and self-refinement, trading a time overhead for evolving, more efficient collaboration strategies.

LGJul 15, 2025
Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning

Shiyu Chen, Cencheng Shen, Youngser Park et al.

Graph neural networks (GNNs) have emerged as a powerful framework for a wide range of node-level graph learning tasks. However, their performance is often constrained by reliance on random or minimally informed initial feature representations, which can lead to slow convergence and suboptimal solutions. In this paper, we leverage a statistically grounded method, one-hot graph encoder embedding (GEE), to generate high-quality initial node features that enhance the end-to-end training of GNNs. We refer to this integrated framework as the GEE-powered GNN (GG), and demonstrate its effectiveness through extensive simulations and real-world experiments across both unsupervised and supervised settings. In node clustering, GG consistently achieves state-of-the-art performance, ranking first across all evaluated real-world datasets, while exhibiting faster convergence compared to the standard GNN. For node classification, we further propose an enhanced variant, GG-C, which concatenates the outputs of GG and GEE and outperforms competing baselines. These results confirm the importance of principled, structure-aware feature initialization in realizing the full potential of GNNs.

LGOct 11, 2025
Multi-View Graph Learning with Graph-Tuple

Shiyu Chen, Ningyuan Huang, Soledad Villar

Graph Neural Networks (GNNs) typically scale with the number of graph edges, making them well suited for sparse graphs but less efficient on dense graphs, such as point clouds or molecular interactions. A common remedy is to sparsify the graph via similarity thresholding or distance pruning, but this forces an arbitrary choice of a single interaction scale and discards crucial information from other scales. To overcome this limitation, we introduce a multi-view graph-tuple framework. Instead of a single graph, our graph-tuple framework partitions the graph into disjoint subgraphs, capturing primary local interactions and weaker, long-range connections. We then learn multi-view representations from the graph-tuple via a heterogeneous message-passing architecture inspired by the theory of non-commuting operators, which we formally prove is strictly more expressive and guarantees a lower oracle risk compared to single-graph message-passing models. We instantiate our framework on two scientific domains: molecular property prediction from feature-scarce Coulomb matrices and cosmological parameter inference from geometric point clouds. On both applications, our multi-view graph-tuple models demonstrate better performance than single-graph baselines, highlighting the power and versatility of our multi-view approach.

IRJun 12, 2020
Google Dataset Search by the Numbers

Omar Benjelloun, Shiyu Chen, Natasha Noy

Scientists, governments, and companies increasingly publish datasets on the Web. Google's Dataset Search extracts dataset metadata -- expressed using schema.org and similar vocabularies -- from Web pages in order to make datasets discoverable. Since we started the work on Dataset Search in 2016, the number of datasets described in schema.org has grown from about 500K to almost 30M. Thus, this corpus has become a valuable snapshot of data on the Web. To the best of our knowledge, this corpus is the largest and most diverse of its kind. We analyze this corpus and discuss where the datasets originate from, what topics they cover, which form they take, and what people searching for datasets are interested in. Based on this analysis, we identify gaps and possible future work to help make data more discoverable.

CVMar 29, 2017
Learning with Privileged Information for Multi-Label Classification

Shiyu Chen, Shangfei Wang, Tanfang Chen et al.

In this paper, we propose a novel approach for learning multi-label classifiers with the help of privileged information. Specifically, we use similarity constraints to capture the relationship between available information and privileged information, and use ranking constraints to capture the dependencies among multiple labels. By integrating similarity constraints and ranking constraints into the learning process of classifiers, the privileged information and the dependencies among multiple labels are exploited to construct better classifiers during training. A maximum margin classifier is adopted, and an efficient learning algorithm of the proposed method is also developed. We evaluate the proposed method on two applications: multiple object recognition from images with the help of implicit information about object importance conveyed by the list of manually annotated image tags; and multiple facial action unit detection from low-resolution images augmented by high-resolution images. Experimental results demonstrate that the proposed method can effectively take full advantage of privileged information and dependencies among multiple labels for better object recognition and better facial action unit detection.

AINov 17, 2013
A Visibility Graph Averaging Aggregation Operator

Shiyu Chen, Yong Hu, Sankaran Mahadevan et al.

The problem of aggregation is considerable importance in many disciplines. In this paper, a new type of operator called visibility graph averaging (VGA) aggregation operator is proposed. This proposed operator is based on the visibility graph which can convert a time series into a graph. The weights are obtained according to the importance of the data in the visibility graph. Finally, the VGA operator is used in the analysis of the TAIEX database to illustrate that it is practical and compared with the classic aggregation operators, it shows its advantage that it not only implements the aggregation of the data purely, but also conserves the time information, and meanwhile, the determination of the weights is more reasonable.

AIOct 28, 2013
Ranking basic belief assignments in decision making under uncertain environment

Yuxian Du, Shiyu Chen, Yong Hu et al.

Dempster-Shafer theory (D-S theory) is widely used in decision making under the uncertain environment. Ranking basic belief assignments (BBAs) now is an open issue. Existing evidence distance measures cannot rank the BBAs in the situations when the propositions have their own ranking order or their inherent measure of closeness. To address this issue, a new ranking evidence distance (RED) measure is proposed. Compared with the existing evidence distance measures including the Jousselme's distance and the distance between betting commitments, the proposed RED measure is much more general due to the fact that the order of the propositions in the systems is taken into consideration. If there is no order or no inherent measure of closeness in the propositions, our proposed RED measure is reduced to the existing evidence distance. Numerical examples show that the proposed RED measure is an efficient alternative to rank BBAs in decision making under uncertain environment.