Jianwei Hu

10.4 AI Jun 10

Embodied-BenchClaw: An Autonomous Multi-Agent System for Embodied Spatial Intelligence Benchmark Construction

Baoyang Jiang, Fengchun Zhang, Leyuan Wang et al.

Benchmarks are essential for evaluating embodied spatial intelligence, yet their construction is labor-intensive, hard to reuse, and difficult to maintain. Existing embodied benchmarks are often static and may quickly become saturated as models improve, limiting their ability to distinguish new capabilities. We propose Embodied-BenchClaw, an autonomous agentic system for constructing embodied spatial intelligence benchmarks. Given a user-specified evaluation intent, Embodied-BenchClaw automatically produces a complete and continually updatable benchmark package through a five-stage pipeline: intent blueprinting, data collection, structuring and cleaning, benchmark synthesis, and evaluation reporting. The pipeline is coordinated by three agents for planning, construction, and evaluation. To improve reusability and reliability, Embodied-BenchClaw introduces an extensible Skill Library and process quality control, enabling benchmark construction to be composable, verifiable, and repairable. We instantiate multiple benchmarks covering indoor spatial reasoning, outdoor spatial reasoning, robotic manipulation, quadruped robot navigation, UAV/aerial-view understanding, and static benchmark enhancement. These benchmarks span diverse embodied carriers, data sources, and spatial capabilities. Experiments with human evaluation, judge-based assessment, consistency checks, cost analysis, and ablations show that Embodied-BenchClaw can construct verifiable, executable, maintainable, and diagnostically useful embodied spatial benchmarks with reduced manual effort.

2.2 CL Dec 28, 2015

Communicating with sentences: A multi-word naming game model

Yang Lou, Guanrong Chen, Jianwei Hu

The naming game simulates the process of naming an object with a single word, in which a population of communicating agents can reach global consensus asymptotically through iterative pairwise conversations. We propose an extension of the single-word model to a multi-word naming game (MWNG), which simulates describing a complex object with a sentence (multiple words). Words are defined in categories and then organized into sentences by combining words from different categories. We refer to a formatted combination of several words as a pattern. In an MWNG, a pairwise conversation requires the hearer to reach consensus with the speaker on every single word in the sentence as well as on the sentence pattern, so as to guarantee the correct meaning of the utterance; otherwise, the pair fails to reach consensus in that interaction. We validate the model on three typical topologies as the underlying communication network, and employ both conventional and manually designed patterns in performing the MWNG.
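The interaction rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category names, the two patterns, the rule that a failed hearer memorizes the whole sentence, the rule that both agents collapse their memories on success, and the fully connected random pairing in `run` are all assumptions made for illustration. The abstract only states that consensus requires agreement on every word and on the pattern.

```python
import random
from itertools import count

# Hypothetical categories and patterns, chosen only for illustration.
CATEGORIES = ("subject", "verb", "object")
PATTERNS = (("subject", "verb", "object"), ("subject", "verb"))

_new_word = count()  # counter used to invent fresh, unique words


class Agent:
    def __init__(self):
        self.words = {c: set() for c in CATEGORIES}  # known words per category
        self.patterns = set()                        # known sentence patterns

    def speak(self, rng):
        """Pick a known pattern and one known word per slot, inventing if needed."""
        if not self.patterns:
            self.patterns.add(rng.choice(PATTERNS))
        pattern = rng.choice(sorted(self.patterns))
        sentence = []
        for cat in pattern:
            if not self.words[cat]:
                self.words[cat].add(f"{cat}-{next(_new_word)}")
            sentence.append(rng.choice(sorted(self.words[cat])))
        return pattern, tuple(sentence)


def converse(speaker, hearer, rng):
    """One pairwise conversation; returns True on local consensus."""
    pattern, sentence = speaker.speak(rng)
    slots = list(zip(pattern, sentence))
    # Success requires the hearer to know the pattern AND every word in it.
    success = pattern in hearer.patterns and all(
        word in hearer.words[cat] for cat, word in slots)
    if success:
        # Assumed rule: both agents keep only the agreed pattern and words.
        for agent in (speaker, hearer):
            agent.patterns = {pattern}
            for cat, word in slots:
                agent.words[cat] = {word}
    else:
        # Assumed rule: the hearer memorizes the pattern and all its words.
        hearer.patterns.add(pattern)
        for cat, word in slots:
            hearer.words[cat].add(word)
    return success


def converged(agents):
    """True if all agents share one pattern and one word per slot of it."""
    first = agents[0]
    if len(first.patterns) != 1:
        return False
    (pattern,) = first.patterns
    return all(
        a.patterns == first.patterns
        and all(a.words[c] == first.words[c] and len(a.words[c]) == 1
                for c in pattern)
        for a in agents)


def run(n_agents=20, max_steps=200_000, seed=0):
    """Random pairing on a fully connected population (an assumed topology)."""
    rng = random.Random(seed)
    agents = [Agent() for _ in range(n_agents)]
    for step in range(1, max_steps + 1):
        speaker, hearer = rng.sample(agents, 2)
        converse(speaker, hearer, rng)
        if converged(agents):
            return step, agents
    return None, agents
```

Calling `run()` returns the step at which the population first shares a single sentence, or `None` if that does not happen within `max_steps`. Swapping the random pairing in `run` for neighbor sampling on a graph would be one way to explore other communication topologies.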
