Dapeng Jiang

LG
h-index: 12
4 papers
4 citations
Novelty: 61%
AI Score: 53

4 Papers

AI · Apr 14
Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization

Yizhe Chi, Deyao Hong, Dapeng Jiang et al.

Current LLM agent benchmarks, which predominantly focus on binary pass/fail tasks such as code generation or search-based question answering, tend to neglect the value of real-world engineering, which is often captured through the iterative optimization of feasible designs. To this end, we introduce Frontier-Eng, a human-verified benchmark for generative optimization -- an iterative propose-execute-evaluate loop in which an agent generates candidate artifacts, receives executable verifier feedback, and revises them under a fixed interaction budget -- spanning $47$ tasks across five broad engineering categories. Unlike previous suites, Frontier-Eng tasks are grounded in industrial-grade simulators and verifiers that provide continuous reward signals and enforce hard feasibility constraints under constrained budgets. We evaluate eight frontier language models using representative search frameworks, finding that while Claude 4.6 Opus achieves the most robust performance, the benchmark remains challenging for all models. Our analysis suggests a dual power-law decay in improvement frequency ($\sim$ 1/iteration) and magnitude ($\sim$ 1/improvement count). We further show that although width improves parallelism and diversity, depth remains crucial for hard-won improvements under a fixed budget. Frontier-Eng establishes a new standard for assessing the capacity of AI agents to integrate domain knowledge with executable feedback to solve complex, open-ended engineering problems.
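The propose-execute-evaluate loop described in this abstract can be illustrated with a minimal sketch. This is not the Frontier-Eng harness: the function names (`generative_optimization`, `toy_propose`, `toy_verify`), the one-dimensional toy task, and the Gaussian perturbation strategy are all assumptions made for illustration. The sketch shows only the general shape of the loop: a candidate is proposed, a verifier returns a feasibility flag and a continuous score, and the best feasible candidate is kept until a fixed budget runs out.

```python
import random
from typing import Callable, List, Optional, Tuple


def generative_optimization(
    propose: Callable[[Optional[float]], float],
    verify: Callable[[float], Tuple[bool, float]],
    budget: int,
) -> Tuple[Optional[float], float, List[int]]:
    """Run a propose-execute-evaluate loop for a fixed number of steps.

    Returns the best feasible candidate, its score, and the (1-based)
    iterations at which the best score improved.
    """
    best: Optional[float] = None
    best_score = float("-inf")
    improvements: List[int] = []
    for step in range(1, budget + 1):
        candidate = propose(best)            # propose (revise the current best)
        feasible, score = verify(candidate)  # execute and evaluate
        # Hard feasibility constraint: infeasible candidates are never kept.
        if feasible and score > best_score:
            best, best_score = candidate, score
            improvements.append(step)
    return best, best_score, improvements


# Hypothetical toy task: maximize -(x - 3)^2 subject to 0 <= x <= 5.
rng = random.Random(0)


def toy_propose(best: Optional[float]) -> float:
    # Start from a known feasible design, then perturb the current best.
    if best is None:
        return 0.0
    return rng.gauss(best, 1.0)


def toy_verify(x: float) -> Tuple[bool, float]:
    feasible = 0.0 <= x <= 5.0
    score = -(x - 3.0) ** 2  # continuous reward signal
    return feasible, score


best, best_score, improvements = generative_optimization(
    toy_propose, toy_verify, budget=50
)
print("best candidate:", best)
print("best score:", best_score)
print("iterations with an improvement:", improvements)
```

Recording the iterations at which the best score improves is what would allow a statistic such as improvement frequency per iteration to be computed; the toy task here is far too simple to say anything about the power-law behaviour reported in the abstract.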

SY · May 13
Port-Hamiltonian Systems with Dissipation Potential: Modelling and Trajectory Tracking Control

Jinjun Jia, Yuchen Liao, Kang An et al.

Port-Hamiltonian systems (PHS) and interconnection and damping assignment passivity-based control (IDA-PBC) have achieved broad success in modelling and stabilisation of physical systems. However, the absence of a dedicated scalar potential for the momentum channel forces any modification of the momentum-dependent dynamics to proceed indirectly through the interconnection and damping matrices, rendering the matching partial differential equation (PDE) difficult to solve and complicating extensions to trajectory tracking. This paper proposes a port-Hamiltonian system with dissipation potential (PHS-DP), in which the damping matrix is replaced by scalar convex dissipation potentials, providing independent scalar objects for the momentum and auxiliary state channels and restoring the variational symmetry between stored and dissipated energy. Building on this framework, Dual Potential Shaping Control (DPSC) achieves trajectory tracking by sequentially shaping the potential energy and dissipation potentials without modifying the interconnection structure. Contraction of the closed-loop cascade is established via a hierarchical contraction argument, and the matching condition is satisfied automatically for any admissible choice of shaped potentials, requiring no PDE to be solved. In contrast to existing PDE-free energy shaping approaches, which achieve this by abandoning the port-Hamiltonian closed-loop structure and sacrificing physical interpretability, the proposed framework preserves the interconnection structure and retains a transparent energy-based interpretation at every stage of the design. Validation on a magnetic levitation system demonstrates tracking performance comparable to timed IDA-PBC with substantially reduced design complexity.

LG · May 9
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

Bohan Lyu, Yucheng Yang, Siqiao Huang et al.

Modern AI progress has been driven by ML methods that are generalizable across settings and scalable to larger regimes. As large language models demonstrate advanced capabilities in reasoning, coding, and engineering tasks, it is increasingly important to understand whether they can discover such methods rather than only apply existing ones. We introduce MLS-Bench, a benchmark for evaluating whether AI systems can invent generalizable and scalable ML methods. MLS-Bench contains 140 tasks across 12 domains, each requiring an agent to improve one targeted component of an ML system or algorithm and demonstrate that the improvement generalizes across controlled settings and scales. We find that current agents remain far from reliably surpassing human-designed methods, and that engineering-style tuning is easier for them than genuine method invention. We further study the effects of test-time scaling, adaptive compute allocation, and context provision on agents' discovery performance, together with case studies of their behavior. Our analyses suggest that the bottleneck is not only in proposing new methods, but also in the scientific insight needed to plan, validate, and scale claims about them. More search, compute, or context alone does not remove this bottleneck. We build and maintain a community platform for cumulative and comparable iteration, and release the data and code at https://mls-bench.com.

LG · Jul 6, 2025
Zero-Shot Cyclic Peptide Design via Composable Geometric Constraints

Dapeng Jiang, Xiangzhe Kong, Jiaqi Han et al.

Cyclic peptides, characterized by geometric constraints absent in linear peptides, offer enhanced biochemical properties, presenting new opportunities to address unmet medical needs. However, designing target-specific cyclic peptides remains underexplored due to limited training data. To bridge the gap, we propose CP-Composer, a novel generative framework that enables zero-shot cyclic peptide generation via composable geometric constraints. Our approach decomposes complex cyclization patterns into unit constraints, which are incorporated into a diffusion model through geometric conditioning on nodes and edges. During training, the model learns from unit constraints and their random combinations in linear peptides, while at inference, novel constraint combinations required for cyclization are imposed as input. Experiments show that our model, despite being trained with linear peptides, is capable of generating diverse target-binding cyclic peptides, reaching success rates from 38% to 84% on different cyclization strategies.