Stefan Menzel

h-index72

16papers

170citations

Novelty52%

AI Score52

Ranked #33,595 of 201,018 authors (top 17%)#1,556 in AI (top 11%)

16 Papers

AIJun 3

R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search

João Pedro Gandarela, Thiago Rios, Stefan Menzel et al.

Large language models (LLMs) are fluent on open-ended tasks, yet in agentic settings, where a system must plan, use tools, and act over extended horizons, fluency does not ensure reliable delivery. We trace this gap to three coupled structural failures: errors propagate without localization, worst-case perturbations go unevaluated, and accumulated knowledge is never invalidated. We argue these share a root cause: abductive, counterfactual, meta-inductive, corrective, and inductive reasoning pull a shared context in incompatible directions. We introduce Reflective Adversarial Pareto Search (R-APS), to our knowledge the first method addressing all three failures jointly via reasoning-mode decomposition, allocating each reasoning mode its own context and orchestrating interaction across three timescales: staged compositional reasoning with a typed validation critic (failure localization), sensitivity-guided counterfactual stress-testing as a first-class Pareto objective (robustness), and meta-inductive rule extraction with explicit invalidation (persistent memory). R-APS requires no fine-tuning and operates on a frozen LLM purely via structured protocol design. We evaluate on planar mechanism synthesis (robotics, prosthetics, mechanical design), with every candidate checked by a kinematic solver. On 32 target trajectories, R-APS delivers robustness certificates 3.5x tighter than uniform-perturbation baselines, 46% faster iterations-to-first-admission, and 2.1x Chamfer-distance reduction over Enum+GA while jointly controlling bar-count and worst-case robustness. Small 4B reasoning-specialized models prove competitive with general-purpose 70B backbones inside the protocol, suggesting structured protocols can partially offset model scale.

CLJul 3, 2023

Large Language and Text-to-3D Models for Engineering Design Optimization

Thiago Rios, Stefan Menzel, Bernhard Sendhoff

The current advances in generative AI for learning large neural network models with the capability to produce essays, images, music and even 3D assets from text prompts create opportunities for a manifold of disciplines. In the present paper, we study the potential of deep text-to-3D models in the engineering domain, with focus on the chances and challenges when integrating and interacting with 3D assets in computational simulation-based design optimization. In contrast to traditional design optimization of 3D geometries that often searches for the optimum designs using numerical representations, such as B-Spline surface or deformation parameters in vehicle aerodynamic optimization, natural language challenges the optimization framework by requiring a different interpretation of variation operators while at the same time may ease and motivate the human user interaction. Here, we propose and realize a fully automated evolutionary design optimization framework using Shap-E, a recently published text-to-3D asset network by OpenAI, in the context of aerodynamic vehicle optimization. For representing text prompts in the evolutionary optimization, we evaluate (a) a bag-of-words approach based on prompt templates and Wordnet samples, and (b) a tokenisation approach based on prompt templates and the byte pair encoding method from GPT4. Our main findings from the optimizations indicate that, first, it is important to ensure that the designs generated from prompts are within the object class of application, i.e. diverse and novel designs need to be realistic, and, second, that more research is required to develop methods where the strength of text prompt variations and the resulting variations of the 3D designs share causal relations to some degree to improve the optimization.

AIApr 30Code

Language Models Refine Mechanical Linkage Designs Through Symbolic Reflection and Modular Optimisation

João Pedro Gandarela, Thiago Rios, Stefan Menzel et al.

Designing mechanical linkages involves combinatorial topology selection and continuous parameter fitting. We show that language models can systematically improve linkage designs through symbolic representations. Language model agents explore discrete topologies while numerical optimisers fit continuous parameters. A symbolic lifting operator translates simulator trajectories into qualitative descriptors, motion labels, temporal predicates, and structural diagnostics that models interpret across iterative design cycles. Across six engineering-relevant motion targets and three open-source models (Llama 3.3 70B, Qwen3 4B, Qwen3 MoE 30B-A3B), the modular architecture reduces geometric error by up to 68% and improves structural validity by up to 134% over monolithic baselines. Critically, 78.6% of iterative refinement trajectories show measurable improvement, with the system correctly diagnosing overconstraint (56.3%) and underconstraint (35.6%) failure modes and proposing grounded corrections. Models across all three families acquire interpretable mechanical reasoning strategies without fine-tuning, demonstrating that principled symbolic abstraction bridges generative AI and the numerical precision required for engineering design.

AIMar 6, 2025

From Idea to CAD: A Language Model-Driven Multi-Agent System for Collaborative Design

Felix Ocker, Stefan Menzel, Ahmed Sadik et al.

Creating digital models using Computer Aided Design (CAD) is a process that requires in-depth expertise. In industrial product development, this process typically involves entire teams of engineers, spanning requirements engineering, CAD itself, and quality assurance. We present an approach that mirrors this team structure with a Vision Language Model (VLM)-based Multi Agent System, with access to parametric CAD tooling and tool documentation. Combining agents for requirements engineering, CAD engineering, and vision-based quality assurance, a model is generated automatically from sketches and/ or textual descriptions. The resulting model can be refined collaboratively in an iterative validation loop with the user. Our approach has the potential to increase the effectiveness of design processes, both for industry experts and for hobbyists who create models for 3D printing. We demonstrate the potential of the architecture at the example of various design tasks and provide several ablations that show the benefits of the architecture's individual components.

CVJun 11, 2025

LLM-to-Phy3D: Physically Conform Online 3D Object Generation with LLMs

Melvin Wong, Yueming Lyu, Thiago Rios et al.

The emergence of generative artificial intelligence (GenAI) and large language models (LLMs) has revolutionized the landscape of digital content creation in different modalities. However, its potential use in Physical AI for engineering design, where the production of physically viable artifacts is paramount, remains vastly underexplored. The absence of physical knowledge in existing LLM-to-3D models often results in outputs detached from real-world physical constraints. To address this gap, we introduce LLM-to-Phy3D, a physically conform online 3D object generation that enables existing LLM-to-3D models to produce physically conforming 3D objects on the fly. LLM-to-Phy3D introduces a novel online black-box refinement loop that empowers large language models (LLMs) through synergistic visual and physics-based evaluations. By delivering directional feedback in an iterative refinement process, LLM-to-Phy3D actively drives the discovery of prompts that yield 3D artifacts with enhanced physical performance and greater geometric novelty relative to reference objects, marking a substantial contribution to AI-driven generative design. Systematic evaluations of LLM-to-Phy3D, supported by ablation studies in vehicle design optimization, reveal various LLM improvements gained by 4.5% to 106.7% in producing physically conform target domain 3D designs over conventional LLM-to-3D models. The encouraging results suggest the potential general use of LLM-to-Phy3D in Physical AI for scientific and engineering applications.

NEOct 4, 2025

Evolutionary Computation as Natural Generative AI

Yaxin Shi, Abhishek Gupta, Ying Wu et al.

Generative AI (GenAI) has achieved remarkable success across a range of domains, but its capabilities remain constrained to statistical models of finite training sets and learning based on local gradient signals. This often results in artifacts that are more derivative than genuinely generative. In contrast, Evolutionary Computation (EC) offers a search-driven pathway to greater diversity and creativity, expanding generative capabilities by exploring uncharted solution spaces beyond the limits of available data. This work establishes a fundamental connection between EC and GenAI, redefining EC as Natural Generative AI (NatGenAI) -- a generative paradigm governed by exploratory search under natural selection. We demonstrate that classical EC with parent-centric operators mirrors conventional GenAI, while disruptive operators enable structured evolutionary leaps, often within just a few generations, to generate out-of-distribution artifacts. Moreover, the methods of evolutionary multitasking provide an unparalleled means of integrating disruptive EC (with cross-domain recombination of evolved features) and moderated selection mechanisms (allowing novel solutions to survive), thereby fostering sustained innovation. By reframing EC as NatGenAI, we emphasize structured disruption and selection pressure moderation as essential drivers of creativity. This perspective extends the generative paradigm beyond conventional boundaries and positions EC as crucial to advancing exploratory design, innovation, scientific discovery, and open-ended generation in the GenAI era.

AIMay 23, 2025

Controlled Agentic Planning & Reasoning for Mechanism Synthesis

João Pedro Gandarela, Thiago Rios, Stefan Menzel et al.

This work presents a dual-agent \ac{llm}-based reasoning framework for automated planar mechanism synthesis that tightly couples linguistic specification with symbolic representation and simulation. From a natural-language task description, the system composes symbolic constraints and equations, generates and parametrises simulation code, and iteratively refines designs via critic-driven feedback, including symbolic regression and geometric distance metrics, closing an actionable linguistic/symbolic optimisation loop. To evaluate the approach, we introduce MSynth, a benchmark of analytically defined planar trajectories. Empirically, critic feedback and iterative refinement yield large improvements (up to 90\% on individual tasks) and statistically significant gains per the Wilcoxon signed-rank test. Symbolic-regression prompts provide deeper mechanistic insight primarily when paired with larger models or architectures with appropriate inductive biases (e.g., LRM).

AIJun 21, 2024

LLM2TEA: An Agentic AI Designer for Discovery with Generative Evolutionary Multitasking

Melvin Wong, Jiao Liu, Thiago Rios et al.

This paper presents LLM2TEA, a Large Language Model (LLM) driven MultiTask Evolutionary Algorithm, representing the first agentic AI designer of its kind operating with generative evolutionary multitasking (GEM). LLM2TEA enables the crossbreeding of solutions from multiple domains, fostering novel solutions that transcend disciplinary boundaries. Of particular interest is the ability to discover designs that are both novel and conforming to real-world physical specifications. LLM2TEA comprises an LLM to generate genotype samples from text prompts describing target objects, a text-to-3D generative model to produce corresponding phenotypes, a classifier to interpret its semantic representations, and a computational simulator to assess its physical properties. Novel LLM-based multitask evolutionary operators are introduced to guide the search towards high-performing, practically viable designs. Experimental results in conceptual design optimization validate the effectiveness of LLM2TEA, showing 97% to 174% improvements in the diversity of novel designs over the current text-to-3D baseline. Moreover, over 73% of the generated designs outperform the top 1% of designs produced by the text-to-3D baseline in terms of physical performance. The designs produced by LLM2TEA are not only aesthetically creative but also functional in real-world contexts. Several of these designs have been successfully 3D printed, demonstrating the ability of our approach to transform AI-generated outputs into tangible, physical designs. These designs underscore the potential of LLM2TEA as a powerful tool for complex design optimization and discovery, capable of producing novel and physically viable designs.

AIJun 13, 2024

Generative AI-based Prompt Evolution Engineering Design Optimization With Vision-Language Model

Melvin Wong, Thiago Rios, Stefan Menzel et al.

Engineering design optimization requires an efficient combination of a 3D shape representation, an optimization algorithm, and a design performance evaluation method, which is often computationally expensive. We present a prompt evolution design optimization (PEDO) framework contextualized in a vehicle design scenario that leverages a vision-language model for penalizing impractical car designs synthesized by a generative model. The backbone of our framework is an evolutionary strategy coupled with an optimization objective function that comprises a physics-based solver and a vision-language model for practical or functional guidance in the generated car designs. In the prompt evolutionary search, the optimizer iteratively generates a population of text prompts, which embed user specifications on the aerodynamic performance and visual preferences of the 3D car designs. Then, in addition to the computational fluid dynamics simulations, the pre-trained vision-language model is used to penalize impractical designs and, thus, foster the evolutionary algorithm to seek more viable designs. Our investigations on a car design optimization problem show a wide spread of potential car designs generated at the early phase of the search, which indicates a good diversity of designs in the initial populations, and an increase of over 20\% in the probability of generating practical designs compared to a baseline framework without using a vision-language model. Visual inspection of the designs against the performance results demonstrates prompt evolution as a very promising paradigm for finding novel designs with good optimization performance while providing ease of use in specifying design specifications and preferences via a natural language interface.

NEApr 14, 2021

A Novel Generalised Meta-Heuristic Framework for Dynamic Capacitated Arc Routing Problems

Hao Tong, Leandro L. Minku, Stefan Menzel et al.

The capacitated arc routing problem (CARP) is a challenging combinatorial optimisation problem abstracted from many real-world applications, such as waste collection, road gritting and mail delivery. However, few studies considered dynamic changes during the vehicles' service, which can cause the original schedule infeasible or obsolete. The few existing studies are limited by the dynamic scenarios considered, and by overly complicated algorithms that are unable to benefit from the wealth of contributions provided by the existing CARP literature. In this paper, we first provide a mathematical formulation of dynamic CARP (DCARP) and design a simulation system that is able to consider dynamic events while a routing solution is already partially executed. We then propose a novel framework which can benefit from existing static CARP optimisation algorithms so that they could be used to handle DCARP instances. The framework is very flexible. In response to a dynamic event, it can use either a simple restart strategy or a sequence transfer strategy that benefits from past optimisation experience. Empirical studies have been conducted on a wide range of DCARP instances to evaluate our proposed framework. The results show that the proposed framework significantly improves over state-of-the-art dynamic optimisation algorithms.

AIApr 28, 2020

Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi

Rodrigo Canaan, Xianbo Gao, Julian Togelius et al.

Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to its partner's strategies with no previous coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far, the behavioral diversity of agents has not been considered in a systematic way. This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate diverse populations for this purpose, and generates a population of diverse Hanabi agents using MAP-Elites. We also postulate that agents can benefit from a diverse population during training and implement a simple "meta-strategy" for adapting to an agent's perceived behavioral niche. We show this meta-strategy can work better than generalist strategies even outside the population it was trained with if its partner's behavioral niche can be correctly inferred, but in practice a partner's behavior depends and interferes with the meta-agent's own behavior, suggesting an avenue for future research in characterizing another agent's behavior during gameplay.

AIApr 28, 2020

Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners

Rodrigo Canaan, Xianbo Gao, Youjin Chung et al.

Hanabi is a cooperative game that challenges exist-ing AI techniques due to its focus on modeling the mental states ofother players to interpret and predict their behavior. While thereare agents that can achieve near-perfect scores in the game byagreeing on some shared strategy, comparatively little progresshas been made in ad-hoc cooperation settings, where partnersand strategies are not known in advance. In this paper, we showthat agents trained through self-play using the popular RainbowDQN architecture fail to cooperate well with simple rule-basedagents that were not seen during training and, conversely, whenthese agents are trained to play with any individual rule-basedagent, or even a mix of these agents, they fail to achieve goodself-play scores.

NEJul 31, 2019

Competitive Coevolution as an Adversarial Approach to Dynamic Optimization

Xiaofen Lu, Ke Tang, Stefan Menzel et al.

Dynamic optimization, for which the objective functions change over time, has attracted intensive investigations due to the inherent uncertainty associated with many real-world problems. For its robustness with respect to noise, Evolutionary Algorithms (EAs) have been expected to have great potential for dynamic optimization. On the other hand, EAs are also criticized for its high computational complexity, which appears to be contradictory to the core requirement of real-world dynamic optimization, i.e., fast adaptation (typically in terms of wall-clock time) to the environmental change. So far, whether EAs would indeed lead to a truly effective approach for real-world dynamic optimization remain unclear. In this paper, a new framework of employing EAs in the context of dynamic optimization is explored. We suggest that, instead of online evolving (searching) solutions for the ever-changing objective function, EAs are more suitable for acquiring an archive of solutions in an offline way, which could be adopted to construct a system to provide high-quality solutions efficiently in a dynamic environment. To be specific, we first re-formulate dynamic optimization problems as static set-oriented optimization problems. Then, a particular type of EAs, namely competitive coevolution, is employed to search for the archive of solutions in an adversarial way. The general framework is instantiated for continuous dynamic constrained optimization problems, and the empirical results showed the potential of the proposed framework.

AIJul 8, 2019

Diverse Agents for Ad-Hoc Cooperation in Hanabi

Rodrigo Canaan, Julian Togelius, Andy Nealen et al.

In complex scenarios where a model of other actors is necessary to predict and interpret their actions, it is often desirable that the model works well with a wide variety of previously unknown actors. Hanabi is a card game that brings the problem of modeling other players to the forefront, but there is no agreement on how to best generate a pool of agents to use as partners in ad-hoc cooperation evaluation. This paper proposes Quality Diversity algorithms as a promising class of algorithms to generate populations for this purpose and shows an initial implementation of an agent generator based on this idea. We also discuss what metrics can be used to compare such generators, and how the proposed generator could be leveraged to help build adaptive agents for the game.

AISep 26, 2018

Evolving Agents for the Hanabi 2018 CIG Competition

Rodrigo Canaan, Haotian Shen, Ruben Rodriguez Torrado et al.

Hanabi is a cooperative card game with hidden information that has won important awards in the industry and received some recent academic attention. A two-track competition of agents for the game will take place in the 2018 CIG conference. In this paper, we develop a genetic algorithm that builds rule-based agents by determining the best sequence of rules from a fixed rule set to use as strategy. In three separate experiments, we remove human assumptions regarding the ordering of rules, add new, more expressive rules to the rule set and independently evolve agents specialized at specific game sizes. As result, we achieve scores superior to previously published research for the mirror and mixed evaluation of agents.

AISep 26, 2018

Towards Game-based Metrics for Computational Co-creativity

Rodrigo Canaan, Stefan Menzel, Julian Togelius et al.

We propose the following question: what game-like interactive system would provide a good environment for measuring the impact and success of a co-creative, cooperative agent? Creativity is often formulated in terms of novelty, value, surprise and interestingness. We review how these concepts are measured in current computational intelligence research and provide a mapping from modern electronic and tabletop games to open research problems in mixed-initiative systems and computational co-creativity. We propose application scenarios for future research, and a number of metrics under which the performance of cooperative agents in these environments will be evaluated.