LG · Dec 9, 2024

Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach

Tsinghua
arXiv:2412.06684v2 · 7 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses low testing efficiency and limited scenario diversity when testing decision-making policies, a problem that matters for reliability in applications such as autonomous driving and robotics. The contribution appears incremental because it builds on existing LLM capabilities.

The paper tackles the problem of testing decision-making policies in fields like autonomous driving and robotics by proposing an LLM-driven online testing framework to explore critical and diverse scenarios, and it demonstrates that this method significantly outperforms baseline methods on five benchmarks.

Recent advances in decision-making policies have led to significant progress in fields such as autonomous driving and robotics. However, testing these policies remains crucial because critical scenarios may threaten their reliability. Despite ongoing research, challenges such as low testing efficiency and limited diversity persist due to the complexity of the decision-making policies and their environments. To address these challenges, this paper proposes an adaptable Large Language Model (LLM)-driven online testing framework to explore critical and diverse testing scenarios for decision-making policies. Specifically, we design a "generate-test-feedback" pipeline with templated prompt engineering to harness the world knowledge and reasoning abilities of LLMs. Additionally, we propose a multi-scale scenario generation strategy to address the limitations of LLMs in making fine-grained adjustments, further enhancing testing efficiency. Finally, we evaluate the proposed LLM-driven method on five widely recognized benchmarks, and the experimental results demonstrate that it significantly outperforms baseline methods in uncovering both critical and diverse scenarios. These findings suggest that LLM-driven methods hold significant promise for advancing the testing of decision-making policies.
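
The abstract does not spell out the pipeline, so the following is only a minimal sketch of how a "generate-test-feedback" loop with a two-scale search might look. Everything in it is an assumption made for illustration: the names `build_prompt`, `explore`, `stub_llm`, and `toy_criticality` are hypothetical, the LLM is replaced by a random stub so the example runs offline, and the reading of "multi-scale" as coarse LLM proposals followed by small numeric perturbations is one plausible interpretation, not the paper's confirmed method.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Scenario = Tuple[float, ...]  # scenario parameters, each normalised to [0, 1]


@dataclass
class Feedback:
    scenario: Scenario
    criticality: float  # higher means closer to a policy failure


def build_prompt(history: List[Feedback], n_params: int, top_k: int = 3) -> str:
    """Templated prompt: the task line plus the most critical scenarios so far."""
    lines = [f"Propose a scenario with {n_params} parameters in [0, 1]."]
    for fb in sorted(history, key=lambda f: -f.criticality)[:top_k]:
        lines.append(f"scenario={fb.scenario} criticality={fb.criticality:.3f}")
    return "\n".join(lines)


def explore(
    propose: Callable[[str], Scenario],    # stands in for an LLM call (assumption)
    evaluate: Callable[[Scenario], float],  # runs the policy and scores criticality
    n_params: int,
    coarse_rounds: int = 20,
    fine_rounds: int = 20,
    step: float = 0.05,
    seed: int = 0,
) -> Tuple[Feedback, List[Feedback]]:
    rng = random.Random(seed)
    history: List[Feedback] = []

    # Coarse scale: generate with the proposer, test, feed results back via the prompt.
    for _ in range(coarse_rounds):
        scenario = propose(build_prompt(history, n_params))
        history.append(Feedback(scenario, evaluate(scenario)))

    # Fine scale: small numeric perturbations around the best scenario found so far.
    best = max(history, key=lambda f: f.criticality)
    for _ in range(fine_rounds):
        candidate = tuple(
            min(1.0, max(0.0, x + rng.uniform(-step, step))) for x in best.scenario
        )
        fb = Feedback(candidate, evaluate(candidate))
        history.append(fb)
        if fb.criticality > best.criticality:
            best = fb
    return best, history


# Toy stand-ins so the sketch runs without an LLM or a simulator.
_stub_rng = random.Random(1)


def stub_llm(prompt: str) -> Scenario:
    """Random proposer; a real system would send `prompt` to an LLM and parse the reply."""
    return (_stub_rng.random(), _stub_rng.random())


def toy_criticality(scenario: Scenario) -> float:
    """Pretend the policy fails near (0.8, 0.2); criticality is negative distance to it."""
    x, y = scenario
    return -(((x - 0.8) ** 2 + (y - 0.2) ** 2) ** 0.5)


best, history = explore(stub_llm, toy_criticality, n_params=2)
```

In a real setup, `propose` would wrap an LLM client and parse its reply into parameters, and `evaluate` would run the policy under test in its simulator. Keeping both as plain callables is a design choice of this sketch that makes the loop easy to test in isolation.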

