Renwei Meng

LG
4papers
1citation
Novelty65%
AI Score55

4 Papers

35.9LGJun 3
Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

Renwei Meng

Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing process rewards are mostly correlational: they reward retrieval-, reflection-, or verification-like steps without estimating whether the step contributes to final verified success under a specified intervention. We propose CVT-RL, a constrained policy-gradient algorithm with dense verifiable rewards, intervention-validity gating, and a policy-conditioned counterfactual contribution (PCCC) estimator. Deletion, semantic substitution, evidence substitution, and tool-output perturbation define separate controlled interventions; continuations are sampled from a frozen reference policy, and a selection-adjusted doubly robust estimator augments the advantage. Belief control uses only prefix-observable labels, while an augmented Lagrangian constrains unsupported claims, skipped verification, tool tampering, and unsafe calls. On long-context QA, ALFWorld, ScienceWorld, and web/tool tasks, CVT-RL improves average task success from 71.8% for compute-matched non-causal RL and 75.4% for an information-matched counterfactual-process baseline to 78.9%, improves evidence F1 from 78.9 to 82.8 over the information-matched baseline, and reduces measured hacking from 7.2% to 3.9%. Independent human audit estimates 4.6% hacking for CVT-RL versus 8.1% for the information-matched baseline, and adaptive detector-evasion attacks raise hacking only to 7.1%. Stratified bootstrap and mixed-effects tests give p<0.01 after Holm correction for all primary metrics. Carefully scoped counterfactual credit, paired with validity gating, diagnostics, and verifiable constraints, provides a reproducible route toward more reliable long-horizon RL for language agents.

69.3SEMar 11Code
VulnAgent-X: A Layered Agentic Framework for Repository-Level Vulnerability Detection

Renwei Meng, Haoyi Wu, Jingming Wang et al.

Software vulnerability detection is critical in software en- gineering as security flaws arise from complex interactions across code structure, repository context, and runtime conditions. Existing meth- ods are limited by local code views, one-shot prediction, and insuffi- cient validation, reducing reliability in realistic repository-level settings. This study proposes VulnAgentX, a layered agentic framework integrat- ing lightweight risk screening, bounded context expansion, specialised analysis agents, selective dynamic verification, and evidence fusion into a unified pipeline. Experiments on function-level and just-in-time vul- nerability benchmarks show VulnAgent-X outperforms static baselines, encoder-based models, and simpler agentic variants, with better local- isation and balanced performance-cost trade-offs. Treating vulnerabil- ity detection as a staged, evidence-driven auditing process improves de- tection quality, reduces false positives, and produces interpretable re- sults for repository-level software security analysis. Code is available at https://github.com/xiaolu-666113/Vlun-Agent-X.

46.0AIMar 10Code
Explainable Innovation Engine: Dual-Tree Agent-RAG with Methods-as-Nodes and Verifiable Write-Back

Renwei Meng

Retrieval-augmented generation (RAG) improves factual grounding, yet most systems rely on flat chunk retrieval and provide limited control over multi-step synthesis. We propose an Explainable Innovation Engine that upgrades the knowledge unit from text chunks to methods-as-nodes. The engine maintains a weighted method provenance tree for traceable derivations and a hierarchical clustering abstraction tree for efficient top-down navigation. At inference time, a strategy agent selects explicit synthesis operators (e.g., induction, deduction, analogy), composes new method nodes, and records an auditable trajectory. A verifier-scorer layer then prunes low-quality candidates and writes validated nodes back to support continual growth. Expert evaluation across six domains and multiple backbones shows consistent gains over a vanilla baseline, with the largest improvements on derivation-heavy settings, and ablations confirm the complementary roles of provenance backtracking and pruning. These results suggest a practical path toward controllable, explainable, and verifiable innovation in agentic RAG systems. Code is available at the project GitHub repository https://github.com/xiaolu-666113/Dual-Tree-Agent-RAG.

10.8LGMar 11
Group Resonance Network: Learnable Prototypes and Multi-Subject Resonance for EEG Emotion Recognition

Renwei Meng

Electroencephalography(EEG)-basedemotionrecognitionre- mains challenging in cross-subject settings due to severe inter-subject variability. Existing methods mainly learn subject-invariant features, but often under-exploit stimulus-locked group regularities shared across sub- jects. To address this issue, we propose the Group Resonance Network (GRN), which integrates individual EEG dynamics with offline group resonance modeling. GRN contains three components: an individual en- coder for band-wise EEG features, a set of learnable group prototypes for prototype-induced resonance, and a multi-subject resonance branch that encodes PLV/coherence-based synchrony with a small reference set. A resonance-aware fusion module combines individual and group-level rep- resentations for final classification. Experiments on SEED and DEAP under both subject-dependent and leave-one-subject-out protocols show that GRN consistently outperforms competitive baselines, while abla- tion studies confirm the complementary benefits of prototype learning and multi-subject resonance modeling.