Liuzhenghao Lv

CL
h-index8
5papers
26citations
Novelty42%
AI Score47

5 Papers

AIMar 1Code
BioProAgent: Neuro-Symbolic Grounding for Constrained Scientific Planning

Yuyang Liu, Jingya Wang, Liuzhenghao Lv et al.

Large language models (LLMs) have demonstrated significant reasoning capabilities in scientific discovery but struggle to bridge the gap to physical execution in wet-labs. In these irreversible environments, probabilistic hallucinations are not merely incorrect, but also cause equipment damage or experimental failure. To address this, we propose \textbf{BioProAgent}, a neuro-symbolic framework that anchors probabilistic planning in a deterministic Finite State Machine (FSM). We introduce a State-Augmented Planning mechanism that enforces a rigorous \textit{Design-Verify-Rectify} workflow, ensuring hardware compliance before execution. Furthermore, we address the context bottleneck inherent in complex device schemas by \textit{Semantic Symbol Grounding}, reducing token consumption by $\sim$6$\times$ through symbolic abstraction. In the extended BioProBench benchmark, BioProAgent achieves 95.6\% physical compliance (compared to 21.0\% for ReAct), demonstrating that neuro-symbolic constraints are essential for reliable autonomy in irreversible physical environments. \footnote{Code at https://github.com/YuyangSunshine/bioproagent and project at https://yuyangsunshine.github.io/BioPro-Project/}

CLDec 20, 2023Code
Machine Mindset: An MBTI Exploration of Large Language Models

Jiaxi Cui, Liuzhenghao Lv, Jing Wen et al.

We present a novel approach for integrating Myers-Briggs Type Indicator (MBTI) personality traits into large language models (LLMs), addressing the challenges of personality consistency in personalized AI. Our method, "Machine Mindset," involves a two-phase fine-tuning and Direct Preference Optimization (DPO) to embed MBTI traits into LLMs. This approach ensures that models internalize these traits, offering a stable and consistent personality profile. We demonstrate the effectiveness of our models across various domains, showing alignment between model performance and their respective MBTI traits. The paper highlights significant contributions in the development of personality datasets and a new training methodology for personality integration in LLMs, enhancing the potential for personalized AI applications. We also open-sourced our model and part of the data at \url{https://github.com/PKU-YuanGroup/Machine-Mindset}.

77.2BMMay 15
MoleCode unlocks structural intelligence in large language models

Zhiyuan Yan, Chen Liu, Boxuan Zhao et al.

Molecules are graphs, but large language models~(LLMs) are usually asked to reason about them through linear strings. The most popular molecular representation, SMILES, compresses atoms, bonds, branches and rings into a compact sequence in which topology is implicit, forcing LLMs to reconstruct molecular structure before performing the requested chemical operation. Here we introduce MoleCode, an LLM-native, training-free, graph-explicit molecular language in which all molecular components are represented as typed entities with persistent identifiers and explicit relations. MoleCode makes molecular topology directly readable, editable and auditable within the language context, allowing an LLM to operate on structure rather than recover it from syntax. Across molecular reasoning, editing, generation and analysis tasks, this representational shift improves frontier LLMs most strongly when structural access is limiting: unfamiliar molecules, topology-sensitive operations, larger structures and repetitive polymers. It also changes how inference is allocated, replacing long reasoning traces devoted to implicit structural reconstruction with shorter, more chemically directed reasoning over explicit atoms and bonds. In molecular optimization, this enables localized, property-aligned edits that preserve structural similarity to the starting compounds. The same Subgraph--Node--Edge grammar extends beyond small molecules to polymers, Markush structures, mechanism-style transformations and interleaved scientific documents, including research articles and patent disclosures in which chemical information is distributed across text and images. These results suggest that the interface between scientific objects and LLMs should not treat structure as something to be decoded from text. When the object of reasoning is relational, the structure itself should be part of the language.

CLMay 11, 2025Code
BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning

Yuyang Liu, Liuzhenghao Lv, Xiancheng Zhang et al.

Biological protocols are fundamental to reproducibility and safety in life science research. While large language models (LLMs) perform well on general tasks, their systematic evaluation on these highly specialized, accuracy-critical, and inherently procedural texts remains limited. In this work, we present BioProBench, the first large-scale, multi-task benchmark for biological protocol understanding and reasoning. While there are several benchmark tasks involving protocol question answering, BioProBench provides a comprehensive suite of five core tasks: Protocol Question Answering, Step Ordering, Error Correction, Protocol Generation, and Protocol Reasoning, enabling a holistic evaluation of LLMs on procedural biological texts. Built upon 27K original protocols, it yields nearly 556K high-quality structured instances. We evaluate 12 mainstream open/closed-source LLMs. Experimental results reveal that some models perform well on basic understanding tasks (e.g., \sim70% PQA-Acc., >64% ERR F1), but struggle significantly with deep reasoning and structured generation tasks like ordering and generation. Furthermore, model comparisons show diverse performance: certain open-source models approach closed-source levels on some tasks, yet bio-specific small models lag behind general LLMs, indicating limitations on complex procedural content. Overall, BioProBench, through its task design and experimental findings, systematically reveals the fundamental challenges for current LLMs in procedural knowledge understanding, deep adaptability to specific domains, reliability of structured reasoning, and handling of sophisticated precision and safety constraints, providing key directions for future AI in the field of scientific experiment automation. The code and data are available at: https://github.com/YuyangSunshine/bioprotocolbench and https://huggingface.co/datasets/BioProBench/BioProBench.

CLApr 10, 2025
How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension

Hao Li, Liuzhenghao Lv, He Cao et al.

Large language models are increasingly used in scientific domains, especially for molecular understanding and analysis. However, existing models are affected by hallucination issues, resulting in errors in drug design and utilization. In this paper, we first analyze the sources of hallucination in LLMs for molecular comprehension tasks, specifically the knowledge shortcut phenomenon observed in the PubChem dataset. To evaluate hallucination in molecular comprehension tasks with computational efficiency, we introduce \textbf{Mol-Hallu}, a novel free-form evaluation metric that quantifies the degree of hallucination based on the scientific entailment relationship between generated text and actual molecular properties. Utilizing the Mol-Hallu metric, we reassess and analyze the extent of hallucination in various LLMs performing molecular comprehension tasks. Furthermore, the Hallucination Reduction Post-processing stage~(HRPP) is proposed to alleviate molecular hallucinations, Experiments show the effectiveness of HRPP on decoder-only and encoder-decoder molecular LLMs. Our findings provide critical insights into mitigating hallucination and improving the reliability of LLMs in scientific applications.