Ruoyun Ma

AI
h-index17
3papers
33citations
Novelty33%
AI Score38

3 Papers

LGNov 30, 2023
Large Language Models for Travel Behavior Prediction

Baichuan Mo, Hanyong Xu, Dingyi Zhuang et al.

Travel behavior prediction is a fundamental task in transportation demand management. The conventional methods for travel behavior prediction rely on numerical data to construct mathematical models and calibrate model parameters to represent human preferences. Recent advancement in large language models (LLMs) has shown great reasoning abilities to solve complex problems. In this study, we propose to use LLMs to predict travel behavior with prompt engineering without data-based parameter learning. Specifically, we carefully design our prompts that include 1) task description, 2) travel characteristics, 3) individual attributes, and 4) guides of thinking with domain knowledge, and ask the LLMs to predict an individual's travel behavior and explain the results. We select the travel mode choice task as a case study. Results show that, though no training samples are provided, LLM-based predictions have competitive accuracy and F1-score as canonical supervised learning methods such as multinomial logit, random forest, and neural networks. LLMs can also output reasons that support their prediction. However, though in most of the cases, the output explanations are reasonable, we still observe cases that violate logic or with hallucinations.

AIAug 5, 2025Code
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts

Shuang Liu, Zelong Li, Ruoyun Ma et al.

The potential of large language models (LLMs) in specialized domains such as legal risk analysis remains underexplored. In response to growing interest in locally deploying open-source LLMs for legal tasks while preserving data confidentiality, this paper introduces ContractEval, the first benchmark to thoroughly evaluate whether open-source LLMs could match proprietary LLMs in identifying clause-level legal risks in commercial contracts. Using the Contract Understanding Atticus Dataset (CUAD), we assess 4 proprietary and 15 open-source LLMs. Our results highlight five key findings: (1) Proprietary models outperform open-source models in both correctness and output effectiveness, though some open-source models are competitive in certain specific dimensions. (2) Larger open-source models generally perform better, though the improvement slows down as models get bigger. (3) Reasoning ("thinking") mode improves output effectiveness but reduces correctness, likely due to over-complicating simpler tasks. (4) Open-source models generate "no related clause" responses more frequently even when relevant clauses are present. This suggests "laziness" in thinking or low confidence in extracting relevant content. (5) Model quantization speeds up inference but at the cost of performance drop, showing the tradeoff between efficiency and accuracy. These findings suggest that while most LLMs perform at a level comparable to junior legal assistants, open-source models require targeted fine-tuning to ensure correctness and effectiveness in high-stakes legal settings. ContractEval offers a solid benchmark to guide future development of legal-domain LLMs.

CYJan 8
LLM Agents in Law: Taxonomy, Applications, and Challenges

Shuang Liu, Ruijia Zhang, Ruoyun Ma et al.

Large language models (LLMs) have precipitated a dramatic improvement in the legal domain, yet the deployment of standalone models faces significant limitations regarding hallucination, outdated information, and verifiability. Recently, LLM agents have attracted significant attention as a solution to these challenges, utilizing advanced capabilities such as planning, memory, and tool usage to meet the rigorous standards of legal practice. In this paper, we present a comprehensive survey of LLM agents for legal tasks, analyzing how these architectures bridge the gap between technical capabilities and domain-specific needs. Our major contributions include: (1) systematically analyzing the technical transition from standard legal LLMs to legal agents; (2) presenting a structured taxonomy of current agent applications across distinct legal practice areas; (3) discussing evaluation methodologies specifically for agentic performance in law; and (4) identifying open challenges and outlining future directions for developing robust and autonomous legal assistants.