LOMay 24
DateSAT: A Framework for Solving Date and Period ConstraintsLeyi Cui, Shrey Tiwari, Rohan Padhye
Dates and calendar periods (i.e., days, months, years) appear frequently in tasks involving analysis of software, data, and documents. Prior research has shown that computer logic involving dates and calendrical calculations is error-prone due to tricky rules (e.g., irregularly sized months), ambiguities (e.g., scheduling one month from "Jan 31st"), and edge cases (e.g., leap years). However, existing program analysis and verification tools do not provide native support for dates, making it hard to reason about operations involving calendrical arithmetic symbolically. This paper presents DateSAT, the first framework for expressing and solving satisfiability constraints involving dates and calendar periods. The paper first formalizes an input language and the semantics of date and period arithmetic. The paper then presents five separate strategies for solving DateSAT constraints based on reductions to SMT formulas involving integers, which we have implemented using Z3 as a backend. We curate a dataset of 450 DateSAT constraints synthesized using LLM prompting, grammar-based sampling, and mining legal documents, and then present an empirical evaluation of DateSAT solver performance.
SESep 18, 2024
Combining LLM Code Generation with Formal Specifications and Reactive Program SynthesisWilliam Murphy, Nikolaus Holzer, Feitong Qiao et al.
In the past few years, Large Language Models (LLMs) have exploded in usefulness and popularity for code generation tasks. However, LLMs still struggle with accuracy and are unsuitable for high-risk applications without additional oversight and verification. In particular, they perform poorly at generating code for highly complex systems, especially with unusual or out-of-sample logic. For such systems, verifying the code generated by the LLM may take longer than writing it by hand. We introduce a solution that divides the code generation into two parts; one to be handled by an LLM and one to be handled by formal methods-based program synthesis. We develop a benchmark to test our solution and show that our method allows the pipeline to solve problems previously intractable for LLM code generation.
SEJan 14, 2025Code
CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code GenerationJinjun Peng, Leyi Cui, Kele Huang et al.
Large Language Models (LLMs) have significantly aided developers by generating or assisting in code writing, enhancing productivity across various tasks. While identifying incorrect code is often straightforward, detecting vulnerabilities in functionally correct code is more challenging, especially for developers with limited security knowledge, which poses considerable security risks of using LLM-generated code and underscores the need for robust evaluation benchmarks that assess both functional correctness and security. Current benchmarks like CyberSecEval and SecurityEval attempt to solve it but are hindered by unclear and impractical specifications, failing to assess both functionality and security accurately. To tackle these deficiencies, we introduce CWEval, a novel outcome-driven evaluation framework designed to enhance the evaluation of secure code generation by LLMs. This framework not only assesses code functionality but also its security simultaneously with high-quality task specifications and outcome-driven test oracles which provides high accuracy. Coupled with CWEval-bench, a multilingual, security-critical coding benchmark, CWEval provides a rigorous empirical security evaluation on LLM-generated code, overcoming previous benchmarks' shortcomings. Through our evaluations, CWEval reveals a notable portion of functional but insecure code produced by LLMs, and shows a serious inaccuracy of previous evaluations, ultimately contributing significantly to the field of secure code generation. We open-source our artifact at: https://github.com/Co1lin/CWEval .
CPDec 7, 2022
Bi-LSTM Price Prediction based on Attention MechanismJiashu Lou, Leyi Cui, Ye Li
With the increasing enrichment and development of the financial derivatives market, the frequency of transactions is also faster and faster. Due to human limitations, algorithms and automatic trading have recently become the focus of discussion. In this paper, we propose a bidirectional LSTM neural network based on an attention mechanism, which is based on two popular assets, gold and bitcoin. In terms of Feature Engineering, on the one hand, we add traditional technical factors, and at the same time, we combine time series models to develop factors. In the selection of model parameters, we finally chose a two-layer deep learning network. According to AUC measurement, the accuracy of bitcoin and gold is 71.94% and 73.03% respectively. Using the forecast results, we achieved a return of 1089.34% in two years. At the same time, we also compare the attention Bi-LSTM model proposed in this paper with the traditional model, and the results show that our model has the best performance in this data set. Finally, we discuss the significance of the model and the experimental results, as well as the possible improvement direction in the future.
MNNov 11, 2022
Reconstruction of gene regulatory network via sparse optimizationJiashu Lou, Leyi Cui, Wenxuan Qiu
In this paper, we tested several sparse optimization algorithms based on the public dataset of the DREAM5 Gene Regulatory Network Inference Challenge. And we find that introducing 20% of the regulatory network as a priori known data can provide a basis for parameter selection of inference algorithms, thus improving prediction efficiency and accuracy. In addition to testing common sparse optimization methods, we also developed voting algorithms by bagging them. Experiments on the DREAM5 dataset show that the sparse optimization-based inference of the moderation relation works well, achieving better results than the official DREAM5 results on three datasets. However, the performance of traditional independent algorithms varies greatly in the face of different datasets, while our voting algorithm achieves the best results on three of the four datasets.
LGJun 11, 2024
Guiding LLM Temporal Logic Generation with Explicit Separation of Data and ControlWilliam Murphy, Nikolaus Holzer, Nathan Koenig et al.
Temporal logics are powerful tools that are widely used for the synthesis and verification of reactive systems. The recent progress on Large Language Models (LLMs) has the potential to make the process of writing such specifications more accessible. However, writing specifications in temporal logics remains challenging for all but the most expert users. A key question in using LLMs for temporal logic specification engineering is to understand what kind of guidance is most helpful to the LLM and the users to easily produce specifications. Looking specifically at the problem of reactive program synthesis, we explore the impact of providing an LLM with guidance on the separation of control and data--making explicit for the LLM what functionality is relevant for the specification, and treating the remaining functionality as an implementation detail for a series of pre-defined functions and predicates. We present a benchmark set and find that this separation of concerns improves specification generation. Our benchmark provides a test set against which to verify future work in LLM generation of temporal logic specifications.