Vinicius Lima

LG · h-index 22 · 4 papers · 10 citations · Novelty 21% · AI Score 27

4 Papers

AI · Aug 5, 2025 · Code
Toward a Trustworthy Optimization Modeling Agent via Verifiable Synthetic Data Generation

Vinicius Lima, Dzung T. Phan, Jayant Kalagnanam et al.

We present a framework for training trustworthy large language model (LLM) agents for optimization modeling via a verifiable synthetic data generation pipeline. Focusing on linear and mixed-integer linear programming, our approach begins with structured symbolic representations and systematically produces natural language descriptions, mathematical formulations, and solver-executable code. Because each instance is constructed programmatically with a known optimal solution, the pipeline is fully verifiable and can automatically filter out low-quality demonstrations generated by teacher models. Each dataset instance includes a structured representation of the optimization problem, a corresponding natural language description, the verified optimal solution, and step-by-step demonstrations, generated by a teacher model, that show how to model and solve the problem in multiple optimization modeling languages. This enables supervised fine-tuning of open-source LLMs specifically tailored to optimization tasks. To operationalize this pipeline, we introduce OptiTrust, a modular LLM agent that performs multi-stage translation from natural language to solver-ready code, leveraging stepwise demonstrations, multi-language inference, and majority-vote cross-validation. Our agent achieves state-of-the-art performance on standard benchmarks: it attains the highest accuracy on six of seven datasets and outperforms the next-best algorithm by at least 8 percentage points on three of them. Our approach provides a scalable, verifiable, and principled path toward building reliable LLM agents for real-world optimization applications.
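The abstract names two checks: filtering teacher demonstrations against a known optimal solution, and majority-vote cross-validation across runs in several modeling languages. The sketch below illustrates both under stated assumptions. It assumes each run reports a single objective value (or `None` if the generated code failed), and the helper names `passes_verification` and `majority_vote`, the tolerances, and the rounding rule are all hypothetical. It is not taken from the OptiTrust code base.

```python
import math
from collections import Counter
from typing import List, Optional

# Hypothetical helpers illustrating the two checks described in the abstract;
# they are NOT the OptiTrust implementation.


def passes_verification(candidate: Optional[float], known_optimum: float,
                        rel_tol: float = 1e-6) -> bool:
    """Keep a teacher demonstration only if its code reproduces the known optimum."""
    if candidate is None:  # generated code crashed or the solver returned no solution
        return False
    return math.isclose(candidate, known_optimum, rel_tol=rel_tol, abs_tol=1e-9)


def majority_vote(objectives: List[Optional[float]], ndigits: int = 6) -> Optional[float]:
    """Return the objective value reported by the most runs (one run per modeling language).

    Values are rounded to `ndigits` decimals so that tiny solver differences count
    as agreement; on a tie, the value seen first wins.
    """
    valid = [round(v, ndigits) for v in objectives if v is not None]
    if not valid:
        return None  # every run failed, so there is nothing to vote on
    value, _count = Counter(valid).most_common(1)[0]
    return value


# Example: three modeling-language runs, one of which failed.
runs = [42.0, 42.0000001, None]
print(majority_vote(runs))              # 42.0
print(passes_verification(42.0, 42.0))  # True
```

Rounding before counting is a simple way to treat near-identical solver outputs as agreement. A value lying close to a rounding boundary could still split the vote, so a tolerance-based clustering step may be preferable in practice.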

LG · Oct 30, 2024
Advancing Crime Linkage Analysis with Machine Learning: A Comprehensive Review and Framework for Data-Driven Approaches

Vinicius Lima, Umit Karabiyik

Crime linkage is the process of analyzing criminal behavior data to determine whether a pair or group of crime cases are connected or belong to a series of offenses. This domain has been extensively studied by researchers in sociology, psychology, and statistics. More recently, it has drawn interest from computer scientists, especially with advances in artificial intelligence. Despite this, the literature indicates that work in this latter discipline is still in its early stages. This study aims to understand the challenges faced by machine learning approaches to crime linkage and to lay the groundwork for future data-driven methods. To this end, we conducted a comprehensive survey of the main literature on the topic and developed a general framework for crime linkage processes, describing each step in detail. We also sought to unify insights from diverse fields under a shared terminology, to support researchers interested in this subject.

CL · Jan 4, 2024
Identifying Risk Patterns in Brazilian Police Reports Preceding Femicides: A Long Short Term Memory (LSTM) Based Analysis

Vinicius Lima, Jaque Almeida de Oliveira

Femicide refers to the killing of a female victim, often by an intimate partner or family member, and is also associated with gender-based violence. Studies have shown a pattern of escalating violence leading up to these killings, which suggests that they could be prevented if the level of danger to the victim can be assessed. Machine learning offers a promising approach to this challenge by predicting risk levels from textual descriptions of the violence. In this study, we employed Long Short-Term Memory (LSTM) networks to identify patterns of behavior in Brazilian police reports preceding femicides. In our first approach, we classified the content of these reports as indicating either a lower or a higher risk of the victim being murdered, achieving an accuracy of 66%. In the second approach, we developed a model to predict the next action a victim might experience within a sequence of patterned events. Both approaches contribute to the understanding and assessment of the risks associated with domestic violence, giving authorities valuable insights to protect women and prevent situations from escalating.

LG · Dec 28, 2023
Hotspot Prediction of Severe Traffic Accidents in the Federal District of Brazil

Vinicius Lima, Vetria Byrd

Traffic accidents are one of the biggest challenges in societies where commuting is so important. What triggers an accident can depend on several subjective parameters and varies by region, city, or country. Understanding those parameters is therefore important to provide a knowledge base that supports decisions on preventing future accidents. The literature presents several works in which machine learning algorithms are used to predict accidents or their severity, evaluated on city-level datasets. This work adds to that body of research by focusing mainly on the concentration of accidents and on how machine learning can be used to predict hotspots, an approach that proved useful in helping authorities understand the nuances of how accidents concentrate. For the first time, data from the Federal District of Brazil collected by forensic traffic accident analysts were combined with local weather data to predict collision hotspots. Of the five algorithms we considered, two performed well: Multi-layer Perceptron and Random Forest, with the latter performing best at 98% accuracy. We also find that weather parameters are less important than accident location, which suggests that local interventions are important to reduce the number of accidents.