IR, CL · Jul 7, 2025

Automated Semantic Analysis with LLM and RAG for Pharmaceutical Package Inserts (original title: Analise Semantica Automatizada com LLM e RAG para Bulas Farmaceuticas)

arXiv:2507.21103v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of efficiently extracting and analyzing unstructured technical texts in health environments, but it is incremental: it applies existing methods to a new domain.

This work tackles the problem of analyzing unstructured information in PDF documents, specifically drug package inserts, by combining RAG architectures with LLMs. It reports significant gains in information retrieval and interpretation, evaluated with metrics such as accuracy and completeness.

The production of digital documents has been growing rapidly in academic, business, and health environments, presenting new challenges for the efficient extraction and analysis of unstructured information. This work investigates the use of Retrieval-Augmented Generation (RAG) architectures combined with Large Language Models (LLMs) to automate the analysis of documents in PDF format. The proposal integrates embedding-based vector search, semantic data extraction, and the generation of contextualized natural-language responses. To validate the approach, we conducted experiments with drug package inserts obtained from official public sources. The semantic queries were evaluated with metrics such as accuracy, completeness, response speed, and consistency. The results indicate that combining RAG with LLMs offers significant gains in intelligent information retrieval and in the interpretation of unstructured technical texts.
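The retrieval step described in the abstract (embed the text chunks, rank them against a query, then pass the best matches to an LLM as context) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name an embedding model, vector store, or LLM, so `embed` here is a toy term-frequency stand-in for a real embedding model, the function names are hypothetical, and the chunk texts are made-up placeholders rather than content from any real package insert.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a lowercase term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to an LLM (the LLM call is omitted)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


# Hypothetical placeholder chunks, standing in for text extracted from a PDF.
CHUNKS = [
    "Dosage: adults take one tablet every eight hours.",
    "Contraindications: do not use in patients with severe liver disease.",
    "Storage: keep at room temperature, protected from moisture.",
]

query = "What is the adult dosage?"
top = retrieve(query, CHUNKS, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a real pipeline the term-frequency vectors would be replaced by dense embeddings and the sorted scan by a vector index, but the ranking-then-prompting structure stays the same.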

