CLIRFeb 5, 2016

Utilização de Grafos e Matriz de Similaridade na Sumarização Automática de Documentos Baseada em Extração de Frases

arXiv:1602.02047v1
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of information overload for users by providing incremental improvements in automatic text summarization.

The thesis tackled automatic text summarization by proposing graph and similarity matrix heuristics to measure sentence relevance and generate summaries, achieving interesting results across multiple languages including English, French, Spanish, and Brazilian Portuguese.

The internet increased the amount of information available. However, the reading and understanding of this information are costly tasks. In this scenario, the Natural Language Processing (NLP) applications enable very important solutions, highlighting the Automatic Text Summarization (ATS), which produce a summary from one or more source texts. Automatically summarizing one or more texts, however, is a complex task because of the difficulties inherent to the analysis and generation of this summary. This master's thesis describes the main techniques and methodologies (NLP and heuristics) to generate summaries. We have also addressed and proposed some heuristics based on graphs and similarity matrix to measure the relevance of judgments and to generate summaries by extracting sentences. We used the multiple languages (English, French and Spanish), CSTNews (Brazilian Portuguese), RPM (French) and DECODA (French) corpus to evaluate the developped systems. The results obtained were quite interesting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes