AI · Dec 22, 2024
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics
Gospel Ozioma Nnadi, Flavio Bertini
Advances in deep learning, particularly the introduction of transformers, have been pivotal in enhancing various natural language processing (NLP) tasks. These include text-to-text applications such as machine translation, text classification, and text summarization, as well as data-to-text tasks like response generation and image-to-text tasks such as captioning. Transformer models are distinguished by their attention mechanisms, pretraining on general knowledge, and fine-tuning for downstream tasks. This has led to significant improvements, particularly in abstractive summarization, where sections of a source document are paraphrased to produce summaries that closely resemble human expression. The effectiveness of these models is assessed using diverse metrics that cover aspects such as semantic overlap and factual correctness. This survey examines the state of the art in text summarization models, with a specific focus on the abstractive summarization approach. It reviews various datasets and evaluation metrics used to measure model performance. Additionally, it includes the results of test cases using abstractive summarization models to underscore the advantages and limitations of contemporary transformer-based models. The source code and data are available at https://github.com/gospelnnadi/Text-Summarization-SOTA-Experiment.
LO · Feb 13, 2025
Data2Concept2Text: An Explainable Multilingual Framework for Data Analysis Narration
Flavio Bertini, Alessandro Dal Palù, Federica Zaglio et al.
This paper presents a complete explainable system that interprets a set of data, abstracts the underlying features and describes them in a natural language of choice. The system relies on two crucial stages: (i) identifying emerging properties from data and transforming them into abstract concepts, and (ii) converting these concepts into natural language. Despite the impressive natural language generation capabilities demonstrated by Large Language Models, their statistical nature and the intricacy of their internal mechanism still force us to employ these techniques as black boxes, forgoing trustworthiness. Developing an explainable pipeline for data interpretation would facilitate its use in safety-critical environments, such as processing medical information, and would allow non-experts and visually impaired people to access narrated information. To this end, we believe that the fields of knowledge representation and automated reasoning research could present a valid alternative. Expanding on prior research that tackled the first stage (i), we focus on the second stage, named Concept2Text. Being explainable, data translation is easily modeled through logic-based rules, once again emphasizing the role of declarative programming in achieving AI explainability. This paper explores a Prolog/CLP-based rewriting system to interpret concepts (articulated in terms of classes and relations, plus common knowledge) derived from a generic ontology, generating natural language text. Its main features include hierarchical tree rewritings, modular multilingual generation, support for equivalent variants across semantic, grammar, and lexical levels, and a transparent rule-based system. We outline the architecture and demonstrate its flexibility through some examples capable of generating numerous diverse and equivalent rewritings based on the input concept.
MM · Jun 6, 2020
Are Social Networks Watermarking Us or Are We (Unawarely) Watermarking Ourself?
Flavio Bertini, Rajesh Sharma, Danilo Montesi
In the last decade, Social Networks (SNs) have deeply changed many aspects of society, and one of the most widespread behaviours is the sharing of pictures. However, malicious users often exploit shared pictures to create fake profiles, leading to the growth of cybercrime. With this scenario in mind, authorship attribution and verification through image watermarking techniques are becoming more and more important. In this paper, firstly, we investigate how the 13 most popular SNs treat uploaded pictures, in order to identify a possible implementation of image watermarking techniques by the respective SNs. Secondly, on these 13 SNs, we test the robustness of several image watermarking algorithms. Finally, we verify whether a method based on the Photo-Response Non-Uniformity (PRNU) technique can be successfully used as a watermarking approach for authorship attribution and verification of pictures on SNs. The proposed method is sufficiently robust even though the pictures are downgraded by the SNs during the uploading process. The results of our analysis on a real dataset of 8,400 pictures show that the proposed method is more effective than other watermarking techniques and can help to address serious questions about privacy and security on SNs.