IR · May 19, 2020

Multi-Modal Summary Generation using Multi-Objective Optimization

arXiv:2005.09252v1 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more comprehensive multi-modal summarization, motivated by advances in communication technology. It is an incremental advance, extending existing text-and-image methods to also cover videos.

The paper tackles the problem of generating multi-modal summaries that contain text, images, and videos. It proposes an extractive multi-objective optimization model that outperforms state-of-the-art approaches when each output modality is evaluated separately.

Significant development of communication technology over the past few years has motivated research in multi-modal summarization techniques. Most previous work on multi-modal summarization focuses on text and images. In this paper, we propose a novel extractive multi-objective optimization based model to produce a multi-modal summary containing text, images, and videos. Important objectives such as intra-modality salience, cross-modal redundancy, and cross-modal similarity are optimized simultaneously in a multi-objective optimization framework to produce effective multi-modal output. The proposed model has been evaluated separately for different modalities and has been found to perform better than state-of-the-art approaches.
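The core idea of optimizing several objectives "simultaneously" is that no single score is maximized. Instead, one keeps every candidate summary that no other candidate beats on all objectives at once (the Pareto front). The sketch below is a minimal illustration built on several assumptions, none of which come from the paper: every sentence, image, and video is already embedded in one shared vector space, and the three objective functions are hypothetical stand-ins for intra-modality salience, cross-modal redundancy, and cross-modal similarity. Candidates are found by brute-force enumeration of small selections rather than by the authors' optimizer, which this summary does not describe. All function names (`objectives`, `dominates`, `pareto_front`) are illustrative.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(values):
    """Arithmetic mean; 0.0 for an empty iterable."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def objectives(summary, pools):
    """Score a summary (list of (modality, vector) pairs); larger is better everywhere.

    The three formulas are HYPOTHETICAL stand-ins, not the paper's definitions.
    """
    centroids = {m: centroid(vs) for m, vs in pools.items()}
    # Intra-modality salience: how close each item is to its own modality's centroid.
    salience = mean(cosine(v, centroids[m]) for m, v in summary)
    pairs = list(itertools.combinations(summary, 2))
    # Redundancy over all selected pairs in the shared space, cross-modal pairs
    # included; negated so that every objective is maximised.
    redundancy = mean(cosine(a[1], b[1]) for a, b in pairs)
    # Cross-modal similarity: only pairs drawn from different modalities.
    cross = mean(cosine(a[1], b[1]) for a, b in pairs if a[0] != b[0])
    return (salience, -redundancy, cross)


def dominates(f, g):
    """True if score vector f Pareto-dominates g: no worse anywhere, better somewhere."""
    return all(x >= y for x, y in zip(f, g)) and any(x > y for x, y in zip(f, g))


def pareto_front(pools, budget):
    """Enumerate every summary taking budget[m] items from each modality m and
    return the non-dominated ones as (summary, scores) pairs."""
    per_modality = [
        [[(m, v) for v in combo] for combo in itertools.combinations(pools[m], budget[m])]
        for m in pools
    ]
    candidates = []
    for parts in itertools.product(*per_modality):
        summary = [item for part in parts for item in part]
        candidates.append((summary, objectives(summary, pools)))
    return [(s, f) for s, f in candidates
            if not any(dominates(g, f) for _, g in candidates)]


if __name__ == "__main__":
    # Toy 3-d "embeddings"; a real system would use learned multi-modal encoders.
    pools = {
        "text": [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1], [0.1, 1.0, 0.0]],
        "image": [[0.8, 0.3, 0.1], [0.0, 0.2, 1.0]],
        "video": [[0.7, 0.0, 0.4], [0.1, 0.9, 0.2]],
    }
    budget = {"text": 2, "image": 1, "video": 1}
    for summary, scores in pareto_front(pools, budget):
        print([v for _, v in summary], [round(x, 3) for x in scores])
```

Because the redundancy and cross-modal similarity stand-ins pull in opposite directions, the front typically holds several trade-off summaries rather than one winner. Exhaustive enumeration grows combinatorially with pool size, so it only suits toy inputs. A practical system would search the space with a heuristic optimizer and then pick one summary from the front by some preference rule.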

