CL Oct 20, 2021

A Self-Explainable Stylish Image Captioning Framework via Multi-References

arXiv:2110.10704v20.51 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of generating and explaining stylish captions for images, but it appears incremental because it builds on existing captioning methods.

The paper tackles stylish image captioning by proposing a Multi-style Multi-modality mechanism (2M) to build effective captioners, and demonstrates that multi-references from the model can explain errors by identifying faulty input features.

In this paper, we propose to build a stylish image captioning model through a Multi-style Multi-modality mechanism (2M). We demonstrate that with 2M, we can build an effective stylish captioner and that multi-references produced by the model can also support explaining the model by identifying erroneous input features on faulty examples. We show how this 2M mechanism can be used to build stylish captioning models and how these models can be used to explain likely errors in the models.
