CV | AI | Nov 29, 2022

PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals

arXiv:2211.15940v3 | 2 citations | h-index: 21 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the accessibility gap for domain experts and other professionals outside deep learning who lack programming skills, though its contribution is incremental because it builds on existing models and APIs.

The authors tackled the problem of making state-of-the-art visual-language pretrained models accessible to non-experts by developing PiggyBack, a browser-based Visual Question Answering platform that supports data processing, model fine-tuning, and visualization, resulting in a free, portable tool that runs on almost any platform.

We propose PiggyBack, a Visual Question Answering platform that lets users apply state-of-the-art visual-language pretrained models easily. PiggyBack supports the full stack of visual question answering tasks: data processing, model fine-tuning, and result visualisation. We integrate visual-language pretrained models from HuggingFace, an open-source API platform for deep learning technologies; however, these models cannot be run without programming skills or an understanding of deep learning. PiggyBack therefore offers an easy-to-use browser-based user interface to several deep learning visual-language pretrained models for general users and domain experts. PiggyBack offers the following benefits: free availability under the MIT License; portability, since it is web-based and thus runs on almost any platform; a comprehensive data creation and processing technique; and ease of use for deep learning-based visual-language pretrained models. The demo video is available on YouTube at https://youtu.be/iz44RZ1lF4s.
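
To illustrate the programming step the abstract says non-experts cannot be expected to perform, here is a minimal Python sketch of querying a HuggingFace visual question answering model directly. It is not PiggyBack's own code: the checkpoint name `dandelin/vilt-b32-finetuned-vqa` and the helper names `top_answer` and `answer_question` are assumptions chosen for illustration, and the abstract does not say which models the platform wraps.

```python
from typing import Dict, List


def top_answer(predictions: List[Dict]) -> str:
    """Return the answer string with the highest score.

    `predictions` is expected to be a list of {"score": float, "answer": str}
    dicts, the shape returned by the transformers VQA pipeline.
    """
    if not predictions:
        raise ValueError("no predictions returned")
    return max(predictions, key=lambda p: p["score"])["answer"]


def answer_question(image_path: str, question: str) -> str:
    """Ask one question about one image (hypothetical helper).

    Requires the `transformers`, `torch`, and `Pillow` packages and downloads
    the checkpoint on first use.
    """
    # Imported lazily so top_answer() stays usable without transformers installed.
    from transformers import pipeline

    # Assumed checkpoint; any VQA-capable model on the HuggingFace Hub could be used.
    vqa = pipeline(
        "visual-question-answering",
        model="dandelin/vilt-b32-finetuned-vqa",
    )
    predictions = vqa(image=image_path, question=question)
    return top_answer(predictions)
```

A call such as `answer_question("photo.jpg", "What colour is the car?")` would return the model's highest-scoring answer, where `photo.jpg` is a placeholder path. Even this short script assumes a working Python environment and some familiarity with the library, which is the barrier a browser-based interface is meant to remove.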

