PyTorchPipe: a framework for rapid prototyping of pipelines combining language and vision
PyTorchPipe provides a software tool that lowers the entry barrier for researchers and developers working on multi-modal machine learning tasks, although its contribution is incremental, as it builds on existing frameworks such as PyTorch.
The paper introduces PyTorchPipe (PTP), a framework built on PyTorch to facilitate building and training complex, multi-modal models combining language and vision, using a component-oriented pipeline approach defined via YAML configuration files.
Access to vast amounts of data, along with affordable computational power, stimulated the revival of neural networks. This progress could not have been achieved without adequate software tools that lower the entry barrier for the next generations of researchers and developers. The paper introduces PyTorchPipe (PTP), a framework built on top of PyTorch. Responding to recent needs and trends in machine learning, PTP facilitates the building and training of complex, multi-modal models combining language and vision (but is not limited to those two modalities). At its core, PTP employs a component-oriented approach and relies on the concept of a pipeline, defined as a directed acyclic graph of loosely coupled components. A user defines a pipeline using YAML-based (thus human-readable) configuration files, and PTP provides generic workers for loading, training, and testing these pipelines using all the computational resources (CPUs and GPUs) available to the user. The paper covers the main concepts of PyTorchPipe, discusses its key features, and briefly presents the currently implemented tasks, models, and components.
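To illustrate the pipeline concept, the following is a minimal Python sketch under stated assumptions; it is not PTP's actual API. The classes `Component` and `Pipeline`, the `REGISTRY` of component types, the `build_pipeline` helper, and the config keys `type`, `inputs`, and `outputs` are all hypothetical. Each component declares the named data streams it reads and writes; the pipeline derives the directed acyclic graph from those declarations and runs the components in topological order. The `CONFIG` dict stands in for what `yaml.safe_load` (PyYAML) would return from a YAML file, so the example runs with the standard library alone (Python 3.9+ for `graphlib`).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List

# Hypothetical component: reads named streams from a shared dict and
# writes its results back under its declared output names.
class Component:
    def __init__(self, name: str, inputs: List[str], outputs: List[str],
                 fn: Callable[..., tuple]):
        self.name = name
        self.inputs = inputs
        self.outputs = outputs
        self.fn = fn

    def __call__(self, streams: Dict[str, object]) -> None:
        results = self.fn(*(streams[key] for key in self.inputs))
        for key, value in zip(self.outputs, results):
            streams[key] = value


# Hypothetical pipeline: a DAG of loosely coupled components that only
# communicate through named streams.
class Pipeline:
    def __init__(self, components: List[Component]):
        self.components = {c.name: c for c in components}
        # Map every stream name to the component that produces it.
        producers: Dict[str, str] = {}
        for c in components:
            for out in c.outputs:
                if out in producers:
                    raise ValueError(f"stream {out!r} is produced twice")
                producers[out] = c.name
        # A component depends on the producers of its inputs; inputs with
        # no producer are assumed to come from the data batch itself.
        graph = {c.name: {producers[i] for i in c.inputs if i in producers}
                 for c in components}
        # Raises graphlib.CycleError (a ValueError) if the graph has a cycle.
        self.order = list(TopologicalSorter(graph).static_order())

    def forward(self, batch: Dict[str, object]) -> Dict[str, object]:
        streams = dict(batch)  # copy so the caller's batch is not mutated
        for name in self.order:
            self.components[name](streams)
        return streams


# Toy stand-ins for real language/vision modules (hypothetical).
REGISTRY: Dict[str, Callable[..., tuple]] = {
    "Tokenizer": lambda question: (question.lower().split(),),
    "ImageEncoder": lambda image: (sum(image),),
    "Fusion": lambda tokens, features: (len(tokens) * features,),
}

# What a parsed YAML config might look like; declared out of order on purpose.
CONFIG = {
    "fusion": {"type": "Fusion",
               "inputs": ["tokens", "image_features"], "outputs": ["fused"]},
    "question_tokenizer": {"type": "Tokenizer",
                           "inputs": ["question"], "outputs": ["tokens"]},
    "image_encoder": {"type": "ImageEncoder",
                      "inputs": ["image"], "outputs": ["image_features"]},
}


def build_pipeline(config: Dict[str, dict]) -> Pipeline:
    return Pipeline([
        Component(name, spec["inputs"], spec["outputs"], REGISTRY[spec["type"]])
        for name, spec in config.items()
    ])


if __name__ == "__main__":
    pipe = build_pipeline(CONFIG)
    out = pipe.forward({"question": "What color is the cat", "image": [1, 2, 3]})
    print(out["fused"])  # 30
```

Because components interact only through named streams, replacing the toy `ImageEncoder` with a real vision model would, in this sketch, require changing just its registry entry and config, not the other components. PTP itself may determine execution order differently; the topological sort here is an assumption made for illustration.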