Neural Variational Learning for Grounded Language Acquisition
This work addresses the challenge of acquiring grounded language for AI systems without relying on pre-specified visual categories, though it appears incremental because it builds on existing generative and embedding methods.
The authors tackle the problem of grounding language in visual percepts without predefined categories. They propose a unified generative method that learns a shared semantic/visual embedding, and they report promising language-grounding results in low-resource settings, with generalization to multilingual datasets.
We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms. We present a unified generative method to acquire a shared semantic/visual embedding that enables the learning of language about a wide range of real-world objects. We evaluate the efficacy of this learning by predicting the semantics of objects and comparing performance with neural and non-neural inputs. We show that this generative approach exhibits promising results in language grounding without pre-specifying visual categories in low-resource settings. Our experiments demonstrate that this approach generalizes to multilingual, highly varied datasets.